Source: https://james.rhoat.com/p/sql-corruption-post-mortem/
Site: James Rhoat
Title: SQL Server Corruption Post Mortem
Description: A post-mortem guide for SQL Server database corruption: likely causes from I/O subsystem errors to firmware, early detection strategies, and recovery steps.
Date: 2021-01-10

***

# SQL Server Corruption Post Mortem


"No matter what method you use to recover from corruption, you should always determine why it happened to avoid future problems." -- Paul Randal

## Most Likely Causes

The most likely causes are outlined below, in order of likelihood:

1. **Run I/O subsystem and server memory diagnostics** (almost always the cause)
2. **Windows OS**
3. **File system filter drivers** -- e.g. antivirus, defraggers, encryption
4. **Network cards, switches, cables**
5. **Memory corruption**
   - Bad memory chips
   - Scribblers
6. **SAN controllers**
7. **RAID controllers**
8. **Disks**

## Examine the Logs

Check the SQL Server error log and Windows event logs for clues:

- **Error 823:** A hard I/O error (when Windows can't return the data to SQL)
- **Error 824:** A soft I/O error (when SQL detects there is a problem with the data it was given by Windows)
- **Error 825:** A read-retry error (these show as information alerts however they are critical because they are impending-doom warning signs)
- More information: [Microsoft KB 2015757](https://support.microsoft.com/en-us/kb/2015757)

Also:

- Check that the firmware is up-to-date
- Investigate NTFS filter drivers

## What Does NOT Cause Corruption

It is important to remember that corruptions are not caused by the following:

- Anything an application can do
- Anything you can do in SQL Server with supported, documented commands
- Interrupting a database shrink, index rebuild, or long-running batch
- Shutting down SQL Server

## Early Detection Strategies

If corruption happened, it is likely that certain steps are not being done inside your organization to protect your data. Here is a list of what can be done to help detect early signs:

1. **Implement error alerts for Severity 19 and above errors** -- see [Configuring Alerts for High Severity Problems](http://www.sqlskills.com/blogs/glenn/the-accidental-dba-day-17-of-30-configuring-alerts-for-high-severity-problems/)
2. **Implement Page Verify -- Checksum option** -- see [Setting Your Page Verify Database Option to Checksum](http://www.sqlskills.com/blogs/glenn/setting-your-page-verify-database-option-to-checksum/)
3. **Implement backups with Checksum option**
   - All databases INCLUDING system databases
   - Allows you to use `RESTORE VERIFYONLY` with `CHECKSUM` to validate your backups
4. **Implement an integrity check script** -- either [Ola Hallengren's Maintenance Solution](https://ola.hallengren.com/) (widely used) or at minimum a `DBCC CHECKDB WITH NO_INFOMSGS` job to check integrity

