Site: James Rhoat — Giving back to the community
Source: https://james.rhoat.com/p/sql-corruption-post-mortem/
Title: SQL Server Corruption Post Mortem
Description: A post-mortem guide for SQL Server database corruption: likely causes from I/O subsystem errors to firmware, early detection strategies, and recovery steps.
Date: 2021-01-10
Categories: sql-server, SQL Server database corruption recovery, DBCC CHECKDB integrity checking, I/O subsystem error diagnosis, SQL Server error 823 824 825, corruption post-mortem investigation
Word count: 325
Reading time: 2 min

Contents:
- [Most Likely Causes](#most-likely-causes)
- [Examine the Logs](#examine-the-logs)
- [What Does NOT Cause Corruption](#what-does-not-cause-corruption)
- [Early Detection Strategies](#early-detection-strategies)

***

# SQL Server Corruption Post Mortem


"No matter what method you use to recover from corruption, you should always determine why it happened to avoid future problems." -- Paul Randal

## Most Likely Causes

The most likely causes are outlined below, in order of likelihood:

1. **Run I/O subsystem and server memory diagnostics** (almost always the cause)
2. **Windows OS**
3. **File system filter drivers** -- e.g. antivirus, defraggers, encryption
4. **Network cards, switches, cables**
5. **Memory corruption**
   - Bad memory chips
   - Scribblers
6. **SAN controllers**
7. **RAID controllers**
8. **Disks**

## Examine the Logs

Check the SQL Server error log and Windows event logs for clues:

- **Error 823:** A hard I/O error (when Windows can't return the data to SQL)
- **Error 824:** A soft I/O error (when SQL detects there is a problem with the data it was given by Windows)
- **Error 825:** A read-retry error (these show as information alerts however they are critical because they are impending-doom warning signs)
- More information: [Microsoft KB 2015757](https://support.microsoft.com/en-us/kb/2015757)

Also:

- Check that the firmware is up-to-date
- Investigate NTFS filter drivers

## What Does NOT Cause Corruption

It is important to remember that corruptions are not caused by the following:

- Anything an application can do
- Anything you can do in SQL Server with supported, documented commands
- Interrupting a database shrink, index rebuild, or long-running batch
- Shutting down SQL Server

## Early Detection Strategies

If corruption happened, it is likely that certain steps are not being done inside your organization to protect your data. Here is a list of what can be done to help detect early signs:

1. **Implement error alerts for Severity 19 and above errors** -- see [Configuring Alerts for High Severity Problems](http://www.sqlskills.com/blogs/glenn/the-accidental-dba-day-17-of-30-configuring-alerts-for-high-severity-problems/)
2. **Implement Page Verify -- Checksum option** -- see [Setting Your Page Verify Database Option to Checksum](http://www.sqlskills.com/blogs/glenn/setting-your-page-verify-database-option-to-checksum/)
3. **Implement backups with Checksum option**
   - All databases INCLUDING system databases
   - Allows you to use `RESTORE VERIFYONLY` with `CHECKSUM` to validate your backups
4. **Implement an integrity check script** -- either [Ola Hallengren's Maintenance Solution](https://ola.hallengren.com/) (widely used) or at minimum a `DBCC CHECKDB WITH NO_INFOMSGS` job to check integrity


---

## Related Posts

- [Volume Shadow Copy Service (VSS)](https://james.rhoat.com/p/vss-backups/page.md) — Jan 15, 2021: How VSS and the SQL Writer Service interact with SQL Server backups, including supported operations, limitations around point-in-time recovery, and common troubleshooting steps.
- [Vertical Partitioning](https://james.rhoat.com/p/vertical-partitioning/page.md) — Jan 14, 2021: How vertical partitioning (row splitting) reduces page waste and improves query performance in SQL Server, with a one-million-row demo.
- [Understanding Storage in SQL Server](https://james.rhoat.com/p/understanding-storage/page.md) — Jan 13, 2021: How SQL Server organizes data at the page and extent level, and why LOB, in-row, and row-overflow storage matter for performance and design.
