Checksums and error detection
How CRC checksums detect data corruption in storage and networking, the math behind polynomial division, when to use which algorithm, and how databases use checksums to catch silent corruption.
The problem
A production PostgreSQL cluster serves financial records. One morning, a batch job produces account balances that are off by millions. No application bugs, no bad queries, no failed writes. The data on disk has silently changed.
The root cause: a DRAM bit flip during a write-back from the page cache to disk. The hardware reported a successful write. The OS confirmed it. The filesystem metadata is consistent. But the data itself is wrong, and nothing in the stack noticed.
Silent data corruption (often called "bit rot") is a real phenomenon in production storage. DRAM bit flips happen at roughly 1 error per 4GB per year without ECC. SSD firmware bugs can write data to the wrong cell. Disk controllers occasionally misroute writes. Network packets can arrive with flipped bits that slip past Ethernet's frame check.
Without checksums, a database silently serves corrupted data. The application trusts the result, the user trusts the application, and the corruption propagates through downstream systems. This is the problem checksums solve: they give every piece of stored or transmitted data a short fingerprint that can be verified on every read.
What it is
A checksum is a small, fixed-size value computed from a block of data. If even a single bit of the data changes, the checksum changes too. On every read, you recompute the checksum and compare it to the stored value. A mismatch means corruption.
Think of it like a parity digit on a credit card number. The last digit of any credit card is not part of the account number; it is a check digit derived from all the other digits using Luhn's algorithm. If you mistype one digit, the check digit will not match, and the system rejects the number. CRC checksums work the same way, except the "check digit" is computed using polynomial division over binary data, and the data can be megabytes instead of 16 digits.
CRC (Cyclic Redundancy Check) is the most common checksum in storage and networking. It produces a 16, 32, or 64-bit fingerprint that detects virtually all accidental corruption, costs only nanoseconds to compute, and has dedicated hardware instructions on every modern CPU.
How it works
CRC treats the input data as a long binary polynomial and divides it by a fixed generator polynomial. The remainder of that division is the CRC value, typically 32 bits.
The key insight: all arithmetic is in GF(2), meaning addition and subtraction are both XOR. There are no carries. This makes CRC computable with shift-and-XOR operations, which is why it runs so fast.
Step-by-step walkthrough
- Treat data as polynomial. A byte
0b11010011becomes $x^7 + x^6 + x^4 + x + 1$. The entire input file is one massive polynomial. - Pad with zeros. Append 32 zero bits (for CRC-32) to the data polynomial. This makes room for the remainder.
- Divide by the generator. The generator polynomial for CRC-32 is
0x04C11DB7. Division works bit-by-bit: if the current leading bit is 1, XOR with the generator and shift. If 0, just shift. - The remainder is the CRC. After processing all bits, the remaining 32 bits are the CRC value.
- Verify on read. Divide the data+CRC block by the same generator. If the remainder is zero, no corruption. If non-zero, at least one bit has changed.
# CRC-32 core algorithm (simplified, no lookup table)
def crc32(data: bytes) -> int:
generator = 0xEDB88320 # CRC-32 polynomial (reflected)
crc = 0xFFFFFFFF # Initial value
for byte in data:
crc ^= byte
for _ in range(8):
if crc & 1:
crc = (crc >> 1) ^ generator
else:
crc >>= 1
return crc ^ 0xFFFFFFFF # Final XOR
The reflected form (0xEDB88320 instead of 0x04C11DB7) processes bits LSB-first, which matches how x86 CPUs naturally read bytes. Real implementations replace the inner loop with a 256-entry lookup table, processing one byte per table lookup instead of one bit per iteration.
The lookup table optimization
The bit-by-bit algorithm processes one bit per XOR operation, which means 8 iterations per byte. In practice, every CRC implementation precomputes a 256-entry table at startup. The table maps each possible byte value to its effect on the CRC register. Then the inner loop becomes a single table lookup per byte.
# Build CRC-32 lookup table (done once at startup)
def build_crc_table() -> list[int]:
table = []
for i in range(256):
crc = i
for _ in range(8):
if crc & 1:
crc = (crc >> 1) ^ 0xEDB88320
else:
crc >>= 1
table.append(crc)
return table
CRC_TABLE = build_crc_table()
# CRC-32 with lookup table (8x faster than bit-by-bit)
def crc32_fast(data: bytes) -> int:
crc = 0xFFFFFFFF
for byte in data:
index = (crc ^ byte) & 0xFF
crc = (crc >> 8) ^ CRC_TABLE[index]
return crc ^ 0xFFFFFFFF
This reduces the inner loop from 8 operations per byte to 1 table lookup, an XOR, and a shift. Modern implementations go further with "slicing-by-8" or "slicing-by-16" techniques that process 8 or 16 bytes per loop iteration using larger tables, achieving ~1 GB/s in pure software.
Interview tip: CRC is not encryption
When an interviewer asks about data integrity, mention CRC immediately for corruption detection. But always clarify: CRC is not collision-resistant. An attacker can craft data that produces any desired CRC in microseconds. For tamper detection, you need a cryptographic hash (SHA-256, BLAKE3).
Polynomial division mechanics
The polynomial division in CRC is not regular integer division. In GF(2), there are no carries, so "subtraction" is just XOR. Here is what the division looks like for a small example: computing CRC-3 of the data 11010011 with generator 1011.
// CRC division: data = 11010011, generator = 1011 (degree 3)
// Step 1: Pad data with 3 zeros (degree of generator)
// Padded: 11010011 000
11010011 000 β data padded with 3 zeros
1011 β XOR with generator (leading bit = 1)
-----------
1100011 000
1011
----------
111011 000
1011
---------
10111 000
1011
--------
0001 000
(shift until leading 1)
...
// Final remainder (3 bits) = CRC value
Each step: if the leading bit is 1, XOR with the generator and shift. If 0, just shift. After processing all data bits, the remaining bits are the CRC.
The mathematical property that makes CRC useful: any single-bit error in the data changes the remainder. Any burst error shorter than the CRC width (32 bits for CRC-32) is guaranteed to produce a different remainder. For longer bursts, the probability of an undetected error is 1 in 2^32 (roughly 1 in 4 billion).
Why CRC catches specific error patterns
CRC's error detection comes from the algebraic properties of the generator polynomial. A well-chosen generator polynomial is irreducible over GF(2) and has specific factors that guarantee detection of common error types:
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.