Magic NULL anti-pattern
Learn why overloading NULL with multiple meanings creates three-valued logic bugs, silent query failures, and unmaintainable code, and how to use sentinel values and separate columns instead.
TL;DR
- SQL NULL means "unknown." It does not mean "zero," "false," "not applicable," "deleted," or "empty." Using NULL for these semantics creates three-valued logic: TRUE, FALSE, and NULL.
- Three-valued logic causes silent query bugs:
WHERE deleted_at IS NULLworks, butWHERE deleted_at != '2026-01-01'silently excludes all NULL rows. - The anti-pattern: using a single nullable column for multiple meanings, like
deleted_at IS NULL= active,deleted_at = timestamp= deleted,deleted_at = sentinel= deactivated. - Fix: Use separate boolean/enum columns (
is_deleted,status). Reserve NULL for genuinely unknown values only.
The Problem
It's the first Monday of Q2, and your VP of Sales is staring at a dashboard showing 12% lower active user counts than last month. She escalates to engineering. The data team spends two days investigating before a junior analyst finds it: a month ago, someone added a third meaning to the deleted_at column.
Your team uses a nullable deleted_at timestamp for soft deletion. NULL means active. A timestamp means deleted. This is common and mostly fine.
Then someone adds: if deleted_at is a specific sentinel value (e.g., '1970-01-01'), it means "deactivated by admin but not deleted." Now NULL has one meaning, a real timestamp has another meaning, and the sentinel timestamp has a third meaning. Three values of one column, three interpretations.
The bug: an analytics query filters WHERE deleted_at IS NULL OR deleted_at > '2020-01-01'. This intended to find "active or recently deleted" records. But it silently excludes all deleted_at = '1970-01-01' rows (the deactivated accounts). The analytics report is quietly wrong for weeks before anyone notices.
I've debugged this exact class of bug three times. Every time, the root cause was the same: a column carrying more semantic weight than its type can express.
The worst part? The analytics team didn't question the numbers for weeks. The 12% drop looked like normal churn. Nobody realized the data was wrong until a sales manager manually counted accounts and got a different number.
The timeline of a magic NULL bug
- Month 0:
deleted_atcolumn added. NULL = active, timestamp = deleted. Clean, simple. - Month 6: Admin deactivation feature needed. Developer reuses
deleted_atwith sentinel'1970-01-01'. No migration, no discussion. "Quick fix." - Month 7: Analytics query
WHERE deleted_at IS NULLstill works for "active users." Nobody notices the deactivated accounts are being ignored in a different report. - Month 9: VP notices the numbers are off. Two-day investigation begins.
- Month 9 + 2 days: Root cause found: three meanings in one column. Now you need to backfill a new
statuscolumn across 50 million rows.
SQL's three-valued logic
SQL defines three truth values: TRUE, FALSE, and NULL (unknown). The rules feel intuitive until they don't:
SELECT 1 WHERE NULL = NULL; -- returns 0 rows! NULL is not equal to NULL.
SELECT 1 WHERE NULL IS NULL; -- returns 1 row. Correct test for NULL.
SELECT 1 WHERE NULL != 1; -- returns 0 rows! Not because it's false,
-- but because NULL != 1 evaluates to NULL.
Any comparison involving NULL evaluates to NULL, not TRUE or FALSE. WHERE clauses return only rows where the condition is TRUE, so NULL rows are excluded silently.
This means:
-- This is NOT the complement of WHERE status = 'active'
WHERE status != 'active'
-- It excludes rows where status IS NULL entirely, silently.
The practical consequence is brutal: you cannot use simple equality or inequality to query nullable columns. Every query must explicitly handle the NULL case, and forgetting to do so doesn't raise an error. It just silently returns fewer rows.
-- The CORRECT complement requires explicit NULL handling
WHERE status != 'active' OR status IS NULL
Aggregation traps
NULL also affects aggregate functions in non-obvious ways:
-- COUNT(*) counts all rows, but COUNT(column) skips NULLs
SELECT COUNT(*) FROM users; -- 10,000
SELECT COUNT(deleted_at) FROM users; -- 3,200 (only non-NULL values)
SELECT COUNT(*) - COUNT(deleted_at); -- 6,800 (the NULL rows)
-- AVG ignores NULLs entirely
SELECT AVG(score) FROM reviews;
-- If 40% of rows have NULL score, the average is computed
-- over the remaining 60% only. This may or may not be
-- what you want, but it's never what you expect.
Common NULL overloading mistakes
| Pattern | Problem |
|---|---|
price IS NULL means "price not set yet" AND "price is free" | Ambiguous: can't tell if it should be displayed or if data is missing |
deleted_at IS NULL means "active" | Fine alone, but breaks when combined with other NULLable timestamp columns |
phone IS NULL means "no phone" AND "phone not collected yet" | Two semantically different states look identical |
score IS NULL means "not scored" AND "scored 0" | Zero should be 0, not NULL |
response_time IS NULL means "request never made" AND "request still pending" | Can't differentiate in-flight from never-attempted |
Why It Happens
Nobody starts with the intent to create magic NULLs. The pattern sneaks in through reasonable decisions:
- Soft deletion convention.
deleted_at IS NULLis a well-known Rails/Django convention. It works fine for one state transition, but teams extend it to track additional states without changing the approach. - Schema change avoidance. Adding a new column requires a migration, possibly with backfills. Overloading an existing nullable column is "just a logic change" with no migration needed.
- NULL is the path of least resistance. When designing a new column, making it nullable is the default in most database tools. NOT NULL requires thinking about defaults, backfills, and what "no value" means for this domain.
Here's a real example of how it escalates:
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.