Magic NULL anti-pattern
Learn why overloading NULL with multiple meanings creates three-valued logic bugs, silent query failures, and unmaintainable code, and how to use sentinel values and separate columns instead.
TL;DR
- SQL NULL means "unknown." It does not mean "zero," "false," "not applicable," "deleted," or "empty." Using NULL for these semantics creates three-valued logic: TRUE, FALSE, and NULL.
- Three-valued logic causes silent query bugs: `WHERE deleted_at IS NULL` works, but `WHERE deleted_at != '2026-01-01'` silently excludes all NULL rows.
- The anti-pattern: using a single nullable column for multiple meanings, like `deleted_at IS NULL` = active, `deleted_at = timestamp` = deleted, `deleted_at = sentinel` = deactivated.
- Fix: Use separate boolean/enum columns (`is_deleted`, `status`). Reserve NULL for genuinely unknown values only.
The Problem
It's the first Monday of Q2, and your VP of Sales is staring at a dashboard showing 12% lower active user counts than last month. She escalates to engineering. The data team spends two days investigating before a junior analyst finds it: a month ago, someone added a third meaning to the deleted_at column.
Your team uses a nullable deleted_at timestamp for soft deletion. NULL means active. A timestamp means deleted. This is common and mostly fine.
Then someone adds: if deleted_at is a specific sentinel value (e.g., '1970-01-01'), it means "deactivated by admin but not deleted." Now NULL has one meaning, a real timestamp has another meaning, and the sentinel timestamp has a third meaning. Three values of one column, three interpretations.
The bug: an analytics query filters `WHERE deleted_at IS NULL OR deleted_at > '2020-01-01'`, intended to find "active or recently deleted" records. But it silently excludes all `deleted_at = '1970-01-01'` rows (the deactivated accounts). The analytics report is quietly wrong for weeks before anyone notices.
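The exclusion is easy to reproduce end to end. A minimal sketch using Python's stdlib sqlite3 (table and dates are illustrative, not the production schema):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, deleted_at TEXT)")
conn.executemany(
    "INSERT INTO users (deleted_at) VALUES (?)",
    [
        (None,),          # active
        ("2026-03-01",),  # recently deleted
        ("1970-01-01",),  # sentinel: deactivated by admin
    ],
)

# The analytics query: "active or recently deleted"
rows = conn.execute(
    "SELECT COUNT(*) FROM users "
    "WHERE deleted_at IS NULL OR deleted_at > '2020-01-01'"
).fetchone()[0]
print(rows)  # 2 -- the deactivated account is silently excluded
```

No error, no warning: the sentinel row simply fails both branches of the filter and vanishes from the count.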
I've debugged this exact class of bug three times. Every time, the root cause was the same: a column carrying more semantic weight than its type can express.
The worst part? The analytics team didn't question the numbers for weeks. The 12% drop looked like normal churn. Nobody realized the data was wrong until a sales manager manually counted accounts and got a different number.
The timeline of a magic NULL bug
- Month 0: `deleted_at` column added. NULL = active, timestamp = deleted. Clean, simple.
- Month 6: Admin deactivation feature needed. Developer reuses `deleted_at` with sentinel `'1970-01-01'`. No migration, no discussion. "Quick fix."
- Month 7: Analytics query `WHERE deleted_at IS NULL` still works for "active users." Nobody notices the deactivated accounts are being ignored in a different report.
- Month 9: VP notices the numbers are off. Two-day investigation begins.
- Month 9 + 2 days: Root cause found: three meanings in one column. Now you need to backfill a new `status` column across 50 million rows.
SQL's three-valued logic
SQL defines three truth values: TRUE, FALSE, and NULL (unknown). The rules feel intuitive until they don't:
```sql
SELECT 1 WHERE NULL = NULL;   -- returns 0 rows! NULL is not equal to NULL.
SELECT 1 WHERE NULL IS NULL;  -- returns 1 row. Correct test for NULL.
SELECT 1 WHERE NULL != 1;     -- returns 0 rows! Not because it's false,
                              -- but because NULL != 1 evaluates to NULL.
```
Any comparison involving NULL evaluates to NULL, not TRUE or FALSE. WHERE clauses return only rows where the condition is TRUE, so NULL rows are excluded silently.
This means:
```sql
-- This is NOT the complement of WHERE status = 'active'
WHERE status != 'active'
-- It excludes rows where status IS NULL entirely, silently.
```
The practical consequence is brutal: you cannot use simple equality or inequality to query nullable columns. Every query must explicitly handle the NULL case, and forgetting to do so doesn't raise an error. It just silently returns fewer rows.
```sql
-- The CORRECT complement requires explicit NULL handling
WHERE status != 'active' OR status IS NULL
```
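The asymmetry is easy to verify. A small sketch with Python's stdlib sqlite3 (three illustrative rows, one with a NULL status):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (status TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("active",), ("suspended",), (None,)])

eq = conn.execute(
    "SELECT COUNT(*) FROM t WHERE status = 'active'").fetchone()[0]
neq = conn.execute(
    "SELECT COUNT(*) FROM t WHERE status != 'active'").fetchone()[0]
neq_all = conn.execute(
    "SELECT COUNT(*) FROM t WHERE status != 'active' OR status IS NULL"
).fetchone()[0]

print(eq, neq, neq_all)  # 1 1 2 -- eq + neq != 3: the NULL row vanished
```

With three rows, `=` matches one and `!=` matches one: the NULL row belongs to neither result set until you add the explicit `IS NULL` branch.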
Aggregation traps
NULL also affects aggregate functions in non-obvious ways:
```sql
-- COUNT(*) counts all rows, but COUNT(column) skips NULLs
SELECT COUNT(*) FROM users;                      -- 10,000
SELECT COUNT(deleted_at) FROM users;             -- 3,200 (only non-NULL values)
SELECT COUNT(*) - COUNT(deleted_at) FROM users;  -- 6,800 (the NULL rows)

-- AVG ignores NULLs entirely
SELECT AVG(score) FROM reviews;
-- If 40% of rows have NULL score, the average is computed
-- over the remaining 60% only. This may or may not be
-- what you want, but it's never what you expect.
```
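Both traps can be checked in a few lines. A sketch with Python's stdlib sqlite3, using a toy `reviews` table where half the scores are NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reviews (score REAL)")
conn.executemany("INSERT INTO reviews VALUES (?)",
                 [(4.0,), (2.0,), (None,), (None,)])

total = conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]
scored = conn.execute("SELECT COUNT(score) FROM reviews").fetchone()[0]
avg = conn.execute("SELECT AVG(score) FROM reviews").fetchone()[0]

print(total, scored, avg)  # 4 2 3.0 -- AVG divided by 2 rows, not 4
```

If your mental model is "average over all reviews," the real denominator is only the non-NULL subset, and nothing in the query warns you.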
Common NULL overloading mistakes
| Pattern | Problem |
|---|---|
| `price IS NULL` means "price not set yet" AND "price is free" | Ambiguous: can't tell if it should be displayed or if data is missing |
| `deleted_at IS NULL` means "active" | Fine alone, but breaks when combined with other NULLable timestamp columns |
| `phone IS NULL` means "no phone" AND "phone not collected yet" | Two semantically different states look identical |
| `score IS NULL` means "not scored" AND "scored 0" | Zero should be 0, not NULL |
| `response_time IS NULL` means "request never made" AND "request still pending" | Can't differentiate in-flight from never-attempted |
Why It Happens
Nobody starts with the intent to create magic NULLs. The pattern sneaks in through reasonable decisions:
- Soft deletion convention. `deleted_at IS NULL` is a well-known Rails/Django convention. It works fine for one state transition, but teams extend it to track additional states without changing the approach.
- Schema change avoidance. Adding a new column requires a migration, possibly with backfills. Overloading an existing nullable column is "just a logic change" with no migration needed.
- NULL is the path of least resistance. When designing a new column, making it nullable is the default in most database tools. NOT NULL requires thinking about defaults, backfills, and what "no value" means for this domain.
Here's a real example of how it escalates:
```sql
-- Sprint 1: Simple, clean
ALTER TABLE orders ADD COLUMN refund_amount NUMERIC;  -- NULL = no refund

-- Sprint 4: PM asks "can we track partial refunds vs full refunds?"
-- Developer: "I'll use NULL = no refund, positive number = partial, full price = full"
-- Now one column encodes three business states

-- Sprint 8: "Can we also track refund requests that are pending?"
-- Developer: "I'll use -1 for pending..."
-- This is where it falls apart
```
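Why the Sprint 8 sentinel is where it falls apart: once `-1` means "pending," every aggregate over the column is quietly wrong. A sketch with Python's stdlib sqlite3 (amounts are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (refund_amount NUMERIC)")
conn.executemany("INSERT INTO orders VALUES (?)",
                 [(None,),   # no refund (SUM correctly ignores NULL)
                  (25.0,),   # partial refund
                  (-1,),     # sentinel: refund pending
                  (-1,)])    # sentinel: refund pending

total_refunded = conn.execute(
    "SELECT SUM(refund_amount) FROM orders"
).fetchone()[0]
print(total_refunded)  # 23.0 -- the two sentinels quietly subtracted 2.0
```

NULL at least stays out of SUM and AVG; a numeric sentinel participates in every aggregate as if it were real money.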
- Unclear domain modeling. The team hasn't explicitly discussed what "no value" means for each column. Does NULL price mean free, or unknown, or "contact us"?
- Incremental requirements. The column was designed for one state. Six months later, a PM asks for a second state. The developer reuses the column instead of redesigning the schema because "it's a small change."
The common thread: each decision is locally reasonable. The anti-pattern emerges from the accumulation of these decisions over time, and by the time someone notices, there are dozens of queries depending on the magic NULL semantics.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| WHERE clauses with complex NULL handling | Column has multiple semantic meanings | Grep for COALESCE, IFNULL, or chained IS NULL conditions |
| Comments explaining what NULL means | Implicit schema documentation | Search codebase for "NULL means" or "null = " |
| Bug reports about missing data in reports | NULL rows silently excluded | Compare COUNT(*) vs COUNT(column) on suspect tables |
| Sentinel values like `'1970-01-01'` or `-1` | NULL wasn't enough, so magic values were added | `SELECT DISTINCT column FROM table` to find unexpected values |
| Application code with `if (value === null) { /* could mean X or Y */ }` | Ambiguous null handling | Search for null checks with branching logic |
Detection query
```sql
-- Find columns where NULL might carry multiple meanings.
-- information_schema can list which columns are nullable, but NULL *rates*
-- have to be measured per column. Run this for each suspect column
-- (shown here for users.deleted_at):
SELECT COUNT(*) AS total_rows,
       COUNT(deleted_at) AS non_null_rows,
       COUNT(*) - COUNT(deleted_at) AS null_rows,
       ROUND(100.0 * (COUNT(*) - COUNT(deleted_at)) / COUNT(*), 1) AS null_pct
FROM users;

-- Then look for sentinel values hiding among the non-NULLs:
SELECT deleted_at, COUNT(*) FROM users
WHERE deleted_at IS NOT NULL
GROUP BY deleted_at
ORDER BY COUNT(*) DESC
LIMIT 10;

-- Columns between 10% and 90% null are suspicious:
-- could be genuinely optional, or could be overloaded
```
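The same check can be scripted across many columns. A minimal sketch using Python's stdlib sqlite3 (the `users` table and its columns are illustrative; swap in your own driver and schema):

```python
import sqlite3

def null_rates(conn, table, columns):
    """Return {column: percent_null} for the given columns of `table`."""
    rates = {}
    for col in columns:
        # Identifiers can't be bound as parameters; in real code, validate
        # `table` and `col` against the schema before interpolating.
        total, non_null = conn.execute(
            f"SELECT COUNT(*), COUNT({col}) FROM {table}"
        ).fetchone()
        rates[col] = round(100.0 * (total - non_null) / total, 1) if total else 0.0
    return rates

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, deleted_at TEXT, phone TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, None, "555-1234"), (2, "2026-01-01", None),
                  (3, None, None), (4, "1970-01-01", "555-9999")])

print(null_rates(conn, "users", ["deleted_at", "phone"]))
# {'deleted_at': 50.0, 'phone': 50.0} -- both in the suspicious 10-90% band
```

Columns flagged by the scan still need human judgment: a 50% null rate is fine for genuinely optional data and alarming for an overloaded state column.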
Application-level detection
In your TypeScript or Python code, look for these patterns:
```typescript
// 🚩 Red flag: null check with multiple interpretations
if (user.deletedAt === null) {
  // active user
} else if (user.deletedAt.getTime() === 0) {
  // deactivated user (sentinel: epoch)
} else {
  // actually deleted
}

// ✅ With explicit status column
if (user.status === 'active') { /* ... */ }
if (user.status === 'suspended') { /* ... */ }
if (user.status === 'deleted') { /* ... */ }
```
If you see branching logic that interprets null in multiple ways, you've found a magic NULL.
The Fix
Fix 1: Separate columns for separate semantics
The most direct fix: if a column has multiple meanings, split it into multiple columns where each has exactly one meaning.
```sql
-- ❌ Magic NULL
ALTER TABLE users ADD COLUMN deleted_at TIMESTAMP;
-- NULL = active, timestamp = deleted, '1970-01-01' = deactivated

-- ✅ Explicit columns
ALTER TABLE users
    ADD COLUMN is_deleted BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN deactivated BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN deleted_at TIMESTAMP;  -- only set when is_deleted = TRUE
```
Now each state is independently queryable. No three-valued logic surprises.
Trade-off: More columns to maintain. But each column has exactly one meaning, which eliminates an entire class of query bugs.
Fix 2: NOT NULL with default values
For columns that should always have a value, don't make them nullable. A column that's "conceptually required" should be required in the schema.
```sql
-- ❌ Nullable column that's "conceptually required"
ALTER TABLE orders ADD COLUMN discount_amount NUMERIC;

-- ✅ NOT NULL with a default
ALTER TABLE orders ADD COLUMN discount_amount NUMERIC NOT NULL DEFAULT 0;
```
A missing discount is 0, not NULL. The column is always defined. Your WHERE clauses, aggregations, and application code all become simpler.
Trade-off: Requires choosing a meaningful default. For most numeric fields, 0 is correct. For strings, empty string or a domain-specific default works.
Fix 3: Status enum instead of nullable state
When a column tracks state transitions between more than two states, use an explicit enum:
```sql
-- ❌ Multiple nullable columns tracking state:
--   user_account: { deleted_at: null, suspended_at: '2026-01-01', ... }

-- ✅ Enum column
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active'
    CHECK (status IN ('active', 'suspended', 'deleted'));
```
Trade-off: Adding an enum column to an existing table requires a migration with backfill. But state transitions become explicit, queryable, and impossible to misinterpret.
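The CHECK constraint means the database rejects invalid states at write time, rather than letting them accumulate. A quick sketch with Python's stdlib sqlite3 (table name illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        status TEXT NOT NULL DEFAULT 'active'
            CHECK (status IN ('active', 'suspended', 'deleted'))
    )
""")

conn.execute("INSERT INTO users (status) VALUES ('suspended')")  # accepted

try:
    conn.execute("INSERT INTO users (status) VALUES ('deactivated')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # CHECK constraint failed: invalid states can't sneak in

print(rejected)  # True
```

Compare this with a sentinel timestamp, where nothing stops a fourth "meaning" from being written tomorrow.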
Migrating from magic NULLs to explicit state
If you already have a magic NULL column in production, here's a safe migration path:
```sql
-- Step 1: Add the new status column with a safe default
ALTER TABLE users ADD COLUMN status VARCHAR(20) NOT NULL DEFAULT 'active';

-- Step 2: Backfill from the existing column
UPDATE users SET status = 'deleted'
WHERE deleted_at IS NOT NULL AND deleted_at != '1970-01-01';
UPDATE users SET status = 'suspended' WHERE deleted_at = '1970-01-01';

-- Step 3: Verify counts match
SELECT status, COUNT(*) FROM users GROUP BY status;
-- Compare with:
SELECT
    CASE WHEN deleted_at IS NULL THEN 'active'
         WHEN deleted_at = '1970-01-01' THEN 'suspended'
         ELSE 'deleted' END AS derived_status,
    COUNT(*)
FROM users GROUP BY derived_status;

-- Step 4: Update queries to use the new column
-- Step 5: Drop the old column (after all consumers migrated)
```
Run Steps 1-3 in a single transaction if your database supports transactional DDL (PostgreSQL does, MySQL doesn't for ALTER TABLE). For MySQL, use a two-phase approach: add the column first, then backfill in batches.
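A batched backfill can be sketched as follows, walking the primary key in chunks so no single UPDATE holds locks for long. This is a hypothetical sketch with Python's stdlib sqlite3 (the `users`/`deleted_at`/`status` names follow the migration above; adapt the driver and batch size for your database):

```python
import sqlite3

def backfill_status(conn, batch_size=1000):
    """Backfill users.status from deleted_at, batch_size rows at a time."""
    last_id = 0
    while True:
        # Keyset pagination: grab the next slice of primary keys
        ids = [r[0] for r in conn.execute(
            "SELECT id FROM users WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, batch_size))]
        if not ids:
            break
        conn.execute(
            """
            UPDATE users SET status = CASE
                WHEN deleted_at IS NULL THEN 'active'
                WHEN deleted_at = '1970-01-01' THEN 'suspended'
                ELSE 'deleted'
            END
            WHERE id BETWEEN ? AND ?
            """,
            (ids[0], ids[-1]),
        )
        conn.commit()  # commit per batch: short transactions, resumable
        last_id = ids[-1]
```

Committing per batch keeps each transaction small; if the job dies, rerunning it is safe because the CASE expression is idempotent.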
Application-layer enforcement
The schema fix only works if application code respects it. Add type-level enforcement:
```typescript
// ❌ Magic null in application code
interface User {
  deletedAt: Date | null; // What does null mean here?
}

// ✅ Explicit state
type UserStatus = 'active' | 'suspended' | 'deleted';

interface User {
  status: UserStatus;  // Always defined, always clear
  deletedAt?: Date;    // Only present when status === 'deleted'
  suspendedAt?: Date;  // Only present when status === 'suspended'
}

// The type system prevents misinterpretation
function isActive(user: User): boolean {
  return user.status === 'active'; // No null check needed
}
```
```python
# Python with dataclasses and enums
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional

class UserStatus(Enum):
    ACTIVE = "active"
    SUSPENDED = "suspended"
    DELETED = "deleted"

@dataclass
class User:
    status: UserStatus                     # Required, always has a value
    deleted_at: Optional[datetime] = None  # Only set when DELETED
```
When your types make invalid states unrepresentable, magic NULLs can't sneak in.
Which fix to use
| Situation | Fix |
|---|---|
| One column carries two or more unrelated meanings | Fix 1: separate columns, one meaning each |
| The value is conceptually required (amounts, counts) | Fix 2: NOT NULL with an explicit default |
| The column tracks more than two states | Fix 3: explicit status enum |
After the fix
With explicit state columns, every query becomes a plain equality test (`WHERE status = 'suspended'`), no WHERE clause needs a trailing `OR ... IS NULL`, and nobody has to remember what NULL means for this table.
Severity and Blast Radius
Magic NULLs are a correctness problem, not a performance problem. That makes them worse in some ways, because wrong answers are harder to detect than slow answers.
- Silent data corruption. Reports, analytics, and business decisions based on queries that silently exclude NULL rows. The numbers look plausible, just slightly wrong.
- Cascading ambiguity. Once one column overloads NULL, downstream consumers (ORMs, APIs, reporting tools) each interpret it differently. You end up with inconsistent behavior across services.
- Hard to detect. No monitoring alert fires for "query returned the wrong count." These bugs surface weeks or months later as "the numbers don't add up."
- Application-level bugs. Every null check in application code becomes a decision tree: does null mean "not set," "deleted," "free," or "unknown"? Different developers interpret it differently.
- Recovery: hours to days. Finding all the queries that assume a specific NULL meaning, then migrating the schema and backfilling data, is a multi-day effort for established systems.
| Impact Area | Severity | Detection Difficulty |
|---|---|---|
| Analytics accuracy | High (wrong business decisions) | Weeks to months |
| Query correctness | High (silent row exclusion) | Hard without explicit testing |
| Schema maintainability | Medium (growing complexity) | Obvious during code review |
| Application bugs | Medium (null interpretation varies) | Moderate (manifests as edge cases) |
When It's Actually OK
NULL is not the enemy. Overloaded NULL is the enemy. These are legitimate uses:
- Genuinely optional data. A `middle_name` column where NULL means "user didn't provide one" and there's only one interpretation. This is what NULL was designed for.
- Outer join results. NULL in a LEFT JOIN result correctly means "no matching row." Don't replace this with a sentinel.
- Sparse data columns. In a `user_preferences` table where most users haven't set a preference, NULL correctly means "not configured, use the default."
- Early prototypes. If you have 100 rows and one developer, the cost of magic NULLs is near zero. But add a TODO to clean it up before the schema becomes load-bearing.
- Single-meaning nullable timestamps. `deleted_at` with exactly one meaning (NULL = not deleted, timestamp = when it was deleted) is a well-established, safe pattern. The anti-pattern only emerges when you add a second or third meaning.
The test: can you explain what NULL means for this column in exactly one sentence? If the answer requires "it depends" or "it could mean X or Y," you have a magic NULL.
Related anti-patterns
Magic NULLs often appear alongside these patterns:
| Anti-pattern | Description | Relationship |
|---|---|---|
| Magic NULL | Overloading NULL with multiple meanings | This article |
| EAV tables | Entity-Attribute-Value with rampant NULLs | EAV tables make magic NULLs worse |
| Polymorphic associations | One FK pointing to multiple tables | Often uses NULL to indicate "not this type" |
| Stringly-typed columns | Storing structured data in VARCHAR | Often uses NULL alongside magic string values |
If you fix magic NULLs but keep EAV tables, you're solving half the problem. Address the schema design holistically.
How This Shows Up in Interviews
When designing a schema, interviewers listen for how you handle state and optional data. This is a chance to show you understand database semantics at a deeper level than most candidates.
Common interview triggers:
- "How would you implement soft deletion?"
- "Design the schema for user account states (active, suspended, banned, deleted)."
- "What's wrong with this schema?" (followed by a table with magic NULLs)
What a strong answer includes:
- "I'd use a `status` enum column rather than nullable timestamps for account state, to avoid three-valued logic bugs and make state transitions explicit."
- Awareness that NULL comparisons behave differently from every other comparison in SQL.
- Understanding that nullable columns create implicit states that compound across queries.
- The trade-off: nullable columns are simpler to add, but explicit state columns are safer to query.
Interview phrase: "I reserve NULL for genuinely unknown values and use explicit columns or enums for state. This prevents silent exclusion bugs in WHERE clauses and makes the schema self-documenting."
Quick Recap
- SQL NULL means "unknown." Making it mean "deleted," "zero," or "not applicable" creates ambiguous semantics.
- Any comparison involving NULL evaluates to NULL, not FALSE. NULL rows are silently excluded from most WHERE clauses.
- Using NULL with multiple meanings leads to queries that appear correct but silently miss cases.
- Fix: use NOT NULL columns with explicit defaults, boolean flags, or enum status columns for state.
- Reserve NULL for genuinely unknown or not-applicable values where the absence of information is meaningful and distinct from "no" or "zero."
- Detect magic NULLs by grepping for `COALESCE`, sentinel values, and comments explaining what NULL means.
- When state transitions grow beyond two states, an enum column is almost always the right answer.