SELECT * anti-pattern
Understand why SELECT * fetches unnecessary data, breaks column-store compression, invalidates cached query plans, and how explicit column selection fixes all three problems.
TL;DR
SELECT *fetches every column in the row, including columns your application never uses. This adds unnecessary data transfer, deserialization, and memory overhead.- In column-oriented databases (ClickHouse, Redshift, BigQuery),
SELECT *defeats columnar compression entirely. You read all columns from disk when queries typically only need 2 or 3. SELECT *breaks index-only scans: the database cannot satisfy a query from the index alone if you've asked for all columns, forcing a heap access for every row.- Schema changes silently expose new columns (including sensitive ones like password hashes) through
SELECT *queries, with no code change required. - Fix: Always specify columns explicitly. Override ORM defaults with explicit
selectcalls.
The Problem
It's 3 a.m. and your on-call engineer gets paged. The dashboard API is timing out. Database CPU is at 94%. The slow query log shows one query repeated 10,000 times per second:
SELECT * FROM users WHERE id = ?
Your users table has 42 columns totalling ~8KB per row: id, email, name, avatar_url, bio, preferences_json, oauth_tokens_json, mfa_secret, password_hash, created_at, updated_at, last_login, stripe_customer_id, billing_address, ...
The dashboard card uses exactly 4 of those 42 columns: id, name, avatar_url, email. At 10,000 req/s, you're transferring 38 unnecessary columns per request. That's ~7.6KB of useless data per request, 76 MB/s of unnecessary network traffic, and 6.5 TB/day of database-to-app traffic providing zero value.
I've seen this exact scenario twice. Both times, the fix took 20 minutes (change the query), but the outage lasted 3 hours because nobody thought to check what the query was actually selecting.
The cascade in numbers
Here's how SELECT * costs compound at different scales. The numbers look small per-request, but they're multiplied across every query, every second, every day.
| Scale | SELECT * (8KB/row) | Explicit (400 bytes/row) | Waste |
|---|---|---|---|
| 100 req/s | 800 KB/s | 40 KB/s | 760 KB/s |
| 1K req/s | 8 MB/s | 400 KB/s | 7.6 MB/s |
| 10K req/s | 80 MB/s | 4 MB/s | 76 MB/s |
| 100K req/s | 800 MB/s | 40 MB/s | 760 MB/s |
At 100K req/s, the unnecessary data transfer alone can saturate a 10 Gbps network link. That's before accounting for deserialization cost, garbage collection pressure, or the database buffer pool churn.
Why restarts and scaling don't help
When SELECT * causes DB CPU spikes, the natural reaction is to add read replicas or increase instance size. This treats the symptom, not the cause. You're paying 20x more for database compute to transfer data nobody uses.
Doubling your database budget to handle 76 MB/s of useless data transfers is the wrong answer. The right answer is 20 minutes of query changes that reduce the load by 95%.
The index-only scan problem
If you have CREATE INDEX idx_users_email ON users (email, name) and query:
-- Index-only scan possible: both columns in the index
SELECT email, name FROM users WHERE email = ?;
-- β Index-only scan NOT possible: * requires columns not in the index
SELECT * FROM users WHERE email = ?;
The * version must visit the actual table heap for each matching row (a "heap access"), which means random I/O on physical disk or cache pressure on SSD. The explicit column version can be satisfied entirely from the compact index structure: faster, lower I/O.
You can verify this with EXPLAIN ANALYZE:
-- Check if your query uses an index-only scan
EXPLAIN ANALYZE SELECT email, name FROM users WHERE email = 'alice@test.com';
-- Look for "Index Only Scan" in output
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@test.com';
-- You'll see "Index Scan" + "Heap Fetches: 1" instead
The difference matters most on large tables. A covering index scan on a 100 million row table can be 10x to 50x faster than a heap-fetching scan for the same query, because the index is orders of magnitude smaller than the table.
Column-store performance destruction
In analytical databases built on columnar storage (ClickHouse, Apache Parquet, BigQuery):
-- SELECT * reads ALL columns from disk: defeats columnar storage
SELECT * FROM events WHERE event_date = '2026-04-04';
-- Reads only 2 columns out of potentially 100: 98% less I/O
SELECT event_type, user_id FROM events WHERE event_date = '2026-04-04';
Columnar storage achieves compression ratios of 5x to 20x by storing columns contiguously and applying column-specific encoding. SELECT * reads all column files, destroying both the compression benefit and the I/O reduction.
For perspective: a ClickHouse table with 100 columns and 1 billion rows might be 50 GB on disk with columnar compression. Reading 2 columns = ~1 GB of I/O. Reading all 100 columns = ~50 GB of I/O. The difference between a 2-second query and a 2-minute query.
The schema change problem
This is the security dimension of SELECT * that many teams overlook.
// You added a `password_hash` column to users last week
// SELECT * now returns it in every query result
// Your ORM log accidentally serialises it to JSON in an audit log
// Security audit finds password hashes in your log aggregator
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.