SELECT * anti-pattern
Understand why SELECT * fetches unnecessary data, breaks column-store compression, invalidates cached query plans, and how explicit column selection fixes all three problems.
TL;DR
- SELECT * fetches every column in the row, including columns your application never uses. This adds unnecessary data transfer, deserialization, and memory overhead.
- In column-oriented databases (ClickHouse, Redshift, BigQuery), SELECT * defeats columnar compression entirely. You read all columns from disk when queries typically only need 2 or 3.
- SELECT * breaks index-only scans: the database cannot satisfy a query from the index alone if you've asked for all columns, forcing a heap access for every row.
- Schema changes silently expose new columns (including sensitive ones like password hashes) through SELECT * queries, with no code change required.
- Fix: Always specify columns explicitly. Override ORM defaults with explicit select calls.
The Problem
It's 3 a.m. and your on-call engineer gets paged. The dashboard API is timing out. Database CPU is at 94%. The slow query log shows one query repeated 10,000 times per second:
SELECT * FROM users WHERE id = ?
Your users table has 42 columns totalling ~8KB per row: id, email, name, avatar_url, bio, preferences_json, oauth_tokens_json, mfa_secret, password_hash, created_at, updated_at, last_login, stripe_customer_id, billing_address, ...
The dashboard card uses exactly 4 of those 42 columns: id, name, avatar_url, email. At 10,000 req/s, you're transferring 38 unnecessary columns per request. That's ~7.6KB of useless data per request, 76 MB/s of unnecessary network traffic, and 6.5 TB/day of database-to-app traffic providing zero value.
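A quick sanity check of those figures, treating the ~8 KB row as 8,000 bytes and the needed columns as 400 bytes (both approximations from above):

```python
# Back-of-envelope check of the waste figures above. The byte counts are the
# article's approximations, not measurements.
ROW_BYTES = 8_000        # ~8 KB per full row
NEEDED_BYTES = 400       # ~400 bytes for the 4 columns the dashboard uses
REQ_PER_SEC = 10_000

waste_per_req = ROW_BYTES - NEEDED_BYTES       # 7,600 bytes wasted per request
waste_per_sec = waste_per_req * REQ_PER_SEC    # 76,000,000 bytes/s = 76 MB/s
waste_per_day = waste_per_sec * 86_400         # ~6.6 TB/day of useless transfer

print(f"{waste_per_sec / 1e6:.0f} MB/s, {waste_per_day / 1e12:.1f} TB/day")
```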
I've seen this exact scenario twice. Both times, the fix took 20 minutes (change the query), but the outage lasted 3 hours because nobody thought to check what the query was actually selecting.
The cascade in numbers
Here's how SELECT * costs compound at different scales. The numbers look small per-request, but they're multiplied across every query, every second, every day.
| Scale | SELECT * (8KB/row) | Explicit (400 bytes/row) | Waste |
|---|---|---|---|
| 100 req/s | 800 KB/s | 40 KB/s | 760 KB/s |
| 1K req/s | 8 MB/s | 400 KB/s | 7.6 MB/s |
| 10K req/s | 80 MB/s | 4 MB/s | 76 MB/s |
| 100K req/s | 800 MB/s | 40 MB/s | 760 MB/s |
At 100K req/s, the unnecessary data transfer alone can saturate a 10 Gbps network link. That's before accounting for deserialization cost, garbage collection pressure, or the database buffer pool churn.
Why restarts and scaling don't help
When SELECT * causes DB CPU spikes, the natural reaction is to add read replicas or increase instance size. This treats the symptom, not the cause. You're paying 20x more for database compute to transfer data nobody uses.
Doubling your database budget to handle 76 MB/s of useless data transfers is the wrong answer. The right answer is 20 minutes of query changes that reduce the load by 95%.
The index-only scan problem
If you have CREATE INDEX idx_users_email ON users (email, name) and query:
-- Index-only scan possible: both columns in the index
SELECT email, name FROM users WHERE email = ?;
-- ❌ Index-only scan NOT possible: * requires columns not in the index
SELECT * FROM users WHERE email = ?;
The * version must visit the actual table heap for each matching row (a "heap access"), which means random I/O on physical disk or cache pressure on SSD. The explicit column version can be satisfied entirely from the compact index structure: faster, lower I/O.
You can verify this with EXPLAIN ANALYZE:
-- Check if your query uses an index-only scan
EXPLAIN ANALYZE SELECT email, name FROM users WHERE email = 'alice@test.com';
-- Look for "Index Only Scan" in output
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@test.com';
-- You'll see "Index Scan" + "Heap Fetches: 1" instead
The difference matters most on large tables. A covering index scan on a 100 million row table can be 10x to 50x faster than a heap-fetching scan for the same query, because the index is orders of magnitude smaller than the table.
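The same covering-index effect is easy to observe in a sandbox. A sketch using SQLite (whose EXPLAIN QUERY PLAN output differs from Postgres but exposes the same distinction; the table and index names here are illustrative):

```python
import sqlite3

# SQLite marks a plan that never touches the table as "COVERING INDEX".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email, name)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable 'detail' column
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

covered = plan("SELECT email, name FROM users WHERE email = 'alice@test.com'")
starred = plan("SELECT * FROM users WHERE email = 'alice@test.com'")

print(covered)  # ... USING COVERING INDEX idx_users_email ...
print(starred)  # ... USING INDEX idx_users_email ... (no COVERING: table access needed)
```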
Column-store performance destruction
In analytical databases built on columnar storage (ClickHouse, Apache Parquet, BigQuery):
-- SELECT * reads ALL columns from disk: defeats columnar storage
SELECT * FROM events WHERE event_date = '2026-04-04';
-- Reads only 2 columns out of potentially 100: 98% less I/O
SELECT event_type, user_id FROM events WHERE event_date = '2026-04-04';
Columnar storage achieves compression ratios of 5x to 20x by storing columns contiguously and applying column-specific encoding. SELECT * reads all column files, destroying both the compression benefit and the I/O reduction.
For perspective: a ClickHouse table with 100 columns and 1 billion rows might be 50 GB on disk with columnar compression. Reading 2 columns = ~1 GB of I/O. Reading all 100 columns = ~50 GB of I/O. The difference between a 2-second query and a 2-minute query.
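The arithmetic behind that perspective, assuming evenly sized columns (real tables won't be this uniform, so treat it as order-of-magnitude only):

```python
# Rough I/O arithmetic for the 100-column, 50 GB columnar table above.
TABLE_BYTES = 50 * 10**9      # ~50 GB on disk after columnar compression
NUM_COLUMNS = 100

bytes_per_column = TABLE_BYTES // NUM_COLUMNS   # ~0.5 GB per column (simplification)
explicit_read = 2 * bytes_per_column            # 2 columns: ~1 GB of I/O
star_read = NUM_COLUMNS * bytes_per_column      # SELECT *: the full ~50 GB

print(f"explicit: {explicit_read / 1e9:.0f} GB, SELECT *: {star_read / 1e9:.0f} GB")
```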
The schema change problem
This is the security dimension of SELECT * that many teams overlook.
// You added a `password_hash` column to users last week
// SELECT * now returns it in every query result
// Your ORM log accidentally serialises it to JSON in an audit log
// Security audit finds password hashes in your log aggregator
Explicit column selection is also a security layer. It prevents accidentally leaking sensitive columns when schemas evolve. I've seen PII leaks in production logs traced directly to SELECT * queries that started returning a new social_security_number column after a migration.
The fix is simple: explicit column lists act as an allowlist. New columns don't appear in results until someone deliberately adds them to the query.
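The allowlist effect can be demonstrated in a few lines with SQLite (column names are illustrative): after a migration adds a sensitive column, SELECT * picks it up automatically while the explicit list does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

# Migration lands: a sensitive column appears
conn.execute("ALTER TABLE users ADD COLUMN password_hash TEXT")

def result_columns(sql):
    # cursor.description lists the columns a query returns
    return [d[0] for d in conn.execute(sql).description]

star_cols = result_columns("SELECT * FROM users")
explicit_cols = result_columns("SELECT id, email FROM users")

print(star_cols)      # ['id', 'email', 'name', 'password_hash'] -- leak risk
print(explicit_cols)  # ['id', 'email'] -- unchanged by the migration
```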
Why It Happens
Nobody writes SELECT * because they think it's optimal. It happens because of individually reasonable decisions that compound:
- Development speed. During prototyping, listing columns feels tedious. SELECT * is faster to type, and the table only has 5 columns anyway.
- ORM defaults. Most ORMs default to fetching entire rows. ActiveRecord's User.find(id), TypeORM's findOne(), Django's User.objects.get() all issue SELECT * unless you explicitly opt out. This is the most common entry point for the anti-pattern.
Here's what a common ORM call generates under the hood:
// What you write
const user = await userRepo.findOne({ where: { id: 42 } });
// What the ORM generates
// SELECT "User"."id", "User"."email", "User"."name", "User"."avatar_url",
// "User"."bio", "User"."preferences_json", "User"."oauth_tokens_json",
// "User"."mfa_secret", "User"."password_hash", "User"."created_at",
// "User"."updated_at", ... (38 more columns)
// FROM "users" "User" WHERE "User"."id" = 42
The ORM doesn't literally emit SELECT *, but the effect is identical: it selects every column defined in the entity model.
- Schema drift. The table started with 8 columns. Over two years, it grew to 42. Nobody went back to update the queries because they still worked.
- Lack of visibility. Without slow query logging or network monitoring, there's no signal that the extra columns cost anything. The waste is invisible until it isn't.
- Copy-paste culture. One SELECT * in a code example gets copied into 50 queries across 12 microservices. I've grepped through codebases and found the same pattern everywhere, all tracing back to one tutorial someone followed in year one.
The pattern is always the same: shortcuts during early development become performance landmines at scale.
The bottom line: SELECT * is a silent tax. It costs nothing at 10 rows. It costs a lot at 10 million rows. And the moment it becomes expensive, you have 200 queries to fix instead of 2.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| High network transfer between app and DB | Over-fetching columns | Monitor bytes transferred per query in DB metrics |
| Index scans followed by heap fetches | SELECT * forcing heap access | EXPLAIN ANALYZE on frequent queries |
| ORM queries fetching full entities everywhere | Default ORM behavior | Grep for .find(), .findOne(), Model.objects.get() without select |
| Slow analytical queries on column-store | Column-store compression defeated | Check column-store query plans for full-column reads |
| Sensitive data in logs or API responses | Schema change exposed new columns | Audit log output for unexpected fields |
Quick detection script
-- PostgreSQL: find queries with high shared_blks_hit
-- relative to rows returned (suggests over-fetching)
SELECT query, calls, rows,
shared_blks_hit + shared_blks_read AS total_blocks,
(shared_blks_hit + shared_blks_read)::float / NULLIF(rows, 0) AS blocks_per_row
FROM pg_stat_statements
WHERE query LIKE 'SELECT %*%'  -- crude filter: also matches count(*), prune those manually
ORDER BY calls DESC
LIMIT 20;
ORM-specific detection
Different ORMs have different ways to detect and prevent SELECT *:
// TypeORM: enable query logging to see generated SQL
// In your data source config:
{
logging: ["query"], // Logs all SQL queries
// Then grep logs for "SELECT .* FROM" without column lists
}
// Prisma: use the query event
prisma.$on('query', (e) => {
if (e.query.includes('SELECT *') || e.query.includes('SELECT "public"')) {
console.warn('Potential SELECT * detected:', e.query);
}
});
# Django: use django-debug-toolbar or logging
# In settings.py:
LOGGING = {
    'version': 1,  # required by dictConfig
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',            # Logs all SQL (only when DEBUG = True)
            'handlers': ['console'],
        },
    },
}
# Then search for queries selecting all columns
The most effective detection is a pre-commit hook or CI check that greps for SELECT * in source code. Catch it before it reaches production.
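A minimal sketch of such a check in Python; the file extensions and regex are assumptions, so tune both for your codebase (note that SELECT count(*) does not trip the pattern, but a "SELECT *" inside a comment does):

```python
import re
from pathlib import Path

# Flag SELECT * occurrences in source files; a non-empty result fails the build.
PATTERN = re.compile(r"SELECT\s+\*", re.IGNORECASE)

def find_select_star(root, exts=(".sql", ".ts", ".py")):
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if PATTERN.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

Wire this into a pre-commit hook or CI job and fail the run whenever the returned list is non-empty.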
The Fix
Fix 1: Explicit column selection
The simplest fix. Name the columns you need.
-- Always explicit
SELECT id, name, email, avatar_url
FROM users
WHERE id = ?;
Trade-off: Slightly more maintenance when columns change. Worth it every time.
When adding new columns to the table later, you'll need to update queries that need the new column. This is actually a feature, not a bug: it forces you to make a conscious decision about which consumers need the new data.
Fix 2: ORM-level column selection
Most ORMs support explicit column selection. Use it.
// TypeORM: explicit select
const user = await userRepo.findOne({
where: { id },
select: ['id', 'name', 'email', 'avatarUrl']
});
// Prisma: explicit select
const user = await prisma.user.findUnique({
where: { id },
select: { id: true, name: true, email: true, avatarUrl: true }
});
# Django: explicit values
user = User.objects.values('id', 'name', 'email', 'avatar_url').get(id=user_id)
# SQLAlchemy: explicit columns
user = session.query(User.id, User.name, User.email).filter_by(id=user_id).first()
Trade-off: ORM-specific syntax. Create a shared utility or DTO pattern to avoid duplication.
Fix 3: Database views for common access patterns
For queries repeated across many services, create a view:
CREATE VIEW user_card AS
SELECT id, name, email, avatar_url
FROM users;
-- Now services query the view
SELECT * FROM user_card WHERE id = ?;
-- This SELECT * is fine: the view already limits columns
Trade-off: Another object to maintain. Best for cross-team shared access patterns.
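The view approach can be sandbox-tested with SQLite (names mirror the example above): SELECT * against the view can only ever return the view's columns, regardless of what the base table grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT,
    avatar_url TEXT, password_hash TEXT)""")
conn.execute("CREATE VIEW user_card AS SELECT id, name, email, avatar_url FROM users")

# cursor.description exposes the columns the query returns
cur = conn.execute("SELECT * FROM user_card WHERE id = 1")
view_cols = [d[0] for d in cur.description]
print(view_cols)  # ['id', 'name', 'email', 'avatar_url'] -- password_hash excluded
```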
Fix 4: Lint rules and CI enforcement
The best fix is one that prevents the problem from recurring. Add a SQL linter to your CI pipeline:
# .sqlfluff configuration
[sqlfluff:rules:L044]
# Disallow SELECT *
allow_scalar = False
// ESLint rule for ORM queries (custom rule concept)
// Flag any .find() or .findOne() without a 'select' option
// This catches the ORM default-to-* pattern
// TypeORM: require select in all findOne calls
// ❌ Without select, this fetches every entity column (effectively SELECT *)
const userBad = await userRepo.findOne({
  where: { id },
});
// ✅ Explicit column list
const userGood = await userRepo.findOne({
  where: { id },
  select: ['id', 'name', 'email', 'avatarUrl']
});
For existing codebases, run a one-time audit:
# Find all SELECT * in SQL files and source code
grep -rn "SELECT \*" --include="*.sql" --include="*.ts" --include="*.py" .
# Count how many need fixing
grep -rn "SELECT \*" --include="*.sql" --include="*.ts" --include="*.py" . | wc -l
Trade-off: Requires initial setup and may require custom rules for your ORM. But it prevents the problem from ever coming back.
Which fix to use
- Raw SQL in application code: Fix 1 (explicit column lists).
- ORM-heavy codebases: Fix 2 (explicit select options).
- Access patterns shared across teams or services: Fix 3 (database views).
- Any codebase at scale: Fix 4 (lint rules and CI enforcement), layered on top of the others so the problem can't creep back in.
Severity and Blast Radius
SELECT * is a slow killer, not a sudden one. It rarely triggers a hard outage on its own, but it amplifies every other performance problem.
- Network saturation. At scale, unnecessary bytes fill the pipe between app servers and database. This steals bandwidth from queries that actually need it.
- Memory pressure. ORMs deserialize every column into objects. 42 columns x 10K req/s = millions of unnecessary object fields in memory, increasing GC pressure.
- Security exposure. New sensitive columns (password hashes, tokens, PII) get silently exposed to application code and potentially to logs or API responses.
- Cascading effect. When the database slows down from over-fetching, every service that queries that table slows down. One SELECT * on a hot table can degrade the entire system.
- Recovery: minutes to hours. Changing SELECT * to explicit columns is a trivially fast fix per query. The hard part is finding all the queries (could be hundreds) and validating that each replacement is correct.
| Impact Area | Low Scale (100 req/s) | High Scale (100K req/s) |
|---|---|---|
| Network waste | Negligible | Can saturate 10 Gbps link |
| DB CPU overhead | Minimal | 20-40% increase |
| Memory (app tier) | Unnoticeable | GC pauses, OOM risk |
| Security risk | Same at any scale | Same at any scale |
When It's Actually OK
SELECT * has legitimate uses. The key distinction: are you using it deliberately, or did it sneak in as a default?
- Admin export tools. You genuinely need every column for a CSV export or backup. Document it with a comment: -- Intentional SELECT * for full export.
- Schema migration scripts. SELECT * INTO new_table FROM old_table during migrations is fine. These are one-time operations, not production queries.
- Interactive SQL sessions. Running SELECT * in a psql shell to explore data is normal. Just don't ship it to production code.
- Tables with 3-5 columns that are unlikely to grow. A feature_flags table with (id, name, enabled) doesn't benefit from explicit columns. But tables grow, so revisit this assumption periodically.
- Database views that already limit columns. If you've created a view with explicit columns, SELECT * FROM user_card_view is fine because the view is the column list.
The rule of thumb: if the query appears in production code, specify columns. If it's a one-off operation or development tool, SELECT * is fine.
Related anti-patterns
Don't confuse SELECT * with these similar but distinct issues:
| Anti-pattern | Description | Key difference |
|---|---|---|
| SELECT * | Fetches all columns | Column over-fetching |
| Missing pagination | Fetches all rows | Row over-fetching |
| N+1 queries | Fetches one row at a time in a loop | Query count problem |
| Over-indexing | Too many indexes on a table | Write overhead |
They often appear together. A query with SELECT * FROM users and no LIMIT clause has both the column and row over-fetching problems simultaneously.
How This Shows Up in Interviews
When an interviewer shows a schema or asks about database performance, mentioning column selection is a strong precision signal.
What a strong answer includes:
- "I'd select only the columns the API response needs. This enables index-only scans and cuts network transfer."
- Awareness that ORMs default to full-row fetches and how to override that.
- Understanding the column-store impact for analytical workloads.
Interview phrase: "At scale, I always specify columns explicitly. It enables covering indexes, reduces serialization overhead, and prevents accidental leaks when the schema evolves."
Quick Recap
- SELECT * over-fetches: every unused column adds transfer, deserialization, and memory overhead per row.
- Index-only scans are impossible with SELECT * if any column isn't in the index, forcing heap access for every row.
- Column-store databases lose all compression and I/O benefits when you select all columns.
- ORMs default to full-row fetches. Override this with explicit select calls.
- Schema changes silently expose new columns (including sensitive ones) through SELECT * queries.
- The fix is trivial: name the columns you need. Use database views for cross-service shared access patterns.
- Detect it by grepping for SELECT * in production code and checking EXPLAIN ANALYZE for unnecessary heap fetches.