SELECT * anti-pattern
Understand why SELECT * fetches unnecessary data, breaks column-store compression, invalidates cached query plans, and how explicit column selection fixes all three problems.
TL;DR
- SELECT * fetches every column in the row, including columns your application never uses. This adds unnecessary data transfer, deserialization, and memory overhead.
- In column-oriented databases (ClickHouse, Redshift, BigQuery), SELECT * defeats columnar compression entirely. You read all columns from disk when queries typically only need 2 or 3.
- SELECT * breaks index-only scans: the database cannot satisfy a query from the index alone if you've asked for all columns, forcing a heap access for every row.
- Schema changes silently expose new columns (including sensitive ones like password hashes) through SELECT * queries, with no code change required.
- Fix: Always specify columns explicitly. Override ORM defaults with explicit select calls.
The Problem
It's 3 a.m. and your on-call engineer gets paged. The dashboard API is timing out. Database CPU is at 94%. The slow query log shows one query repeated 10,000 times per second:
SELECT * FROM users WHERE id = ?
Your users table has 42 columns totalling ~8KB per row: id, email, name, avatar_url, bio, preferences_json, oauth_tokens_json, mfa_secret, password_hash, created_at, updated_at, last_login, stripe_customer_id, billing_address, ...
The dashboard card uses exactly 4 of those 42 columns: id, name, avatar_url, email. At 10,000 req/s, you're transferring 38 unnecessary columns per request. That's ~7.6KB of useless data per request, 76 MB/s of unnecessary network traffic, and 6.5 TB/day of database-to-app traffic providing zero value.
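A quick sanity check of those figures, treating the ~8 KB row as 8,000 bytes and the needed columns as 400 bytes (both approximations from above):

```python
# Back-of-envelope check of the waste figures above. The byte counts are the
# article's approximations, not measurements.
ROW_BYTES = 8_000        # ~8 KB per full row
NEEDED_BYTES = 400       # ~400 bytes for the 4 columns the dashboard uses
REQ_PER_SEC = 10_000

waste_per_req = ROW_BYTES - NEEDED_BYTES       # 7,600 bytes wasted per request
waste_per_sec = waste_per_req * REQ_PER_SEC    # 76,000,000 bytes/s = 76 MB/s
waste_per_day = waste_per_sec * 86_400         # ~6.6 TB/day of useless transfer

print(f"{waste_per_sec / 1e6:.0f} MB/s, {waste_per_day / 1e12:.1f} TB/day")
```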
I've seen this exact scenario twice. Both times, the fix took 20 minutes (change the query), but the outage lasted 3 hours because nobody thought to check what the query was actually selecting.
The cascade in numbers
Here's how SELECT * costs compound at different scales. The numbers look small per-request, but they're multiplied across every query, every second, every day.
| Scale | SELECT * (8KB/row) | Explicit (400 bytes/row) | Waste |
|---|---|---|---|
| 100 req/s | 800 KB/s | 40 KB/s | 760 KB/s |
| 1K req/s | 8 MB/s | 400 KB/s | 7.6 MB/s |
| 10K req/s | 80 MB/s | 4 MB/s | 76 MB/s |
| 100K req/s | 800 MB/s | 40 MB/s | 760 MB/s |
At 100K req/s, the unnecessary data transfer alone can saturate a 10 Gbps network link. That's before accounting for deserialization cost, garbage collection pressure, or the database buffer pool churn.
Why restarts and scaling don't help
When SELECT * causes DB CPU spikes, the natural reaction is to add read replicas or increase instance size. This treats the symptom, not the cause. You're paying 20x more for database compute to transfer data nobody uses.
Doubling your database budget to handle 76 MB/s of useless data transfers is the wrong answer. The right answer is 20 minutes of query changes that reduce the load by 95%.
The index-only scan problem
If you have CREATE INDEX idx_users_email ON users (email, name) and query:
-- Index-only scan possible: both columns in the index
SELECT email, name FROM users WHERE email = ?;
-- ❌ Index-only scan NOT possible: * requires columns not in the index
SELECT * FROM users WHERE email = ?;
The * version must visit the actual table heap for each matching row (a "heap access"), which means random I/O on physical disk or cache pressure on SSD. The explicit column version can be satisfied entirely from the compact index structure: faster, lower I/O.
You can verify this with EXPLAIN ANALYZE:
-- Check if your query uses an index-only scan
EXPLAIN ANALYZE SELECT email, name FROM users WHERE email = 'alice@test.com';
-- Look for "Index Only Scan" in output
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'alice@test.com';
-- You'll see "Index Scan" + "Heap Fetches: 1" instead
The difference matters most on large tables. A covering index scan on a 100 million row table can be 10x to 50x faster than a heap-fetching scan for the same query, because the index is orders of magnitude smaller than the table.
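The same covering-index effect is easy to observe in a sandbox. A sketch using SQLite (whose EXPLAIN QUERY PLAN output differs from Postgres but exposes the same distinction; the table and index names here are illustrative):

```python
import sqlite3

# SQLite marks a plan that never touches the table as "COVERING INDEX".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT, bio TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users (email, name)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows end with a human-readable 'detail' column
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

covered = plan("SELECT email, name FROM users WHERE email = 'alice@test.com'")
starred = plan("SELECT * FROM users WHERE email = 'alice@test.com'")

print(covered)  # ... USING COVERING INDEX idx_users_email ...
print(starred)  # ... USING INDEX idx_users_email ... (no COVERING: table access needed)
```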
Column-store performance destruction
In analytical databases built on columnar storage (ClickHouse, Apache Parquet, BigQuery):
-- SELECT * reads ALL columns from disk: defeats columnar storage
SELECT * FROM events WHERE event_date = '2026-04-04';
-- Reads only 2 columns out of potentially 100: 98% less I/O
SELECT event_type, user_id FROM events WHERE event_date = '2026-04-04';
Columnar storage achieves compression ratios of 5x to 20x by storing columns contiguously and applying column-specific encoding. SELECT * reads all column files, destroying both the compression benefit and the I/O reduction.
For perspective: a ClickHouse table with 100 columns and 1 billion rows might be 50 GB on disk with columnar compression. Reading 2 columns = ~1 GB of I/O. Reading all 100 columns = ~50 GB of I/O. The difference between a 2-second query and a 2-minute query.
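The arithmetic behind that perspective, assuming evenly sized columns (real tables won't be this uniform, so treat it as order-of-magnitude only):

```python
# Rough I/O arithmetic for the 100-column, 50 GB columnar table above.
TABLE_BYTES = 50 * 10**9      # ~50 GB on disk after columnar compression
NUM_COLUMNS = 100

bytes_per_column = TABLE_BYTES // NUM_COLUMNS   # ~0.5 GB per column (simplification)
explicit_read = 2 * bytes_per_column            # 2 columns: ~1 GB of I/O
star_read = NUM_COLUMNS * bytes_per_column      # SELECT *: the full ~50 GB

print(f"explicit: {explicit_read / 1e9:.0f} GB, SELECT *: {star_read / 1e9:.0f} GB")
```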
The schema change problem
This is the security dimension of SELECT * that many teams overlook.
// You added a `password_hash` column to users last week
// SELECT * now returns it in every query result
// Your ORM log accidentally serialises it to JSON in an audit log
// Security audit finds password hashes in your log aggregator
Explicit column selection is also a security layer. It prevents accidentally leaking sensitive columns when schemas evolve. I've seen PII leaks in production logs traced directly to SELECT * queries that started returning a new social_security_number column after a migration.
The fix is simple: explicit column lists act as an allowlist. New columns don't appear in results until someone deliberately adds them to the query.
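The allowlist effect can be demonstrated in a few lines with SQLite (column names are illustrative): after a migration adds a sensitive column, SELECT * picks it up automatically while the explicit list does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")

# Migration lands: a sensitive column appears
conn.execute("ALTER TABLE users ADD COLUMN password_hash TEXT")

def result_columns(sql):
    # cursor.description lists the columns a query returns
    return [d[0] for d in conn.execute(sql).description]

star_cols = result_columns("SELECT * FROM users")
explicit_cols = result_columns("SELECT id, email FROM users")

print(star_cols)      # ['id', 'email', 'name', 'password_hash'] -- leak risk
print(explicit_cols)  # ['id', 'email'] -- unchanged by the migration
```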
Why It Happens
Nobody writes SELECT * because they think it's optimal. It happens because of individually reasonable decisions that compound:
- Development speed. During prototyping, listing columns feels tedious. SELECT * is faster to type, and the table only has 5 columns anyway.
- ORM defaults. Most ORMs default to fetching entire rows. ActiveRecord's User.find(id), TypeORM's findOne(), Django's User.objects.get() all issue SELECT * unless you explicitly opt out. This is the most common entry point for the anti-pattern.
Here's what a common ORM call generates under the hood:
// What you write
const user = await userRepo.findOne({ where: { id: 42 } });
// What the ORM generates
// SELECT "User"."id", "User"."email", "User"."name", "User"."avatar_url",
// "User"."bio", "User"."preferences_json", "User"."oauth_tokens_json",
// "User"."mfa_secret", "User"."password_hash", "User"."created_at",
// "User"."updated_at", ... (38 more columns)
// FROM "users" "User" WHERE "User"."id" = 42
The ORM doesn't literally emit SELECT *, but the effect is identical: it selects every column defined in the entity model.
- Schema drift. The table started with 8 columns. Over two years, it grew to 42. Nobody went back to update the queries because they still worked.
- Lack of visibility. Without slow query logging or network monitoring, there's no signal that the extra columns cost anything. The waste is invisible until it isn't.
- Copy-paste culture. One SELECT * in a code example gets copied into 50 queries across 12 microservices. I've grepped through codebases and found the same pattern everywhere, all tracing back to one tutorial someone followed in year one.
The pattern is always the same: shortcuts during early development become performance landmines at scale.
The bottom line: SELECT * is a silent tax. It costs nothing at 10 rows. It costs a lot at 10 million rows. And the moment it becomes expensive, you have 200 queries to fix instead of 2.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| High network transfer between app and DB | Over-fetching columns | Monitor bytes transferred per query in DB metrics |
| Index scans followed by heap fetches | SELECT * forcing heap access | EXPLAIN ANALYZE on frequent queries |
| ORM queries fetching full entities everywhere | Default ORM behavior | Grep for .find(), .findOne(), Model.objects.get() without select |
| Slow analytical queries on column-store | Column-store compression defeated | Check column-store query plans for full-column reads |
| Sensitive data in logs or API responses | Schema change exposed new columns | Audit log output for unexpected fields |
Quick detection script
-- PostgreSQL: find queries with high shared_blks_hit
-- relative to rows returned (suggests over-fetching)
SELECT query, calls, rows,
shared_blks_hit + shared_blks_read AS total_blocks,
(shared_blks_hit + shared_blks_read)::float / NULLIF(rows, 0) AS blocks_per_row
FROM pg_stat_statements
WHERE query LIKE 'SELECT %*%'  -- crude filter: also matches count(*), prune those manually
ORDER BY calls DESC
LIMIT 20;
ORM-specific detection
Different ORMs have different ways to detect and prevent SELECT *:
// TypeORM: enable query logging to see generated SQL
// In your data source config:
{
logging: ["query"], // Logs all SQL queries
// Then grep logs for "SELECT .* FROM" without column lists
}
// Prisma: use the query event
prisma.$on('query', (e) => {
if (e.query.includes('SELECT *') || e.query.includes('SELECT "public"')) {
console.warn('Potential SELECT * detected:', e.query);
}
});
# Django: use django-debug-toolbar or logging
# In settings.py:
LOGGING = {
    'version': 1,  # required by dictConfig
    'handlers': {
        'console': {'class': 'logging.StreamHandler'},
    },
    'loggers': {
        'django.db.backends': {
            'level': 'DEBUG',            # Logs all SQL (only when DEBUG = True)
            'handlers': ['console'],
        },
    },
}
# Then search for queries selecting all columns
The most effective detection is a pre-commit hook or CI check that greps for SELECT * in source code. Catch it before it reaches production.
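A minimal sketch of such a check in Python; the file extensions and regex are assumptions, so tune both for your codebase (note that SELECT count(*) does not trip the pattern, but a "SELECT *" inside a comment does):

```python
import re
from pathlib import Path

# Flag SELECT * occurrences in source files; a non-empty result fails the build.
PATTERN = re.compile(r"SELECT\s+\*", re.IGNORECASE)

def find_select_star(root, exts=(".sql", ".ts", ".py")):
    hits = []
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in exts:
            for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
                if PATTERN.search(line):
                    hits.append(f"{path}:{lineno}: {line.strip()}")
    return hits
```

Wire this into a pre-commit hook or CI job and fail the run whenever the returned list is non-empty.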
The Fix
Fix 1: Explicit column selection
The simplest fix. Name the columns you need.
-- Always explicit
SELECT id, name, email, avatar_url
FROM users
WHERE id = ?;
Trade-off: Slightly more maintenance when columns change. Worth it every time.
When adding new columns to the table later, you'll need to update queries that need the new column. This is actually a feature, not a bug: it forces you to make a conscious decision about which consumers need the new data.
Fix 2: ORM-level column selection
Most ORMs support explicit column selection. Use it.
// TypeORM: explicit select
const user = await userRepo.findOne({
where: { id },
select: ['id', 'name', 'email', 'avatarUrl']
});
// Prisma: explicit select
const user = await prisma.user.findUnique({
where: { id },
select: { id: true, name: true, email: true, avatarUrl: true }
});
# Django: explicit values
user = User.objects.values('id', 'name', 'email', 'avatar_url').get(id=user_id)
# SQLAlchemy: explicit columns
user = session.query(User.id, User.name, User.email).filter_by(id=user_id).first()
Trade-off: ORM-specific syntax. Create a shared utility or DTO pattern to avoid duplication.
Fix 3: Database views for common access patterns
For queries repeated across many services, create a view:
CREATE VIEW user_card AS
SELECT id, name, email, avatar_url
FROM users;
-- Now services query the view
SELECT * FROM user_card WHERE id = ?;
-- This SELECT * is fine: the view already limits columns
Trade-off: Another object to maintain. Best for cross-team shared access patterns.
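The view approach can be sandbox-tested with SQLite (names mirror the example above): SELECT * against the view can only ever return the view's columns, regardless of what the base table grows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE users (
    id INTEGER PRIMARY KEY, name TEXT, email TEXT,
    avatar_url TEXT, password_hash TEXT)""")
conn.execute("CREATE VIEW user_card AS SELECT id, name, email, avatar_url FROM users")

# cursor.description exposes the columns the query returns
cur = conn.execute("SELECT * FROM user_card WHERE id = 1")
view_cols = [d[0] for d in cur.description]
print(view_cols)  # ['id', 'name', 'email', 'avatar_url'] -- password_hash excluded
```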
Fix 4: Lint rules and CI enforcement
The best fix is one that prevents the problem from recurring. Add a SQL linter to your CI pipeline:
# .sqlfluff configuration
[sqlfluff:rules:L044]
# Disallow SELECT *
allow_scalar = False
// ESLint rule for ORM queries (custom rule concept)
// Flag any .find() or .findOne() without a 'select' option
// This catches the ORM default-to-* pattern
// TypeORM: require select in all findOne calls
// ❌ Without select, this fetches every entity column (effectively SELECT *)
const userBad = await userRepo.findOne({
  where: { id },
});
// ✅ Explicit column list
const userGood = await userRepo.findOne({
  where: { id },
  select: ['id', 'name', 'email', 'avatarUrl']
});
For existing codebases, run a one-time audit:
# Find all SELECT * in SQL files and source code
grep -rn "SELECT \*" --include="*.sql" --include="*.ts" --include="*.py" .
# Count how many need fixing
grep -rn "SELECT \*" --include="*.sql" --include="*.ts" --include="*.py" . | wc -l
Trade-off: Requires initial setup and may require custom rules for your ORM. But it prevents the problem from ever coming back.
Which fix to use
- Raw SQL in application code: Fix 1 (explicit column lists).
- ORM-heavy codebases: Fix 2 (explicit select options).
- Access patterns shared across teams or services: Fix 3 (database views).
- Any codebase at scale: Fix 4 (lint rules and CI enforcement), layered on top of the others so the problem can't creep back in.
Severity and Blast Radius
SELECT * is a slow killer, not a sudden one. It rarely triggers a hard outage on its own, but it amplifies every other performance problem.
- Network saturation. At scale, unnecessary bytes fill the pipe between app servers and database. This steals bandwidth from queries that actually need it.
- Memory pressure. ORMs deserialize every column into objects. 42 columns x 10K req/s = millions of unnecessary object fields in memory, increasing GC pressure.
- Security exposure. New sensitive columns (password hashes, tokens, PII) get silently exposed to application code and potentially to logs or API responses.
- Cascading effect. When the database slows down from over-fetching, every service that queries that table slows down. One SELECT * on a hot table can degrade the entire system.
- Recovery: minutes to hours. Changing SELECT * to explicit columns is a trivially fast fix per query. The hard part is finding all the queries (could be hundreds) and validating that each replacement is correct.
| Impact Area | Low Scale (100 req/s) | High Scale (100K req/s) |
|---|---|---|
| Network waste | Negligible | Can saturate 10 Gbps link |
| DB CPU overhead | Minimal | 20-40% increase |
| Memory (app tier) | Unnoticeable | GC pauses, OOM risk |
| Security risk | Same at any scale | Same at any scale |
When It's Actually OK
SELECT * has legitimate uses. The key distinction: are you using it deliberately, or did it sneak in as a default?
- Admin export tools. You genuinely need every column for a CSV export or backup. Document it with a comment: -- Intentional SELECT * for full export.
- Schema migration scripts. SELECT * INTO new_table FROM old_table during migrations is fine. These are one-time operations, not production queries.
- Interactive SQL sessions. Running SELECT * in a psql shell to explore data is normal. Just don't ship it to production code.
- Tables with 3-5 columns that are unlikely to grow. A feature_flags table with (id, name, enabled) doesn't benefit from explicit columns. But tables grow, so revisit this assumption periodically.
- Database views that already limit columns. If you've created a view with explicit columns, SELECT * FROM user_card_view is fine because the view is the column list.
The rule of thumb: if the query appears in production code, specify columns. If it's a one-off operation or development tool, SELECT * is fine.
Related anti-patterns
Don't confuse SELECT * with these similar but distinct issues:
| Anti-pattern | Description | Key difference |
|---|---|---|
| SELECT * | Fetches all columns | Column over-fetching |
| Missing pagination | Fetches all rows | Row over-fetching |
| N+1 queries | Fetches one row at a time in a loop | Query count problem |
| Over-indexing | Too many indexes on a table | Write overhead |
They often appear together. A query with SELECT * FROM users and no LIMIT clause has both the column and row over-fetching problems simultaneously.
How This Shows Up in Interviews
When an interviewer shows a schema or asks about database performance, mentioning column selection is a strong precision signal.
What a strong answer includes:
- "I'd select only the columns the API response needs. This enables index-only scans and cuts network transfer."
- Awareness that ORMs default to full-row fetches and how to override that.
- Understanding the column-store impact for analytical workloads.
Interview phrase: "At scale, I always specify columns explicitly. It enables covering indexes, reduces serialization overhead, and prevents accidental leaks when the schema evolves."
Quick Recap
- SELECT * over-fetches: every unused column adds transfer, deserialization, and memory overhead per row.
- Index-only scans are impossible with SELECT * if any column isn't in the index, forcing heap access for every row.
- Column-store databases lose all compression and I/O benefits when you select all columns.
- ORMs default to full-row fetches. Override this with explicit select calls.
- Schema changes silently expose new columns (including sensitive ones) through SELECT * queries.
- The fix is trivial: name the columns you need. Use database views for cross-service shared access patterns.
- Detect it by grepping for SELECT * in production code and checking EXPLAIN ANALYZE for unnecessary heap fetches.