Files in database anti-pattern
Learn why storing binary files as BLOBs in a relational database bloats row sizes, explodes backup times, and creates replication lag, and why object storage is almost always the right answer.
TL;DR
- Storing binary files (images, PDFs, videos, audio) as BLOBs in a relational database uses a storage engine designed for structured, indexed, queryable data to store bytes it cannot query, index, or compress.
- The costs: row bloat slows all queries (not just file queries), backups take forever, replication lag grows, sequential table scans read giant BLOBs, VACUUM is slow, and you cannot CDN-cache database content.
- The right answer in nearly every case: store files in an object store (S3, GCS, Azure Blob Storage). Store the file path or key in the database. Serve files via CDN.
- The only exception: very small, frequently updated binary data (thumbnails under 100KB in a system that never serves them to end users) where the overhead of S3 round-trips dominates.
The Problem
A developer building a document management system stores uploaded PDFs directly in the database. The schema looks reasonable:
```sql
CREATE TABLE documents (
  id          UUID PRIMARY KEY,
  filename    VARCHAR(255),
  content     BYTEA,  -- stores the PDF binary
  uploaded_at TIMESTAMP,
  user_id     UUID
);
```
Upload 50,000 documents averaging 2MB each. Your database is now 100GB larger from content columns alone. Your nightly backup takes 6 hours instead of 20 minutes. A sequential scan of the documents table for any reason reads 100GB of binary data. VACUUM must process every BYTEA value on every dead tuple. The replication log (WAL) includes all 100GB, so your replica runs 30 minutes behind.
Your CDN cannot cache the PDFs because the browser downloads them through an API endpoint that hits your database. Every document download consumes a database connection for the duration of the transfer, and at 2MB per file over typical client links, that duration is measured in seconds, not milliseconds.
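Little's law turns this into a quick back-of-envelope estimate: connections held ≈ arrival rate × transfer duration. The sketch below assumes a 5 Mbps client link, an illustrative figure not taken from the measurements above.

```typescript
// Little's law: average concurrency = arrival rate × time each item is held.
// Here, each in-flight download holds one database connection.

// Seconds to push `sizeMB` megabytes over an `mbps` megabit/s client link.
function transferSeconds(sizeMB: number, mbps: number): number {
  return (sizeMB * 8) / mbps;
}

// Connections held concurrently, given downloads/sec and per-download duration.
function concurrentConnections(downloadsPerSec: number, seconds: number): number {
  return downloadsPerSec * seconds;
}

const secs = transferSeconds(2, 5); // 2MB file over a 5 Mbps link → 3.2s per download
const held = concurrentConnections(100, secs); // 100 downloads/sec → 320 connections held
```

At those assumed numbers, a modest 100 downloads/sec already pins more connections than most pools allow.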
I inherited a system exactly like this in 2021. The team had 50,000 user-uploaded documents in PostgreSQL. A simple SELECT filename FROM documents WHERE user_id = ? took 800ms because the table's pages were bloated by the BYTEA data, so the scan did far more I/O than the metadata warranted. Moving files to S3 dropped that query to 3ms.
Why It Happens
Teams store files in the database because it feels simpler. One system, one backup, one transaction. Each piece of the logic is individually reasonable:
- "One system is easier to operate." True for development. Catastrophic at scale when that one system does two fundamentally different jobs.
- "We need transactions around the file." You need the metadata to be transactional. The file itself is immutable after upload.
- "S3 is another dependency." So is your database. And S3 has 99.999999999% durability, better than any self-managed database.
- "We only have a few files." Every system that stores files in the database started with "just a few files."
The mismatch is fundamental. Databases are optimized for narrow, indexed, queryable rows. Files are the opposite:
| Characteristic | Files (BLOBs) | Relational Data |
|---|---|---|
| Access pattern | Full read or write, no partial | Column-level, row-level, range queries |
| Indexability | Not indexable | Fully indexable |
| Compression | Already compressed (JPEG, PDF) | Benefits from row compression |
| Query predicates | Never queried as a WHERE clause | The whole point |
| Row width | Makes rows huge | Rows should be narrow and uniform |
| CDN compatibility | No (database is not HTTP) | Not applicable |
| Concurrency | One connection per download | Row-level locking, MVCC |
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| BYTEA or BLOB columns in schema | Binary data stored in relational tables | SELECT table_name, column_name FROM information_schema.columns WHERE data_type IN ('bytea', 'blob', 'longblob', 'oid') |
| Database size >> sum of row data | BLOBs inflating total size | SELECT pg_size_pretty(pg_total_relation_size('documents')) vs expected row count * avg metadata size |
| Backup duration growing linearly with uploads | pg_dump/mysqldump serializing BLOBs | Monitor backup duration trend over time |
| Replication lag correlated with upload activity | WAL includes full binary content | Check pg_stat_replication.replay_lag during upload spikes |
| API download endpoint hitting database connection pool | Every file download consumes a DB connection | Monitor connection pool usage during download traffic |
| VACUUM taking hours on a table with few rows | Processing TOAST pages for large BYTEA | SELECT relname, n_dead_tup, last_autovacuum FROM pg_stat_user_tables WHERE relname = 'documents' |
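The first check in the table is easy to automate. This is an illustrative sketch: the `ColumnRow` shape mirrors the fields of `information_schema.columns` used above, and the sample rows are made up.

```typescript
// Shape of a row from information_schema.columns (only the fields we need).
interface ColumnRow {
  table_name: string;
  column_name: string;
  data_type: string;
}

// Column types that indicate binary payloads living in relational tables.
const BINARY_TYPES = new Set(["bytea", "blob", "longblob", "oid"]);

// Filter a column listing down to the suspicious binary columns.
function findBinaryColumns(rows: ColumnRow[]): ColumnRow[] {
  return rows.filter((r) => BINARY_TYPES.has(r.data_type.toLowerCase()));
}

// Example against made-up rows:
const sample: ColumnRow[] = [
  { table_name: "documents", column_name: "content", data_type: "bytea" },
  { table_name: "documents", column_name: "filename", data_type: "character varying" },
];
const hits = findBinaryColumns(sample); // flags only the `content` column
```

Run the information_schema query from the table, feed the rows through this filter, and alert on any hit in a schema review.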
The Fix
Fix 1: Object store + metadata-only database
Upload flow:
Client → uploadFile(pdf) → S3.putObject(key="docs/user-123/filename.pdf") → DB.insert(file_metadata)
Database stores:
{ id, user_id, filename, s3_key: "docs/user-123/filename.pdf", size_bytes, mime_type, created_at }
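The upload flow above can be sketched against a minimal `ObjectStore` interface. This is an assumption standing in for the real S3 client, and the in-memory implementation exists only to make the example self-contained; the point is that only the key and metadata ever reach the database.

```typescript
// Minimal stand-in for an object store client (e.g. S3's putObject).
interface ObjectStore {
  putObject(key: string, bytes: Uint8Array): void;
}

// The metadata row the database stores instead of the file bytes.
interface FileMetadata {
  user_id: string;
  filename: string;
  s3_key: string;
  size_bytes: number;
}

// Build the object key the database will store.
function buildKey(userId: string, filename: string): string {
  return `docs/user-${userId}/${filename}`;
}

// Upload: bytes go to the object store, only metadata goes to the database.
function uploadFile(
  store: ObjectStore,
  userId: string,
  filename: string,
  bytes: Uint8Array
): FileMetadata {
  const key = buildKey(userId, filename);
  store.putObject(key, bytes); // the only place the bytes travel
  return { user_id: userId, filename, s3_key: key, size_bytes: bytes.length };
}

// In-memory store for the example.
const objects = new Map<string, Uint8Array>();
const store: ObjectStore = { putObject: (k, b) => void objects.set(k, b) };

const meta = uploadFile(store, "123", "report.pdf", new Uint8Array(2048));
// meta.s3_key === "docs/user-123/report.pdf"; the DB row stores that string, not 2KB of bytes.
```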
Download flow (option 1, pre-signed URL):
Client → GET /files/{id}/url
Server → S3.generatePresignedUrl(key, expiresIn=3600) → return { url }
Client → GET https://bucket.s3.amazonaws.com/docs/user-123/filename.pdf (direct to S3)
Download flow (option 2, CDN):
Client → GET https://cdn.example.com/docs/user-123/filename.pdf
CDN → S3 origin on cache miss → cached at edge for subsequent requests
The database now stores a 100-byte s3_key string instead of a 2MB BYTEA. Backups are tiny. VACUUM is fast. Files are served at CDN speed without touching the database.
Fix 2: Pre-signed URL pattern (detailed)
Pre-signed URLs are the key mechanism. The server generates a time-limited URL that grants direct access to S3 without proxying through the application:
```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3Client = new S3Client({}); // region/credentials from the environment

async function getDownloadUrl(fileId: string): Promise<string> {
  // Look up only the key -- the database never touches the file bytes
  const { rows } = await db.query(
    "SELECT s3_key FROM documents WHERE id = $1", [fileId]
  );
  const command = new GetObjectCommand({
    Bucket: "my-documents-bucket",
    Key: rows[0].s3_key,
  });
  // URL expires in 1 hour; client downloads directly from S3
  return getSignedUrl(s3Client, command, { expiresIn: 3600 });
}
```
Trade-off: Pre-signed URLs can be shared within their TTL. If your files are sensitive, use short TTLs (5 minutes) and enforce authorization checks before generating the URL.
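One way to pair the short TTL with an authorization check is to verify ownership before minting the URL. This is a sketch: `Signer` is a stand-in for the presigner, the row shape is assumed, and the 300-second TTL is the 5-minute figure above.

```typescript
// Stand-in for a URL signer (e.g. a wrapper around getSignedUrl).
type Signer = (key: string, expiresInSeconds: number) => string;

interface DocumentRow {
  id: string;
  owner_id: string;
  s3_key: string;
}

// Authorize first, then sign: a URL is only minted for the file's owner.
function getDownloadUrl(
  doc: DocumentRow,
  requesterId: string,
  sign: Signer
): string {
  if (doc.owner_id !== requesterId) {
    throw new Error("forbidden: requester does not own this document");
  }
  return sign(doc.s3_key, 300); // 5-minute TTL for sensitive files
}

// Example with a fake signer:
const fakeSign: Signer = (key, ttl) => `https://bucket.example/${key}?expires=${ttl}`;
const row: DocumentRow = { id: "d1", owner_id: "u1", s3_key: "docs/user-u1/a.pdf" };
const url = getDownloadUrl(row, "u1", fakeSign);
```

Because the check happens before signing, a leaked URL is the only exposure window, and the short TTL bounds it.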
The cost difference at scale
At 100 document downloads per second with 2MB files:
- Database-served: 200 MB/s flowing out of database connections, and every in-flight transfer holds a connection, so a 200-connection pool is exhausted at just 200 concurrent downloads.
- S3-served via CDN: 0 bytes from the database. All from CDN edge nodes globally. S3 handles tens of millions of concurrent requests.
Severity and Blast Radius
This anti-pattern affects everything in the database, not just file-related queries.
- Query performance: Even queries that never touch the file column are slowed because sequential scans, VACUUM, and index operations must navigate around TOAST pages storing BLOBs.
- Backup and recovery: pg_dump serializes BLOBs inline. A 100GB database of mostly files takes 6+ hours to back up, so your RTO (recovery time objective) is measured in hours.
- Replication: WAL includes full binary content. Large file uploads cause replication lag spikes, which can trigger stale reads from replicas.
- Connection pool exhaustion: Each file download holds a database connection for the duration of the transfer. 100 concurrent downloads = 100 connections consumed.
Recovery difficulty: medium. The migration path is straightforward (copy files to S3, update rows with s3_key, drop BYTEA column), but it requires downtime or a careful dual-write migration for large datasets.
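The backfill step of that migration can be sketched per row. This is illustrative only: the row shape and in-memory store are assumptions, and a production run would batch updates, checkpoint progress, and verify checksums before dropping the BYTEA column.

```typescript
interface DocRow {
  id: string;
  user_id: string;
  filename: string;
  content: Uint8Array | null; // legacy BYTEA column
  s3_key: string | null;      // new column, initially NULL
}

// Stand-in for the object store client.
interface ObjectStore {
  putObject(key: string, bytes: Uint8Array): void;
}

// Backfill one row: copy the bytes to the object store, record the key,
// then clear the BYTEA payload. Idempotent: migrated rows are skipped.
function migrateRow(row: DocRow, store: ObjectStore): void {
  if (row.s3_key !== null || row.content === null) return; // already migrated
  const key = `docs/user-${row.user_id}/${row.filename}`;
  store.putObject(key, row.content);
  row.s3_key = key;   // in SQL: UPDATE documents SET s3_key = $1 WHERE id = $2
  row.content = null; // in SQL: eventually ALTER TABLE documents DROP COLUMN content
}

// Example against an in-memory store:
const migratedObjects = new Map<string, Uint8Array>();
const s3: ObjectStore = { putObject: (k, b) => void migratedObjects.set(k, b) };
const legacy: DocRow = {
  id: "1", user_id: "42", filename: "a.pdf",
  content: new Uint8Array(10), s3_key: null,
};
migrateRow(legacy, s3);
// legacy.s3_key is now "docs/user-42/a.pdf" and legacy.content is null
```

The idempotence check is what makes the dual-write window safe: re-running the backfill never re-uploads or clobbers a row that already carries its key.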
When It's Actually OK
- Thumbnails under 100KB in an internal admin tool where CDN caching is unnecessary and the total number of files is under 10,000. The operational simplicity of one system may outweigh the performance cost.
- Cryptographic keys or certificates (a few KB each) that must be stored atomically with their metadata in the same transaction and never served to end users.
- SQLite in embedded/mobile apps where there is no network and no object store. SQLite BLOB storage is fine for local-only applications.
- Temporary processing artifacts that are written once, read once within seconds, and deleted. If the file never persists long enough to affect backups or replication, in-database storage is acceptable.
How This Shows Up in Interviews
When a design involves file uploads (profile photos, documents, videos), describe the S3 pattern immediately: "Files go to S3. The database stores only the S3 key, size, and metadata. Downloads use pre-signed URLs with a CDN in front of S3 for frequently accessed content. The database never serves binary data."
The three-tier storage rule
Hot files: CDN edge. Warm files: S3 standard tier. Cold files: S3 Glacier. The database is not a tier in this hierarchy. Mentioning this framework in an interview shows you think about cost optimization and access patterns.
Quick Recap
- BLOBs in a database bloat rows, slow VACUUM and backups, inflate replication logs, and prevent CDN caching.
- Object stores (S3, GCS) are designed for files: durable, cheap at scale, and served directly to clients.
- The database stores only the file key/path, a single VARCHAR column per file regardless of file size.
- Pre-signed URLs grant time-limited direct S3 access without proxying through your application.
- Put a CDN in front of S3 for any files that serve end users. This eliminates database connection pressure and provides global edge caching.
- The three-tier rule: hot files on CDN, warm files on S3, cold files on Glacier. The database is never a storage tier.