Missing pagination anti-pattern
Learn why unbounded SELECT queries returning millions of rows crash applications, exhaust memory, and lock tables, and how cursor-based pagination solves what OFFSET cannot.
TL;DR
- SELECT * FROM orders without a LIMIT is unbounded. As your table grows, this query eventually returns millions of rows, exhausts application heap, and crashes the process.
- OFFSET-based pagination (LIMIT 20 OFFSET 1000) looks safe but isn't: the database must read and discard 1,000 rows to find the start of your page. At OFFSET 100,000, you've read 100,020 rows to return 20.
- Cursor-based pagination (keyset pagination) uses an opaque cursor (e.g., the last row's ID or timestamp) to fetch the next page directly without scanning skipped rows. It's O(page_size) regardless of page number.
- Missing pagination is the most common cause of "works fine in development, explodes in production" database query bugs.
- Always enforce a maximum page size on API endpoints. A client requesting 10,000 rows per page is a self-inflicted DDoS.
The Problem
It's Monday morning. Your admin dashboard has an "export all orders" feature. In development, the orders table has 500 rows. The query runs in 10ms and returns a nicely formatted list. You ship it.
Six months later, production has 12 million orders. A team member clicks the export button. The query runs for 40 seconds, returns a result set of 12 million rows, and the application server runs out of memory and crashes, taking down every other user's requests with it.
I've been on the receiving end of this exact scenario. A single unbounded query from an internal tool took down a production API serving 50,000 users. The fix was adding one word to the query: LIMIT.
The same thing happens with API endpoints that don't paginate. A client calls /api/users expecting a page of users. Your system returns all 8 million users in one response. The API server OOMs. The network transfer takes 3 minutes. The client-side JavaScript freezes rendering 8 million rows.
The bottom line: an unbounded query is a production time bomb. It works fine until it doesn't, and when it fails, it takes everything else down with it.
The failure math
Consider an orders table with 10 million rows, 500 bytes per row:
- Unbounded SELECT *: 10M rows x 500 bytes = ~5 GB result set. No application server has 5 GB of free heap for a single request.
- Network transfer at 100 Mbps: 5 GB takes ~400 seconds to transmit. That's a 7-minute response time.
- Database side: the query holds a read lock (or at minimum, an MVCC snapshot) for the entire duration. Other queries queue behind it.
Even with a connection pool of 20 connections, one unbounded query can consume a connection for minutes. Five users clicking "export all" simultaneously exhausts the entire pool. I've seen a single admin tool query bring down a production database serving 200,000 active users.
Why It Happens
Four reasonable-sounding decisions lead to this anti-pattern:
1. "The dataset is small." In development and early production, the query is fast. Nobody adds pagination because there's no performance problem yet. By the time the table hits millions of rows, the unbounded query is buried deep in the codebase and nobody remembers it exists.
2. "The ORM handles it." ORMs like Django's, Rails' ActiveRecord, and Sequelize make it easy to write Model.findAll(). The generated SQL has no LIMIT. Developers trust the ORM without checking the generated query. In my experience, the most dangerous unbounded queries come from ORMs, not hand-written SQL, because the missing LIMIT is invisible.
3. "We use OFFSET, so we're paginated." OFFSET-based pagination looks correct. It works at page 1. It degrades silently at page 5,000. Nobody notices until a power user scrolls deep, an automated tool iterates through all pages, or a search engine crawler tries to index every page.
4. "Internal tools don't need pagination." Admin dashboards, reporting tools, and data export features skip pagination because "only the team uses it." But "the team" includes scripts, cron jobs, and that one person who clicks "export all" on a 12-million-row table.
The common thread: none of these decisions seem wrong at the time. The anti-pattern reveals itself only when the data grows, which is exactly the point when fixing it becomes urgent.
How to Detect It
| Symptom | What It Means | How to Check |
|---|---|---|
| OOM crashes on app servers | Unbounded result set exceeds heap | Heap dump analysis + slow query log for queries without LIMIT |
| Response times > 10 seconds on list endpoints | Large result sets or high OFFSET | pg_stat_statements for queries with no LIMIT or OFFSET > 10,000 |
| Database CPU spikes when admin uses export feature | Full table scan on unbounded SELECT | EXPLAIN ANALYZE on the query |
| Client timeout on list API endpoints | Response payload too large for network transfer | Check response size in bytes (anything > 10 MB is suspicious) |
| Duplicate or missing items when paging through results | OFFSET instability from concurrent inserts | Compare item counts: COUNT(*) vs sum of items across all pages |
Code smell: Any SQL query or ORM call that returns a list without a LIMIT clause. Search your codebase for .findAll(), .find({}), or SELECT * FROM without LIMIT.
Architecture smell: API endpoints that return arrays without nextCursor or hasMore fields. If the response is just { "items": [...] } with no pagination metadata, the endpoint is unbounded.
ORM audit: Search for common unbounded patterns in your framework:
# BAD: Django (Python, so # comments; limiting is done by slicing)
orders = Order.objects.all()  # no slice ([:20]) or pagination
// BAD: Sequelize
const users = await User.findAll(); // returns every row
// BAD: Prisma
const posts = await prisma.post.findMany(); // unbounded
// GOOD: Prisma with cursor pagination
const posts = await prisma.post.findMany({
take: 20,
cursor: lastId ? { id: lastId } : undefined,
orderBy: { id: 'desc' },
});
-- Find your slow, unbounded queries in PostgreSQL
SELECT query, calls, mean_exec_time, rows
FROM pg_stat_statements
WHERE query NOT LIKE '%LIMIT%'
AND rows > 10000
ORDER BY mean_exec_time DESC
LIMIT 20;
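To triage application code the same way, a rough static check can flag the ORM patterns above. This is a sketch: the regex is deliberately simple, the function name is illustrative, and a real audit wants an AST-based lint rule rather than string matching.

```javascript
// Rough static check for unbounded ORM calls (illustrative, not exhaustive).
// Flags .findAll(...) / .findMany(...) calls whose argument list never
// mentions a limit/take option. Multi-line or nested calls need a real parser.
function findUnboundedCalls(source) {
  const callPattern = /\.(findAll|findMany)\(([^)]*)\)/g;
  const findings = [];
  for (const match of source.matchAll(callPattern)) {
    const args = match[2];
    if (!/\b(limit|take)\b/.test(args)) findings.push(match[0]);
  }
  return findings;
}

findUnboundedCalls('const users = await User.findAll();');   // flagged
findUnboundedCalls('prisma.post.findMany({ take: 20 })');    // clean
```

Run it over your source tree as a first pass, then confirm each hit by inspecting the generated SQL.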
The Fix
Why OFFSET-based pagination doesn't scale
Before jumping to the fix, let me show you why the "obvious" solution (OFFSET) is actually part of the problem. OFFSET-based pagination is the default in most ORMs and tutorials, which makes it the most common pagination approach and the most common pagination mistake.
-- Page 1: reads 20 rows
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 0;
-- Page 500: reads 10,020 rows, discards first 10,000
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Page 5000: reads 100,020 rows, discards first 100,000
SELECT * FROM orders ORDER BY created_at DESC LIMIT 20 OFFSET 100000;
OFFSET tells the database to skip N rows. It still must read those N rows before knowing which to discard. At high page numbers, this becomes a full sequential scan of the early pages on every request.
The performance degradation is linear: double the OFFSET, double the work. At OFFSET 1,000,000 on a large table, the database is doing a nearly full table scan to return 20 rows.
Additionally, OFFSET is unstable: if a new order is inserted between page requests, items shift and you either skip a row or see a duplicate. This is a correctness bug, not just a performance problem.
Performance comparison
| Approach | Page 1 (20 rows) | Page 500 (20 rows) | Page 50,000 (20 rows) |
|---|---|---|---|
| No pagination | Reads entire table | N/A | N/A |
| OFFSET | Reads 20 rows | Reads 10,020 rows | Reads 1,000,020 rows |
| Cursor-based | Reads 20 rows | Reads 20 rows | Reads 20 rows |
The difference is dramatic. At page 50,000, OFFSET reads 50,000x more rows than cursor-based pagination to return the same 20 results.
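The rows-read numbers in the table fall out of a simple model, which this in-memory simulation makes concrete. The dense integer ids are an assumption that lets an array index stand in for a B-tree seek:

```javascript
// Simulate rows "read" per page over an in-memory table.
// OFFSET must walk past every skipped row; a keyset cursor seeks directly.
const table = Array.from({ length: 1_000_100 }, (_, i) => ({ id: i + 1 }));

function offsetPage(offset, pageSize) {
  // The database reads offset + pageSize rows, then discards the first offset.
  const rowsRead = Math.min(offset + pageSize, table.length);
  return { page: table.slice(offset, offset + pageSize), rowsRead };
}

function keysetPage(lastId, pageSize) {
  // With an index on id, the seek lands on the first row after lastId.
  const start = lastId; // ids are dense here, so index position == lastId
  return { page: table.slice(start, start + pageSize), rowsRead: pageSize };
}

console.log(offsetPage(1_000_000, 20).rowsRead); // 1000020
console.log(keysetPage(1_000_000, 20).rowsRead); // 20
```

Both functions return the same 20 rows; only the work to find them differs.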
Beyond performance, the correctness problem noted earlier deserves emphasis: OFFSET is unstable under concurrent writes. If a new row is inserted while a user is paging through results, the entire result set shifts by one, and the user either sees a duplicate item on the next page or skips an item entirely. Cursor-based pagination doesn't have this problem because the cursor is anchored to a specific row, not a position.
Fix 1: Cursor-based pagination (keyset pagination)
Instead of a page number, pass the last-seen value as a cursor. The next query uses it in a WHERE clause, so the database can use an index to jump directly to the right position instead of scanning and discarding rows. The server builds the cursor from the last row of each page and hands it to the client, which echoes it back to request the next page.
The cursor is typically the values of the sort columns from the last row of the previous page. For example, if you sort by (created_at DESC, id DESC), the cursor is the created_at and id of the last row.
-- First page (no cursor)
SELECT id, created_at, customer_id, total
FROM orders
WHERE created_at < NOW()
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- → returns rows with created_at ranging from 2026-04-04T10:00 down to 2026-04-04T09:55
-- Next page (pass last row's created_at and id)
SELECT id, created_at, customer_id, total
FROM orders
WHERE (created_at, id) < ('2026-04-04T09:55:00', 49201)
ORDER BY created_at DESC, id DESC
LIMIT 20;
With an index on (created_at DESC, id DESC), both queries are O(20). The database seeks directly to the starting position. No rows are scanned and discarded. Page 50,000 is exactly as fast as page 1.
Important: The cursor must be a unique, deterministic sort key. If you sort by created_at alone, two rows with the same timestamp will cause items to be skipped or duplicated. Always include a unique tiebreaker (like id) in the sort order and the cursor.
-- Create the index that makes cursor pagination fast
CREATE INDEX idx_orders_cursor
ON orders (created_at DESC, id DESC);
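The skip/duplicate failure mode from a missing tiebreaker is easy to demonstrate in memory. Here plain JavaScript stands in for SQL's tuple comparison; the rows and timestamps are made up for the example:

```javascript
// Keyset pagination over rows sorted by (created_at DESC, id DESC),
// with duplicate timestamps included to exercise the id tiebreaker.
const rows = [
  { id: 5, created_at: '2026-04-04T10:00' },
  { id: 4, created_at: '2026-04-04T10:00' }, // same timestamp as id 5
  { id: 3, created_at: '2026-04-04T09:59' },
  { id: 2, created_at: '2026-04-04T09:58' },
  { id: 1, created_at: '2026-04-04T09:58' }, // same timestamp as id 2
];

// SQL's (created_at, id) < (cursor.created_at, cursor.id), for DESC order.
// ISO-8601 timestamps compare correctly as strings.
function beforeCursor(row, cursor) {
  if (row.created_at !== cursor.created_at) return row.created_at < cursor.created_at;
  return row.id < cursor.id;
}

function page(cursor, size) {
  const matching = cursor ? rows.filter((r) => beforeCursor(r, cursor)) : rows;
  return matching.slice(0, size);
}

const p1 = page(null, 2);              // ids 5, 4
const p2 = page(p1[p1.length - 1], 2); // ids 3, 2 — no skip, no duplicate
const p3 = page(p2[p2.length - 1], 2); // id 1
```

Drop the id comparison from beforeCursor and rerun: rows sharing a timestamp with the cursor row get skipped, which is exactly the bug the composite sort key prevents.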
Fix 2: Pagination at the API layer
Every API endpoint that returns a list must be paginated. This applies to REST APIs, GraphQL connections, and gRPC streaming responses. The API should enforce three constraints:
- Default limit: If the client doesn't specify a page size, use a sensible default (20-100).
- Maximum limit: Cap the page size to prevent abuse. A client requesting 10,000 items per page is a DoS vector.
- Opaque cursor: The cursor is an implementation detail. Encode it so clients can't parse or fabricate it.
// Request
GET /orders?cursor=eyJjcmVhdGVkX2F0IjoiMjAyNi0wNC0wNFQwO...&limit=20
// Response
{
"items": [ ... ],
"nextCursor": "eyJjcmVhdGVkX2F0IjoiMjAyNi0wNC0wNEw...",
"hasMore": true
}
The cursor is base64-encoded to make it opaque (clients shouldn't parse it). Always return a hasMore flag so clients know when to stop paginating.
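One way to implement all three constraints is a pair of cursor helpers plus a limit clamp. This is a sketch: the payload fields match the keyset queries above, the constants are the suggested defaults, and base64url (Node 15+) keeps the cursor URL-safe.

```javascript
// Opaque cursor encoding and page-size clamping for a list endpoint.
const DEFAULT_LIMIT = 20;
const MAX_LIMIT = 100;

function encodeCursor(lastRow) {
  // Only the sort-key columns go into the cursor.
  const payload = { created_at: lastRow.created_at, id: lastRow.id };
  return Buffer.from(JSON.stringify(payload)).toString('base64url');
}

function decodeCursor(cursor) {
  try {
    return JSON.parse(Buffer.from(cursor, 'base64url').toString('utf8'));
  } catch {
    return null; // malformed or fabricated cursor: treat as first page
  }
}

function clampLimit(requested) {
  const n = Number.parseInt(requested, 10);
  if (Number.isNaN(n) || n < 1) return DEFAULT_LIMIT; // default limit
  return Math.min(n, MAX_LIMIT);                      // maximum limit
}
```

Returning null for a bad cursor (rather than throwing) means a tampered request degrades to page one instead of a 500.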
Fix 3: Streaming for large exports
For the "export all orders" use case, don't return all rows in memory. Stream them:
app.get('/api/orders/export', async (req, res) => {
  res.setHeader('Content-Type', 'text/csv');
  res.setHeader('Content-Disposition', 'attachment; filename=orders.csv');
  // .stream() yields rows one at a time instead of buffering the result set
  const cursor = db.query(
    'SELECT id, created_at, total FROM orders ORDER BY id'
  ).stream();
  cursor.on('data', (row) => {
    res.write(`${row.id},${row.created_at},${row.total}\n`);
  });
  cursor.on('end', () => res.end());
  cursor.on('error', (err) => res.destroy(err)); // don't leave the response hanging
});
Streaming keeps memory constant regardless of table size. The database streams rows to the application, which streams them to the client. No row is held in memory longer than it takes to write it to the response.
For very large exports, consider generating the file asynchronously and sending a download link. This avoids tying up an API server for minutes during the export:
// For millions of rows: generate CSV in background, store in S3
app.post('/api/orders/export', async (req, res) => {
const jobId = await exportQueue.add({
query: 'SELECT id, created_at, total FROM orders',
format: 'csv',
userId: req.user.id
});
res.json({ jobId, status: 'processing' });
// Client polls GET /api/exports/:jobId for download URL
});
Severity and Blast Radius
Severity: High. An unbounded query doesn't degrade gracefully. It works perfectly until the dataset crosses a threshold, then causes an OOM crash that takes down the entire process. There's no "slow degradation" warning period.
Blast radius: A single unbounded query crashes the application server, which affects every user sharing that process. In a shared database, the long-running query can also lock rows, spike CPU, and degrade performance for all other queries. With connection pooling, it ties up a connection for the duration of the transfer, reducing available connections for other requests.
Recovery difficulty: Easy once identified. Adding LIMIT is a one-line fix. Migrating from OFFSET to cursor-based pagination requires more work (schema changes, API contract updates, client-side changes), but the immediate OOM fix is trivial. The hard part is finding all the unbounded queries before they cause an incident. I recommend a periodic audit using pg_stat_statements to catch queries returning > 10,000 rows.
When It's Actually OK
OFFSET-based pagination is acceptable when:
- The dataset is small (under ~10,000 rows total, not per page)
- You need random-access pagination ("jump to page 47") since keyset pagination doesn't support this
- The data doesn't change between pages (stable snapshot, e.g. a report)
- You've verified the OFFSET won't exceed a few thousand rows
Unbounded queries (no pagination at all) are acceptable when:
- The query is guaranteed to return a small, bounded result set (e.g., SELECT * FROM user_preferences WHERE user_id = ? returns at most one row per preference type)
- The data is used in a background job with streaming (not loaded into memory at once)
- You're reading from a configuration table with < 100 rows that doesn't grow over time
- You've added a hard LIMIT as a safety net even if you expect few results: SELECT * FROM settings WHERE org_id = ? LIMIT 1000
The bottom line: if the result set can grow unboundedly over time, paginate it. If you're not sure whether it can grow, paginate it anyway. The cost of unnecessary pagination is near zero. The cost of missing pagination is a production outage.
How This Shows Up in Interviews
When designing an API that returns a list, describe pagination proactively. A strong answer sounds like: "The endpoint returns up to 100 items per page with cursor-based pagination. The cursor encodes the last-seen ID and timestamp so the next query can seek directly without an offset scan."
Interviewers look for three signals:
- You default to paginated endpoints (not "return all items")
- You explain why cursor-based pagination beats OFFSET at scale
- You mention a maximum page size to prevent abuse
A bonus signal: mentioning that cursor-based pagination requires a composite index and a unique tiebreaker shows you understand the database-level implementation, not just the API pattern.
Every list endpoint is a potential OOM
If you're designing an endpoint that returns a list and you don't mention pagination, the interviewer notices. It signals you haven't thought about what happens when the dataset grows from 500 rows to 5 million.
Quick Recap
- Unbounded SELECT is a time bomb: it works in development and OOMs in production when the table grows.
- OFFSET-based pagination is safe at small offsets but degrades as O(offset + limit). Page 5,000 reads 100,020 rows.
- Cursor-based pagination uses a WHERE clause on the boundary value, making every page O(page_size) with a proper index.
- Every API list endpoint needs pagination: default limit, maximum limit, and opaque cursor.
- OFFSET is acceptable for reports on stable, bounded datasets under ~10,000 rows. For live production APIs, use cursors.
- For large exports, stream rows instead of loading them into memory. Constant memory regardless of table size.
- Always enforce a maximum page size on API endpoints to prevent abuse.