Cursor-based pagination
Understand why offset pagination breaks under concurrent writes, and how cursor-based (keyset) pagination delivers stable, consistent pages at any scale.
The problem with offset pagination
Offset pagination is the natural first instinct. Give the client a page number or an offset count. On each request, skip that many rows and return the next batch.
GET /tweets?user_id=42&offset=20&limit=20
-- "Give me tweets 21-40"
SELECT * FROM tweets WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20 OFFSET 20;
This works correctly on a static dataset. It breaks the moment anything is inserted or deleted while a user is paging through results.
Consider a user's timeline with 100 tweets ordered newest-first. The client has loaded the first page (tweets 1-20). While they read, three new tweets arrive at the top, shifting every existing tweet down by three positions. The second page request with OFFSET 20 was meant to return tweets 21-40 of the original numbering, but it now skips the three new tweets plus original tweets 1-17 and returns tweets 18-37: tweets 18-20 from the first page appear again as duplicates.
The mirror-image problem happens on deletes: rows above the offset disappear, the remaining rows shift up, and items are silently skipped.
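To make the failure concrete, here is a toy simulation (all data invented for illustration): page through a newest-first list with OFFSET while three new items arrive between requests.

```python
def offset_page(items, offset, limit):
    """items is newest-first; skip `offset` rows, return the next `limit`."""
    return items[offset:offset + limit]

# Timeline of tweet IDs, newest first: 100 down to 1.
timeline = list(range(100, 0, -1))

page1 = offset_page(timeline, 0, 20)       # tweets 100..81

# Three new tweets (101-103) arrive before the second request.
timeline = [103, 102, 101] + timeline

page2 = offset_page(timeline, 20, 20)      # shifted window: starts at 83

duplicates = sorted(set(page1) & set(page2))
print(duplicates)                          # → [81, 82, 83]
```

Three items the client already rendered come back on page two; deleting three rows instead would make the window jump past three unseen items.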
I see this described as a niche problem in interviews. It is not. Any social feed, message list, order history, or search result where items arrive while the user browses suffers from it, and the fix, cursor pagination, is the correct default for almost every paginated API.
What cursor pagination is
Cursor-based pagination (also called keyset pagination) anchors each page request to a specific row identity rather than a positional offset. The server returns an opaque token called a cursor encoding the last item the client received; subsequent requests attach this token and the server translates it into a WHERE predicate to fetch the next batch.
Think of it like a bookmark in a book. You do not say "give me page 87." You put a bookmark after the last sentence you read and say "start from here." New chapters added before your bookmark do not move it; it stays anchored to the content you last read.
The cursor is the only mechanism that guarantees stable, consistent pages in a dataset that changes between requests. Every major social feed, messaging API, and activity stream uses this approach.
How cursor-based pagination works
sequenceDiagram
participant C as Client
participant S as Server
participant DB as Database
C->>S: GET /tweets?limit=20
Note over C,S: First request: no cursor
S->>DB: SELECT * ORDER BY id DESC LIMIT 20
DB-->>S: 20 rows · last id = 981
S-->>C: tweets=[...], next_cursor="981", has_more=true
C->>S: GET /tweets?cursor=981&limit=20
Note over C,S: Cursor present: keyset anchor
S->>DB: SELECT * WHERE id < 981 ORDER BY id DESC LIMIT 20
DB-->>S: 20 rows · last id = 961
S-->>C: tweets=[...], next_cursor="961", has_more=true
Instead of "skip N rows", the client tells the server "give me everything after this specific record". The cursor is an opaque token encoding the last item the client received. The server uses it as a WHERE clause filter, not an OFFSET.
GET /tweets?user_id=42&cursor=01905cce182&limit=20
-- "Give me tweets after tweet ID 01905cce182"
SELECT * FROM tweets
WHERE user_id = 42
AND tweet_id < '01905cce182' -- cursor as WHERE predicate
ORDER BY tweet_id DESC
LIMIT 20;
The query is a range scan from a known position. No matter how many new tweets arrive after the cursor position, the result set is stable. Items are never skipped or duplicated.
The server includes the cursor for the next page in every response:
GET /tweets?user_id=42&limit=20
Response: {
"tweets": [...20 items...],
"next_cursor": "01905cce182", // last tweet_id from this page
"has_more": true
}
The client passes next_cursor as the cursor on the next request. When has_more is false or next_cursor is absent, the list is exhausted.
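The client-side loop can be sketched as follows. `fetch_page` is a hypothetical stand-in for the HTTP call, simulated here over an in-memory list of tweet IDs:

```python
TWEETS = list(range(50, 0, -1))  # tweet IDs 50..1, newest first

def fetch_page(cursor=None, limit=20):
    """Keyset query: everything strictly older than the cursor."""
    rows = [t for t in TWEETS if cursor is None or t < cursor][:limit]
    return {
        "tweets": rows,
        "next_cursor": rows[-1] if rows else None,
        "has_more": bool(rows) and rows[-1] > min(TWEETS),
    }

def fetch_all():
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["tweets"])
        if not page["has_more"]:
            break
        cursor = page["next_cursor"]  # last item becomes the next anchor
    return items

print(len(fetch_all()))  # → 50, no duplicates, no gaps
```

The invariant to notice: the client never knows or cares how many rows precede the cursor, only which row it last saw.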
What makes a good cursor
The cursor must encode enough information to deterministically locate the boundary between one page and the next. The most common choices:
Primary key as cursor. When the primary key is time-sortable (Snowflake ID, ULID, UUID v7), the ID itself encodes the sort position. The cursor is just the last-seen ID, and the WHERE clause is WHERE id < last_id.
-- Primary key is a Snowflake ID; ID encodes time, so ORDER BY id = ORDER BY time
SELECT * FROM tweets WHERE user_id = ? AND tweet_id < ? ORDER BY tweet_id DESC LIMIT 20
-- ^^^^^^^^^^^^^^^^^^^
-- cursor used as WHERE predicate, not OFFSET
Composite cursor. When sorting by a non-unique column (e.g. created_at, where two simultaneous writes can share the same value), the cursor must encode both the sort-column value and a tie-breaking primary key.
# Composite cursor: encode (created_at, tweet_id) together
cursor = base64_encode(f"{last_tweet.created_at}_{last_tweet.tweet_id}")
# Decode and use in the WHERE clause
(cursor_ts, cursor_id) = decode(cursor)
WHERE (created_at < cursor_ts) OR (created_at = cursor_ts AND tweet_id < cursor_id)
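A minimal runnable version of the composite cursor; the helper names and separator are our own illustrative choices, not a library API:

```python
import base64

def encode_cursor(created_at: int, tweet_id: int) -> str:
    """Pack (created_at, tweet_id) into an opaque, URL-safe token."""
    raw = f"{created_at}:{tweet_id}".encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple:
    """Unpack the token back into (created_at, tweet_id)."""
    ts, tid = base64.urlsafe_b64decode(cursor).decode().split(":")
    return int(ts), int(tid)

cursor = encode_cursor(1718000000, 981)
print(decode_cursor(cursor))  # → (1718000000, 981)
```

Both fields travel in one token, so the server can reconstruct the full two-column boundary predicate from a single query parameter.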
Opaque to the client. Always base64-encode or otherwise serialize the cursor before returning it. Clients should treat cursors as opaque tokens, not parse them. This lets you change cursor internals without breaking clients, and prevents clients from constructing arbitrary cursors that bypass access controls.
Opaque cursors are a security boundary, not just an API convention. If a cursor directly encodes user_id=42&offset=100, a client can forge a cursor for a different user's data. Base64-encoding does not add cryptographic security by itself. For sensitive resources, sign the cursor with an HMAC using a server-side secret so server-side validation can detect tampering before executing the query.
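A sketch of the HMAC approach using Python's standard library. The secret, payload format, and token layout are illustrative assumptions:

```python
import base64
import hashlib
import hmac

SECRET = b"server-side-secret"  # illustrative; load from config and rotate

def sign_cursor(payload: str) -> str:
    """Append an HMAC-SHA256 tag, then base64 the whole token."""
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{payload}.{sig}".encode()).decode()

def verify_cursor(token: str) -> str:
    """Recompute the tag; reject the cursor before it reaches the query."""
    payload, _, sig = base64.urlsafe_b64decode(token).decode().rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cursor")
    return payload

token = sign_cursor("created_at=1718000000&tweet_id=981")
print(verify_cursor(token))  # → created_at=1718000000&tweet_id=981
```

`hmac.compare_digest` does a constant-time comparison, so an attacker cannot recover the tag byte-by-byte through timing differences.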
The performance benefit
Offset pagination requires the database to read and discard N rows on every request. Five hundred pages deep, with OFFSET 10000 LIMIT 20, the database reads 10,020 rows and throws away 10,000 of them. At thousands of requests per second, this is measurable waste.
Cursor-based pagination with a proper index does a single B-tree descent to the cursor position, then reads forward 20 rows. The cost is O(log N) for the descent plus O(limit) for the scan, regardless of how deep into the result set the cursor is.
-- Offset: reads and discards 10,000 rows on every "load more"
SELECT * FROM tweets ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Cursor: one index seek + 20-row scan, the same cost at any depth
SELECT * FROM tweets WHERE tweet_id < ? ORDER BY tweet_id DESC LIMIT 20;
The index requirement is strict: the columns in the WHERE clause and ORDER BY must be covered by a single index. For the pattern above, (user_id, tweet_id DESC) as a composite index makes the per-user timeline query a single-shard, single-index range scan.
The index must exactly match the cursor query. For WHERE user_id = ? AND tweet_id < ?, create (user_id, tweet_id DESC). If you sort by a non-time-sortable column like score, you need (user_id, score DESC, tweet_id DESC) as a composite index covering both the sort column and the tiebreaker. Cursor pagination without the right index silently falls back to a full sequential scan, which is worse than OFFSET at scale.
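You can sanity-check this with SQLite standing in for the real database: EXPLAIN QUERY PLAN should report an index seek (SEARCH ... USING INDEX), not a full scan. Table and index names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, body TEXT)"
)
# Composite index matching the WHERE columns and ORDER BY direction.
conn.execute("CREATE INDEX idx_user_tweet ON tweets (user_id, tweet_id DESC)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM tweets WHERE user_id = ? AND tweet_id < ? "
    "ORDER BY tweet_id DESC LIMIT 20",
    (42, 981),
).fetchall()
# Expect SEARCH tweets USING INDEX idx_user_tweet (a seek, no sort step).
print(plan)
```

Dropping the index (or sorting by an uncovered column) turns the same query into a SCAN, which is the silent fallback the warning above describes.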
Bidirectional pagination
Infinite scroll that loads older content when scrolling down and newer content when scrolling up needs cursors in both directions: a before cursor (for going backward to newer items) and an after cursor (for going forward to older items).
GET /messages?conversation_id=99&before=msg_id_A&limit=20
-- Load 20 messages before (newer than) message A
GET /messages?conversation_id=99&after=msg_id_B&limit=20
-- Load 20 messages after (older than) message B
-- "Before" direction: messages newer than the cursor
SELECT * FROM messages
WHERE conversation_id = 99 AND message_id > :before_cursor
ORDER BY message_id ASC
LIMIT 20;
-- "After" direction: messages older than the cursor
SELECT * FROM messages
WHERE conversation_id = 99 AND message_id < :after_cursor
ORDER BY message_id DESC
LIMIT 20;
Return both cursors in responses when the dataset has items in both directions from the current window.
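The two directions can be wrapped in one query builder. A sketch against SQLite; the schema and helper name are illustrative, and parameter names follow the examples above:

```python
import sqlite3

def messages_query(conversation_id, before=None, after=None, limit=20):
    """Build the keyset query for either direction (or the first page)."""
    if before is not None:  # newer messages: ascending scan from the anchor
        return ("SELECT message_id FROM messages WHERE conversation_id = ? "
                "AND message_id > ? ORDER BY message_id ASC LIMIT ?",
                (conversation_id, before, limit))
    if after is not None:   # older messages: descending scan from the anchor
        return ("SELECT message_id FROM messages WHERE conversation_id = ? "
                "AND message_id < ? ORDER BY message_id DESC LIMIT ?",
                (conversation_id, after, limit))
    return ("SELECT message_id FROM messages WHERE conversation_id = ? "
            "ORDER BY message_id DESC LIMIT ?", (conversation_id, limit))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages "
             "(message_id INTEGER PRIMARY KEY, conversation_id INTEGER)")
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(i, 99) for i in range(1, 11)])

sql, params = messages_query(99, after=6, limit=3)
older = [r[0] for r in conn.execute(sql, params)]   # → [5, 4, 3]

sql, params = messages_query(99, before=6, limit=3)
newer = [r[0] for r in conn.execute(sql, params)]   # → [7, 8, 9]
```

Note the "before" branch returns rows in ascending order; a chat UI typically reverses them before prepending to the rendered list.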
When offset pagination is acceptable
Offset pagination is fine when:
- The dataset is effectively static (reference data, archived records that never change).
- The user never needs to page beyond a small depth (page 1-3 of search results where most users never go past page 2).
- Exact reproducibility across pages is not required (internal admin tools, analytics dashboards).
For any user-facing feed or list where content changes during browsing, cursor-based pagination is the correct default.
Limitations and when NOT to use it
- No random access. You cannot jump to an arbitrary page. "Show me page 47" is not possible with cursors; you can only move forward from the current position or backward with a reverse cursor. Interfaces that show discrete page numbers must use offset pagination.
- Total count requires a separate expensive query. SELECT COUNT(*) with the same filters scans all matching rows; for 10 million rows, this costs more than the page query itself. Cursor-based APIs should return has_more: true/false instead of total counts.
- Cursor invalidation on sort order changes. If the sort field changes type or the table is reindexed, old cursors may produce inconsistent results. Cursor tokens should be treated as short-lived (not stored across schema migrations).
- Backward pagination is non-trivial. Fetching newer items (scrolling up in a chat) requires a dedicated reverse cursor (before) and a separate SQL direction. Some teams skip this and use "scroll to top to refresh" instead.
- Sorting on multiple non-unique columns requires composite cursors. A sort on (category, score, id) requires encoding all three fields in the cursor and a multi-column WHERE predicate. The SQL grows in complexity with each additional sort key.
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Twitter/X API v2 | next_token parameter for timeline pagination | Opaque cursor returned in meta.next_token; WHERE tweet_id < :cursor ORDER BY tweet_id DESC internally |
| GitHub REST API | Link header with rel=next cursor URLs | Uses opaque URL cursors encoding the last seen ID and sort parameters |
| Stripe API | starting_after and ending_before object ID parameters | Clients pass the last object ID from current page; server uses WHERE id > :cursor keyset filter |
| PostgreSQL | Keyset pagination with WHERE (col1, col2) > (:v1, :v2) | Row value constructor syntax enables multi-column cursor comparisons in a single expression |
Interview cheat sheet
- State cursor pagination as your default immediately after showing any API endpoint that returns a list.
- Offset pagination breaks under concurrent inserts: new items push existing items to later offsets, causing duplicates on page boundaries.
- A cursor is a WHERE predicate, not a SKIP count. The query locates the boundary, then scans forward.
- Time-sortable IDs (Snowflake, UUID v7) make primary key the correct cursor field. No separate created_at index needed.
- For non-unique sort columns, encode both the sort value and the tiebreaking primary key into the cursor.
- Always opacify cursors with base64 encoding. Clients should not parse cursor internals.
- Cursor pagination requires an index matching the WHERE and ORDER BY columns exactly. Design the index before designing the pagination.
- Offset pagination on deep pages (OFFSET 10000) scans and discards thousands of rows. Cursor pagination is O(log N) regardless of depth.
Quick recap
- Offset pagination skips N rows before returning results. It produces duplicates and skips when rows are inserted or deleted while a user is browsing.
- Cursor pagination encodes the last seen row's identity as a WHERE predicate. It is O(log N) regardless of how deep into the results the reader is.
- Cursors must be opaque to clients (base64-encoded). The internal structure encodes whatever columns define the sort boundary.
- For non-unique sort order columns, the cursor must include a tiebreaker (typically the primary key) to handle rows with identical sort values.
- The database index must match both the WHERE filter columns and the ORDER BY direction for cursor pagination to use an index scan instead of a table scan.
- For bidirectional pagination (chat-style infinite scroll), return both a prev_cursor and next_cursor and support both < and > range directions.
Related concepts
- Distributed ID Generation: Time-sortable IDs (Snowflake, UUID v7) make the primary key the natural cursor field, eliminating the need for a separate created_at index for sort order.
- Databases: Understanding how the database executes LIMIT OFFSET (scans and discards rows) versus a range scan (navigates directly to the boundary) explains why cursor pagination is faster at depth.
- API Gateway: Pagination conventions are part of API contract design. Cursor-based APIs are a signal of thoughtful API design that handles production-scale data correctly.