Cursor-based pagination
Understand why offset pagination breaks under concurrent writes, and how cursor-based (keyset) pagination delivers stable, consistent pages at any scale.
The problem with offset pagination
Offset pagination is the natural first instinct. Give the client a page number or an offset count. On each request, skip that many rows and return the next batch.
GET /tweets?user_id=42&offset=20&limit=20
-- "Give me tweets 21-40"
SELECT * FROM tweets WHERE user_id = 42
ORDER BY created_at DESC
LIMIT 20 OFFSET 20;
This works correctly on a static dataset. It breaks the moment anything is inserted or deleted while a user is paging through results.
Consider a user's timeline with 100 tweets ordered newest-first. The client has loaded the first page (tweets 1-20). While they read, three new tweets arrive at the top, shifting every existing tweet down by three positions. The second page request with OFFSET 20 was meant to return tweets 21-40 of the original numbering, but it now skips the three new tweets plus original tweets 1-17 and returns tweets 18-37: tweets 18-20 from the first page appear again as duplicates.
The mirror-image problem happens on deletes: rows above the offset disappear, the remaining rows shift up, and items are silently skipped.
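To make the failure concrete, here is a toy simulation (all data invented for illustration): page through a newest-first list with OFFSET while three new items arrive between requests.

```python
def offset_page(items, offset, limit):
    """items is newest-first; skip `offset` rows, return the next `limit`."""
    return items[offset:offset + limit]

# Timeline of tweet IDs, newest first: 100 down to 1.
timeline = list(range(100, 0, -1))

page1 = offset_page(timeline, 0, 20)       # tweets 100..81

# Three new tweets (101-103) arrive before the second request.
timeline = [103, 102, 101] + timeline

page2 = offset_page(timeline, 20, 20)      # shifted window: starts at 83

duplicates = sorted(set(page1) & set(page2))
print(duplicates)                          # → [81, 82, 83]
```

Three items the client already rendered come back on page two; deleting three rows instead would make the window jump past three unseen items.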
I see this described as a niche problem in interviews. It is not. Any social feed, message list, order history, or search result where items arrive while the user browses suffers from it, and the fix, cursor pagination, is the correct default for almost every paginated API.
What cursor pagination is
Cursor-based pagination (also called keyset pagination) anchors each page request to a specific row identity rather than a positional offset. The server returns an opaque token called a cursor encoding the last item the client received; subsequent requests attach this token and the server translates it into a WHERE predicate to fetch the next batch.
Think of it like a bookmark in a book. You do not say "give me page 87." You put a bookmark after the last sentence you read and say "start from here." New chapters added before your bookmark do not move it; it stays anchored to the content you last read.
The cursor is the only mechanism that guarantees stable, consistent pages in a dataset that changes between requests. Every major social feed, messaging API, and activity stream uses this approach.
How cursor-based pagination works
sequenceDiagram
participant C as Client
participant S as Server
participant DB as Database
C->>S: GET /tweets?limit=20
Note over C,S: First request: no cursor
S->>DB: SELECT * ORDER BY id DESC LIMIT 20
DB-->>S: 20 rows · last id = 981
S-->>C: tweets=[...], next_cursor="981", has_more=true
C->>S: GET /tweets?cursor=981&limit=20
Note over C,S: Cursor present: keyset anchor
S->>DB: SELECT * WHERE id < 981 ORDER BY id DESC LIMIT 20
DB-->>S: 20 rows · last id = 961
S-->>C: tweets=[...], next_cursor="961", has_more=true
Instead of "skip N rows", the client tells the server "give me everything after this specific record". The cursor is an opaque token encoding the last item the client received. The server uses it as a WHERE clause filter, not an OFFSET.
GET /tweets?user_id=42&cursor=01905cce182&limit=20
-- "Give me tweets after tweet ID 01905cce182"
SELECT * FROM tweets
WHERE user_id = 42
AND tweet_id < '01905cce182' -- cursor as WHERE predicate
ORDER BY tweet_id DESC
LIMIT 20;
The query is a range scan from a known position. No matter how many new tweets arrive after the cursor position, the result set is stable. Items are never skipped or duplicated.
The server includes the cursor for the next page in every response:
GET /tweets?user_id=42&limit=20
Response: {
"tweets": [...20 items...],
"next_cursor": "01905cce182", // last tweet_id from this page
"has_more": true
}
The client passes next_cursor as the cursor on the next request. When has_more is false or next_cursor is absent, the list is exhausted.
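The client-side loop can be sketched as follows. `fetch_page` is a hypothetical stand-in for the HTTP call, simulated here over an in-memory list of tweet IDs:

```python
TWEETS = list(range(50, 0, -1))  # tweet IDs 50..1, newest first

def fetch_page(cursor=None, limit=20):
    """Keyset query: everything strictly older than the cursor."""
    rows = [t for t in TWEETS if cursor is None or t < cursor][:limit]
    return {
        "tweets": rows,
        "next_cursor": rows[-1] if rows else None,
        "has_more": bool(rows) and rows[-1] > min(TWEETS),
    }

def fetch_all():
    items, cursor = [], None
    while True:
        page = fetch_page(cursor)
        items.extend(page["tweets"])
        if not page["has_more"]:
            break
        cursor = page["next_cursor"]  # last item becomes the next anchor
    return items

print(len(fetch_all()))  # → 50, no duplicates, no gaps
```

The invariant to notice: the client never knows or cares how many rows precede the cursor, only which row it last saw.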
What makes a good cursor
The cursor must encode enough information to deterministically locate the boundary between one page and the next. The most common choices:
Primary key as cursor. When the primary key is time-sortable (Snowflake ID, ULID, UUID v7), the ID itself encodes the sort position. The cursor is just the last-seen ID, and the WHERE clause is WHERE id < last_id.
-- Primary key is a Snowflake ID; ID encodes time, so ORDER BY id = ORDER BY time
SELECT * FROM tweets WHERE user_id = ? AND tweet_id < ? ORDER BY tweet_id DESC LIMIT 20
-- ^^^^^^^^^^^^^^^^^^^
-- cursor used as WHERE predicate, not OFFSET
Composite cursor. When sorting by a non-unique column (e.g. created_at, where two simultaneous writes can share the same value), the cursor must encode both the sort-column value and a tie-breaking primary key.
# Composite cursor: encode (created_at, tweet_id) together
cursor = base64_encode(f"{last_tweet.created_at}_{last_tweet.tweet_id}")
# Decode and use in the WHERE clause
(cursor_ts, cursor_id) = decode(cursor)
WHERE (created_at < cursor_ts) OR (created_at = cursor_ts AND tweet_id < cursor_id)
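A minimal runnable version of the composite cursor; the helper names and separator are our own illustrative choices, not a library API:

```python
import base64

def encode_cursor(created_at: int, tweet_id: int) -> str:
    """Pack (created_at, tweet_id) into an opaque, URL-safe token."""
    raw = f"{created_at}:{tweet_id}".encode()
    return base64.urlsafe_b64encode(raw).decode()

def decode_cursor(cursor: str) -> tuple:
    """Unpack the token back into (created_at, tweet_id)."""
    ts, tid = base64.urlsafe_b64decode(cursor).decode().split(":")
    return int(ts), int(tid)

cursor = encode_cursor(1718000000, 981)
print(decode_cursor(cursor))  # → (1718000000, 981)
```

Both fields travel in one token, so the server can reconstruct the full two-column boundary predicate from a single query parameter.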
Opaque to the client. Always base64-encode or otherwise serialize the cursor before returning it. Clients should treat cursors as opaque tokens, not parse them. This lets you change cursor internals without breaking clients, and prevents clients from constructing arbitrary cursors that bypass access controls.
Opaque cursors are a security boundary, not just an API convention. If a cursor directly encodes user_id=42&offset=100, a client can forge a cursor for a different user's data. Base64-encoding does not add cryptographic security by itself. For sensitive resources, sign the cursor with an HMAC using a server-side secret so server-side validation can detect tampering before executing the query.
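A sketch of the HMAC approach using Python's standard library. The secret, payload format, and token layout are illustrative assumptions:

```python
import base64
import hashlib
import hmac

SECRET = b"server-side-secret"  # illustrative; load from config and rotate

def sign_cursor(payload: str) -> str:
    """Append an HMAC-SHA256 tag, then base64 the whole token."""
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(f"{payload}.{sig}".encode()).decode()

def verify_cursor(token: str) -> str:
    """Recompute the tag; reject the cursor before it reaches the query."""
    payload, _, sig = base64.urlsafe_b64decode(token).decode().rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("tampered cursor")
    return payload

token = sign_cursor("created_at=1718000000&tweet_id=981")
print(verify_cursor(token))  # → created_at=1718000000&tweet_id=981
```

`hmac.compare_digest` does a constant-time comparison, so an attacker cannot recover the tag byte-by-byte through timing differences.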
The performance benefit
Offset pagination requires the database to read and discard N rows on every request. Five hundred pages deep, with OFFSET 10000 LIMIT 20, the database reads 10,020 rows and throws away 10,000 of them. At thousands of requests per second, this is measurable waste.
Cursor-based pagination with a proper index does a single B-tree descent to the cursor position, then reads forward 20 rows. The cost is O(log N) for the descent plus O(limit) for the scan, regardless of how deep into the result set the cursor is.
-- Offset: reads and discards 10,000 rows on every "load more"
SELECT * FROM tweets ORDER BY created_at DESC LIMIT 20 OFFSET 10000;
-- Cursor: one index seek + 20-row scan, the same cost at any depth
SELECT * FROM tweets WHERE tweet_id < ? ORDER BY tweet_id DESC LIMIT 20;
The index requirement is strict: the columns in the WHERE clause and ORDER BY must be covered by a single index. For the pattern above, (user_id, tweet_id DESC) as a composite index makes the per-user timeline query a single-shard, single-index range scan.
The index must exactly match the cursor query. For WHERE user_id = ? AND tweet_id < ?, create (user_id, tweet_id DESC). If you sort by a non-time-sortable column like score, you need (user_id, score DESC, tweet_id DESC) as a composite index covering both the sort column and the tiebreaker. Cursor pagination without the right index silently falls back to a full sequential scan, which is worse than OFFSET at scale.
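You can sanity-check this with SQLite standing in for the real database: EXPLAIN QUERY PLAN should report an index seek (SEARCH ... USING INDEX), not a full scan. Table and index names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (tweet_id INTEGER, user_id INTEGER, body TEXT)"
)
# Composite index matching the WHERE columns and ORDER BY direction.
conn.execute("CREATE INDEX idx_user_tweet ON tweets (user_id, tweet_id DESC)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM tweets WHERE user_id = ? AND tweet_id < ? "
    "ORDER BY tweet_id DESC LIMIT 20",
    (42, 981),
).fetchall()
# Expect SEARCH tweets USING INDEX idx_user_tweet (a seek, no sort step).
print(plan)
```

Dropping the index (or sorting by an uncovered column) turns the same query into a SCAN, which is the silent fallback the warning above describes.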
Bidirectional pagination
Infinite scroll that loads older content when scrolling down and newer content when scrolling up needs cursors in both directions: a before cursor (for going backward to newer items) and an after cursor (for going forward to older items).
GET /messages?conversation_id=99&before=msg_id_A&limit=20
-- Load 20 messages before (newer than) message A
GET /messages?conversation_id=99&after=msg_id_B&limit=20
-- Load 20 messages after (older than) message B
-- "Before" direction: messages newer than the cursor
SELECT * FROM messages
WHERE conversation_id = 99 AND message_id > :before_cursor
ORDER BY message_id ASC
LIMIT 20;
-- "After" direction: messages older than the cursor
SELECT * FROM messages
WHERE conversation_id = 99 AND message_id < :after_cursor
ORDER BY message_id DESC
LIMIT 20;
Return both cursors in responses when the dataset has items in both directions from the current window.
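The two directions can be wrapped in one query builder. A sketch against SQLite; the schema and helper name are illustrative, and parameter names follow the examples above:

```python
import sqlite3

def messages_query(conversation_id, before=None, after=None, limit=20):
    """Build the keyset query for either direction (or the first page)."""
    if before is not None:  # newer messages: ascending scan from the anchor
        return ("SELECT message_id FROM messages WHERE conversation_id = ? "
                "AND message_id > ? ORDER BY message_id ASC LIMIT ?",
                (conversation_id, before, limit))
    if after is not None:   # older messages: descending scan from the anchor
        return ("SELECT message_id FROM messages WHERE conversation_id = ? "
                "AND message_id < ? ORDER BY message_id DESC LIMIT ?",
                (conversation_id, after, limit))
    return ("SELECT message_id FROM messages WHERE conversation_id = ? "
            "ORDER BY message_id DESC LIMIT ?", (conversation_id, limit))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages "
             "(message_id INTEGER PRIMARY KEY, conversation_id INTEGER)")
conn.executemany("INSERT INTO messages VALUES (?, ?)",
                 [(i, 99) for i in range(1, 11)])

sql, params = messages_query(99, after=6, limit=3)
older = [r[0] for r in conn.execute(sql, params)]   # → [5, 4, 3]

sql, params = messages_query(99, before=6, limit=3)
newer = [r[0] for r in conn.execute(sql, params)]   # → [7, 8, 9]
```

Note the "before" branch returns rows in ascending order; a chat UI typically reverses them before prepending to the rendered list.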
When offset pagination is acceptable
Offset pagination is fine when:
- The dataset is effectively static (reference data, archived records that never change).
- The user never needs to page beyond a small depth (page 1-3 of search results where most users never go past page 2).
- Exact reproducibility across pages is not required (internal admin tools, analytics dashboards).
For any user-facing feed or list where content changes during browsing, cursor-based pagination is the correct default.
Limitations and when NOT to use it
- No random access. You cannot jump to an arbitrary page. "Show me page 47" is not possible with cursors; you can only move forward from the current position or backward with a reverse cursor. Interfaces that show discrete page numbers must use offset pagination.
- Total count requires a separate expensive query. SELECT COUNT(*) with the same filters scans all matching rows; for 10 million rows, this costs more than the page query itself. Cursor-based APIs should return has_more: true/false instead of total counts.
- Cursor invalidation on sort order changes. If the sort field changes type or the table is reindexed, old cursors may produce inconsistent results. Cursor tokens should be treated as short-lived (not stored across schema migrations).
- Backward pagination is non-trivial. Fetching newer items (scrolling up in a chat) requires a dedicated reverse cursor (before) and a separate SQL direction. Some teams skip this and use "scroll to top to refresh" instead.
- Sorting on multiple non-unique columns requires composite cursors. A sort on (category, score, id) requires encoding all three fields in the cursor and a multi-column WHERE predicate. The SQL grows in complexity with each additional sort key.
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Twitter/X API v2 | next_token parameter for timeline pagination | Opaque cursor returned in meta.next_token; WHERE tweet_id < :cursor ORDER BY tweet_id DESC internally |
| GitHub REST API | Link header with rel=next cursor URLs | Uses opaque URL cursors encoding the last seen ID and sort parameters |
| Stripe API | starting_after and ending_before object ID parameters | Clients pass the last object ID from current page; server uses WHERE id > :cursor keyset filter |
| PostgreSQL | Keyset pagination with WHERE (col1, col2) > (:v1, :v2) | Row value constructor syntax enables multi-column cursor comparisons in a single expression |
Interview cheat sheet
- State cursor pagination as your default immediately after showing any API endpoint that returns a list.
- Offset pagination breaks under concurrent inserts: new items push existing items to later offsets, causing duplicates on page boundaries.
- A cursor is a WHERE predicate, not a SKIP count. The query locates the boundary, then scans forward.
- Time-sortable IDs (Snowflake, UUID v7) make primary key the correct cursor field. No separate created_at index needed.
- For non-unique sort columns, encode both the sort value and the tiebreaking primary key into the cursor.
- Always opacify cursors with base64 encoding. Clients should not parse cursor internals.
- Cursor pagination requires an index matching the WHERE and ORDER BY columns exactly. Design the index before designing the pagination.
- Offset pagination on deep pages (OFFSET 10000) scans and discards thousands of rows. Cursor pagination is O(log N) regardless of depth.
Quick recap
- Offset pagination skips N rows before returning results. It produces duplicates and skips when rows are inserted or deleted while a user is browsing.
- Cursor pagination encodes the last seen row's identity as a WHERE predicate. It is O(log N) regardless of how deep into the results the reader is.
- Cursors must be opaque to clients (base64-encoded). The internal structure encodes whatever columns define the sort boundary.
- For non-unique sort order columns, the cursor must include a tiebreaker (typically the primary key) to handle rows with identical sort values.
- The database index must match both the WHERE filter columns and the ORDER BY direction for cursor pagination to use an index scan instead of a table scan.
- For bidirectional pagination (chat-style infinite scroll), return both a prev_cursor and next_cursor and support both < and > range directions.
Related concepts
- Distributed ID Generation: Time-sortable IDs (Snowflake, UUID v7) make the primary key the natural cursor field, eliminating the need for a separate created_at index for sort order.
- Databases: Understanding how the database executes LIMIT OFFSET (scans and discards rows) versus a range scan (navigates directly to the boundary) explains why cursor pagination is faster at depth.
- API Gateway: Pagination conventions are part of API contract design. Cursor-based APIs are a signal of thoughtful API design that handles production-scale data correctly.