Skip lists

The problem

You need a data structure that stores scores and names together, supports O(log N) insert and delete, and answers "give me all names with scores between 100 and 500" in O(log N + k) time. A sorted array gives you O(log N) search but O(N) insert. A linked list gives you O(1) insert at a known position but O(N) search. A balanced binary search tree (AVL or red-black) gives you O(log N) for all three, but it requires rotations on every insert and delete to maintain balance. Those rotations are an implementation challenge: dozens of cases, pointer surgery, easy to get wrong.

Redis needed a sorted set structure when building ZADD/ZRANGE. The balanced BST option works, but the rotation logic is complex enough that a bug means silent data corruption. Rotations also touch multiple nodes, hurting cache locality under concurrent access.

This is the problem skip lists solve: logarithmic performance without rebalancing, and a dramatically simpler implementation compared to any balanced tree.

What a skip list is

A skip list is a probabilistic data structure built by layering multiple linked lists on top of each other. The bottom layer (level 0) is a complete sorted linked list containing every element. Each element is promoted to level 1 with probability 0.5, to level 2 with probability 0.25, and so on. Higher levels contain progressively fewer nodes and act as "express lanes" that let searches skip over many elements at once.

Analogy: Imagine a highway system with multiple parallel lanes. The rightmost lane is a local road with every exit. The lane to its left skips some exits. The leftmost lane is an express highway with only major exits. To reach a destination, you start in the express lane and move to slower lanes as you get close. A skip list is exactly this structure, built from coin flips.

How it works

The structure

Search

To find key 60:

Start at the top level (level 2). Head points to 1, then 50, then 100. 50 is less than 60 but 100 is greater. Drop to level 1.
At level 1, starting from 50. Next is 75. 75 is greater than 60. Drop to level 0.
At level 0, starting from 50. Next is 60. Found it.

Total comparisons: 6 instead of 10 (a full sequential scan of all ten elements between head and 60).

function search(list, target):
  node = list.head
  for level from MAX_LEVEL down to 0:
    while node.forward[level] != null
        and node.forward[level].key < target:
      node = node.forward[level]
  node = node.forward[0]
  if node != null and node.key == target:
    return node.value
  return NOT_FOUND

Insert

To insert key 55:

Run the search procedure, recording the last node visited at each level before each drop. These become the "update" nodes.
Flip coins to determine the height of the new node. Each flip is heads with probability 0.5.
Splice the new node in at every level up to its height, updating forward pointers.

function insert(list, key, value):
  update = array of size MAX_LEVEL
  node = list.head
  for level from MAX_LEVEL down to 0:
    while node.forward[level] != null
        and node.forward[level].key < key:
      node = node.forward[level]
    update[level] = node

  new_height = random_level()   # flip coins until tails
  new_node = Node(key, value, new_height)
  for level from 0 to new_height:
    new_node.forward[level] = update[level].forward[level]
    update[level].forward[level] = new_node

function random_level():
  level = 0
  while random() < 0.5 and level < MAX_LEVEL:
    level += 1
  return level

Delete follows the same pattern: collect the update array via search, then unlink the target node from every level it appears in by forwarding the update nodes to the target's forward pointers.

Why Redis uses skip lists for sorted sets

Redis sorted sets (ZSET) need to support five operations efficiently: ZADD (insert), ZREM (delete), ZSCORE (point lookup), ZRANGEBYSCORE (range scan), and ZRANK (rank query: "what position is this element?").

A balanced BST supports the first four in O(log N). Rank queries require augmenting every node with its subtree size and updating that count on every rotation -- doable in theory but adds significant complexity to every structural change. A skip list supports rank queries natively by storing a "span" value at every level that records how many level-0 nodes each forward pointer skips over. ZRANK is then: sum the spans along the search path from head to the target.

Redis chose the skip list because:

The forward-span trick makes ZRANK O(log N) with no extra overhead on insert or delete.
No rotations means the implementation is roughly 200 lines of C versus roughly 800 for a red-black tree.
Range scans at level 0 are simple pointer following with no tree traversal or stack required.

No rotations means no multi-node locking on structural changes

Red-black tree rotations touch two to three nodes simultaneously. In a concurrent setting, that requires locking multiple nodes at once or a complex lock-free rotation algorithm. Skip list inserts and deletes update only the new node and its update-array predecessors, each with a single pointer swap. This makes fine-grained locking far simpler. Java's ConcurrentSkipListMap uses exactly this property to implement a non-blocking sorted map. Redis, being single-threaded, benefits from simpler code and fewer bugs rather than lock performance.

Redis actually pairs a skip list with a hash table for ZSET. The hash table handles O(1) ZSCORE lookups; the skip list handles ordering and ranges.

The span values stored on each forward pointer are what make ZRANK a single-pass operation. Each pointer records how many level-0 nodes it skips over. ZRANK sums those spans along the search path from head to the target node, with no separate data structure needed.

The problem

This is the problem skip lists solve: logarithmic performance without rebalancing, and a dramatically simpler implementation compared to any balanced tree.

How it works

To find key 60:

Start at the top level (level 2). Head points to 1, then 50, then 100. 50 is less than 60 but 100 is greater. Drop to level 1.

At level 1, starting from 50. Next is 75. 75 is greater than 60. Drop to level 0.

At level 0, starting from 50. Next is 60. Found it.

Total comparisons: 6 instead of 10 (a full sequential scan of all ten elements between head and 60).

function search(list, target):
  node = list.head
  for level from MAX_LEVEL down to 0:
    while node.forward[level] != null
        and node.forward[level].key < target:
      node = node.forward[level]
  node = node.forward[0]
  if node != null and node.key == target:
    return node.value
  return NOT_FOUND

To insert key 55:

Run the search procedure, recording the last node visited at each level before each drop. These become the "update" nodes.

Flip coins to determine the height of the new node. Each flip is heads with probability 0.5.

Splice the new node in at every level up to its height, updating forward pointers.

function insert(list, key, value):
  update = array of size MAX_LEVEL
  node = list.head
  for level from MAX_LEVEL down to 0:
    while node.forward[level] != null
        and node.forward[level].key < key:
      node = node.forward[level]
    update[level] = node

  new_height = random_level()   # flip coins until tails
  new_node = Node(key, value, new_height)
  for level from 0 to new_height:
    new_node.forward[level] = update[level].forward[level]
    update[level].forward[level] = new_node

function random_level():
  level = 0
  while random() < 0.5 and level < MAX_LEVEL:
    level += 1
  return level

Delete follows the same pattern: collect the update array via search, then unlink the target node from every level it appears in by forwarding the update nodes to the target's forward pointers.

Skip lists

The problem

What a skip list is

How it works

The structure

Search

Insert

Why Redis uses skip lists for sorted sets

Continue Reading with Premium

Comments

Skip lists

The problem

What a skip list is

How it works

The structure

Search

Insert

Why Redis uses skip lists for sorted sets

Continue Reading with Premium

Comments