Trie (prefix tree)
How tries provide O(k) key lookup, insertion, and prefix search, powering autocomplete, IP routing tables, DNS resolvers, and dictionary implementations.
The problem
Your search bar has 50 million previously searched queries. A user types "app" and expects instant autocomplete suggestions. You try the obvious approach: store all queries in a hash map and, on each keystroke, iterate through all 50 million keys checking if they start with the typed prefix. That is O(N) per keystroke, and at 50 million keys, each keypress takes hundreds of milliseconds. The autocomplete feels sluggish.
You could pre-sort the queries and binary search for the prefix, getting O(log N) to find the first match plus O(R) to collect R results. That helps, but you still need to re-search from scratch on every keystroke, and the sorted array has no structural awareness of shared prefixes. "apple", "application", and "app store" all share the "app" prefix, but the sorted array treats them as unrelated entries.
You need a data structure where looking up all strings that share a prefix costs O(k) (where k is the prefix length), not O(N) or O(log N): one that models prefix sharing as its fundamental structure. This is what a trie does.
What it is
A trie (prefix tree) is a tree data structure where each node represents a single character in a string, and paths from root to marked nodes form complete keys. All keys that share a common prefix share the same nodes for that prefix.
Think of it like a library filing system where books are organized by call number. "QA76.73" and "QA76.76" share the same shelf path up to "QA76.7", then branch into different slots. You do not search the entire library to find all books starting with "QA76". You walk directly to that shelf and look at what is there. The structure itself encodes the prefix relationship.
How it works
Each path from root to a node marked as end-of-word is a complete key. The critical insight: all words sharing a prefix share the same nodes for that prefix. "app", "apple", and "apply" all traverse the same root-a-p-p path before branching. This means prefix search is not a scan; it is a direct tree traversal.
Core operations
All operations are O(k) where k is the key length, independent of how many strings are stored in the trie. This is fundamentally different from hash tables (O(1) average but O(N) prefix search) and balanced BSTs (O(k log N) for string keys).
Lookup: traverse character by character from root. If you reach the end of the key and the node is marked as end-of-word, the key exists. O(k).
Insert: traverse to where the path diverges, create new nodes for the remaining characters. O(k).
Prefix search: traverse to the prefix node, then collect all descendants that are end-of-word nodes. O(k + R), where R is the number of results.
```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks complete key

def insert(root, word):
    node = root
    for char in word:
        if char not in node.children:
            node.children[char] = TrieNode()
        node = node.children[char]
    node.is_end = True

def search(root, word):
    node = root
    for char in word:
        if char not in node.children:
            return False
        node = node.children[char]
    return node.is_end

def starts_with(root, prefix):
    node = root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    results = []

    def collect(node, current_word):
        if node.is_end:
            results.append(current_word)
        for char, child in node.children.items():
            collect(child, current_word + char)

    collect(node, prefix)
    return results
```
starts_with("ap") traverses root -> a -> p in 2 steps, then collects all descendants: ["app", "apple", "apply", "apt"]. The work to find the prefix node is O(k); the work to collect results is O(R), where R is the number of matching strings, giving O(k + R) total.
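To make this concrete, here is the same structure exercised end to end. The definitions are repeated (in slightly condensed form) so the snippet runs standalone:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks complete key

def insert(root, word):
    node = root
    for char in word:
        node = node.children.setdefault(char, TrieNode())
    node.is_end = True

def starts_with(root, prefix):
    node = root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    results = []

    def collect(n, word):
        if n.is_end:
            results.append(word)
        for ch, child in n.children.items():
            collect(child, word + ch)

    collect(node, prefix)
    return results

root = TrieNode()
for w in ["app", "apple", "apply", "apt", "banana"]:
    insert(root, w)

print(starts_with(root, "ap"))   # ['app', 'apple', 'apply', 'apt']
print(starts_with(root, "ban"))  # ['banana']
```

Note that "banana" never appears in the "ap" results: the traversal simply never enters its subtree, which is the whole point.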
Autocomplete with top-K precomputation
Search autocomplete (Google, GitHub search, Slack) uses tries with frequency-weighted nodes. The naive approach of traversing the entire subtree on each keystroke is too slow for production. Instead, production systems precompute the top-K suggestions at every node during index build time.
Each node stores a precomputed list of the most popular completions reachable from that prefix. When a user types "app", traversal reaches the "app" node in 3 steps and immediately returns the stored top-K list. No subtree traversal required.
The tradeoff: storage increases because every node carries a top-K list. For K=10 suggestions with average 20-byte strings, that is ~200 bytes per node. With millions of nodes, this adds up. But the query-time gain (O(k) instead of O(k + entire subtree)) is worth it for interactive autocomplete where latency budgets are under 50ms.
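A build-time sketch of this idea. The names (`completions`, `build_trie`) and the in-memory frequency map are illustrative, not taken from any particular production system, which would shard this across servers and build it offline:

```python
import heapq

class Node:
    def __init__(self):
        self.children = {}
        self.completions = []  # (freq, query) pairs reachable from this prefix

def build_trie(freq_by_query, k=10):
    """Offline index build: every node on a query's path records the query,
    then trims its list to the k most frequent completions."""
    root = Node()
    for query, freq in freq_by_query.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, Node())
            node.completions.append((freq, query))
    # Trim each node's list to top-k so query time pays no collection cost.
    stack = [root]
    while stack:
        node = stack.pop()
        node.completions = heapq.nlargest(k, node.completions)
        stack.extend(node.children.values())
    return root

def suggest(root, prefix):
    """O(k) query: walk to the prefix node and return its stored list."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return [query for _, query in node.completions]
```

With `build_trie({"apple": 900, "app store": 500, "apply": 300, "apt": 100}, k=2)`, `suggest(root, "app")` returns `["apple", "app store"]` directly from the precomputed list, with no subtree walk.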
Interview tip: mention the top-K precomputation
When autocomplete comes up in a system design interview, do not just say "use a trie." Say "use a trie with precomputed top-K results at each node, built offline during index construction. This gives O(k) query time without subtree traversal." That level of detail separates strong answers from generic ones.
IP routing: longest prefix match
Network routers need to find the most specific routing rule for an incoming packet's destination IP. This is called Longest Prefix Match (LPM), and it is one of the most performance-critical operations in networking.
IP routing tries operate on individual bits rather than characters. Each bit of the IP address determines left (0) or right (1) at each node. CIDR notation (/24, /16, /8) specifies the prefix length where a rule matches.
As the packet for 10.1.1.5 traverses the trie, it passes through matching nodes at /8 (Interface A), /16 (Interface B), and /24 (Interface C). The longest match (/24) wins, so the packet routes to Interface C. The lookup takes at most 32 bit comparisons (here 24, the depth of the deepest match), regardless of how many routes exist in the table.
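A bit-level sketch of this lookup, assuming IPv4 and a one-bit-per-level binary trie (production routers use the compressed and multi-bit variants discussed next):

```python
class BitNode:
    def __init__(self):
        self.children = [None, None]  # follow bit 0 or bit 1 at each level
        self.next_hop = None          # set where a route's prefix ends

def ip_to_bits(ip):
    """Dotted-quad IPv4 -> 32 bits, most significant first."""
    n = 0
    for part in ip.split("."):
        n = (n << 8) | int(part)
    return [(n >> (31 - i)) & 1 for i in range(32)]

def add_route(root, cidr, next_hop):
    ip, prefix_len = cidr.split("/")
    node = root
    for bit in ip_to_bits(ip)[: int(prefix_len)]:
        if node.children[bit] is None:
            node.children[bit] = BitNode()
        node = node.children[bit]
    node.next_hop = next_hop

def longest_prefix_match(root, ip):
    """Walk as deep as the trie allows, remembering the last route seen."""
    node, best = root, None
    for bit in ip_to_bits(ip):
        node = node.children[bit]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop  # deeper match overrides shallower ones
    return best
```

Loading 10.0.0.0/8 -> A, 10.1.0.0/16 -> B, and 10.1.1.0/24 -> C reproduces the example: 10.1.1.5 resolves to C, 10.1.2.9 falls back to B, and 10.9.9.9 to A.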
Production routers use optimized variants: Patricia tries (collapse single-child chains), multi-bit tries (examine 4-8 bits per step instead of 1), and TCAM hardware (ternary content-addressable memory for parallel prefix matching at wire speed). The Linux kernel's FIB (Forwarding Information Base) uses an LC-trie (Level-Compressed trie), which combines path compression with level compression for cache-friendly lookups.
Binary tries for IP routing become deep (up to 128 levels for IPv6). Naive implementations cause cache misses at every level because each node may be in a different cache line. Production routers use multi-bit tries (stride of 4-8 bits) to reduce depth at the cost of wider nodes, trading memory for fewer cache misses per lookup.
Memory optimization: compressed tries
Standard tries waste memory for sparse paths. If a node has only one child, that node exists solely to extend the path by one character. Compressed tries (radix trees, Patricia tries) merge chains of single-child nodes into a single node storing the entire substring.
For the word "banana" alone, a standard trie uses 6 nodes. A compressed trie uses 1 node storing the full string. When multiple words share a prefix, the compression applies to the non-shared suffixes:
| Trie type | "banana" + "band" | Node count |
|---|---|---|
| Standard | root-b-a-n-a-n-a, root-b-a-n-d | 8 nodes |
| Compressed | root-"ban"-["ana", "d"] | 4 nodes |
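The compressed layout in the table can be sketched with nested dicts whose keys are whole edge labels. This is a lookup-only toy: a real radix tree also stores an explicit end-of-key flag and splits edges on insert:

```python
# Hand-built compressed trie for "banana" and "band", matching the table:
# root -> "ban" -> {"ana", "d"}. An empty dict marks a leaf (complete key).
radix = {"ban": {"ana": {}, "d": {}}}

def radix_search(node, word):
    """Consume a whole edge label per step instead of one character."""
    if word == "":
        return node == {}  # real implementations use an is_end flag instead
    for label, child in node.items():
        if word.startswith(label):
            return radix_search(child, word[len(label):])
    return False
```

`radix_search(radix, "banana")` consumes "ban" then "ana" in two steps rather than six character-level hops; "ban" alone is rejected because no key ends there.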
Trie node representations
The children map in each trie node determines memory usage and lookup speed:
| Representation | Lookup | Memory per node | Best when |
|---|---|---|---|
| Array[A] | O(1) | A pointers (2 KB for A=256) | Small alphabet or dense nodes (e.g. A=2 for binary tries) |
| HashMap | O(1) avg | ~100-200 bytes overhead | Medium alphabet, sparse nodes |
| Sorted array | O(log A) | Compact, A * pointer_size | Memory-constrained, infrequent lookups |
| Bitmap + compact array | O(1) | Optimal for sparse | Production-grade (ART, HAT-trie) |
For tries over byte-string keys (alphabet size 256), using a fixed array per node wastes enormous memory because most nodes have only a few children. Hash maps add overhead per node. The Adaptive Radix Tree (ART) solves this by choosing the node representation based on how many children exist: Node4, Node16, Node48, or Node256. This gives near-optimal memory at every density level.
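A toy illustration of the adaptive idea. The real ART uses fixed Node4/16/48/256 layouts with SIMD-friendly search; the `PROMOTE_AT` threshold and class name here are invented for illustration, and only two representations are shown:

```python
import bisect

class AdaptiveNode:
    """Switches child representation by occupancy, loosely in ART's spirit."""
    PROMOTE_AT = 16  # illustrative threshold, not ART's actual layout rule

    def __init__(self):
        self.sparse = []   # sorted (key_byte, child) pairs while small
        self.dense = None  # 256-slot array once the node fills up

    def get(self, key_byte):
        if self.dense is not None:
            return self.dense[key_byte]
        i = bisect.bisect_left(self.sparse, (key_byte,))
        if i < len(self.sparse) and self.sparse[i][0] == key_byte:
            return self.sparse[i][1]
        return None

    def put(self, key_byte, child):
        if self.dense is not None:
            self.dense[key_byte] = child
            return
        i = bisect.bisect_left(self.sparse, (key_byte,))
        if i < len(self.sparse) and self.sparse[i][0] == key_byte:
            self.sparse[i] = (key_byte, child)
        else:
            self.sparse.insert(i, (key_byte, child))
        if len(self.sparse) > self.PROMOTE_AT:
            # Promote: copy the sorted pairs into a direct-indexed array.
            self.dense = [None] * 256
            for b, c in self.sparse:
                self.dense[b] = c
            self.sparse = []
```

A nearly empty node pays only a few tuples of memory; a node with dozens of children gets O(1) indexed lookup. ART additionally demotes nodes on deletion and compresses paths, which this sketch omits.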
Linux kernel's routing table uses an LC-trie (Level-Compressed trie). Redis uses radix trees internally for its Streams data structure. Go's net/http default router uses a custom radix tree for URL path matching.
Hash table vs trie: when trie wins
| Operation | Hash table | Trie |
|---|---|---|
| Exact lookup | O(1) average | O(k) |
| Prefix search | O(N) (scan all keys) | O(k + results) |
| Lexicographic ordering | Not supported | Natural (DFS traversal) |
| Shared prefix memory | None (each key stored fully) | Shared nodes for common prefixes |
| Worst case | O(N) collisions | O(k) always |
| Delete | O(1) average | O(k), may need pruning |
| Range query | Not supported | O(k + range size) via DFS |
Use a hash table when you need pure exact-match lookups and do not care about prefix relationships. Use a trie when you need prefix queries, sorted iteration, longest-prefix matching, or when your keys share significant common prefixes.
I have seen engineers default to hash maps for every string lookup problem. That works fine until someone asks for autocomplete or prefix filtering, and the hash map requires a full scan. Picking the right data structure upfront saves a rewrite later.
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Google Search / Bing | Autocomplete suggestions | Distributed tries with top-K precomputed at each node. Sharded by prefix range across servers. |
| Linux kernel (FIB) | IP routing table (LC-trie) | Level-Compressed trie for cache-friendly longest-prefix-match. Handles full Internet routing table (~900K IPv4 prefixes). |
| DNS resolvers | Domain name lookup | Domain labels form a trie: "www.example.com" traverses com -> example -> www. Caching resolvers use trie structures for prefix matching. |
| Redis Streams | Internal radix tree for stream entry IDs | Uses a radix tree (compressed trie) to store and retrieve stream entries by their millisecond-sequence IDs. |
| Apache Lucene / Elasticsearch | Term dictionary (FST) | Finite State Transducer, a compressed trie variant, stores the term dictionary. Maps every indexed term to its posting list offset in O(k). |
| Go net/http | URL path routing | Default HTTP mux uses a radix tree for path matching. Frameworks like chi and echo use optimized radix trees for route matching. |
Limitations and when NOT to use it
- Memory-hungry for large alphabets. A standard trie node over Unicode (millions of possible characters) is impractical with array-based children. Even with hash map children, each node carries overhead (map header, hash buckets). For pure exact-match workloads, a hash table uses less memory.
- Cache-unfriendly traversal. Each character in the key requires following a pointer to a different node, likely in a different cache line. For short keys, a hash table (one hash computation, one or two memory accesses) is faster despite higher algorithmic complexity for prefix search.
- Not useful without prefix queries. If your workload is 100% exact-match lookups with no prefix search, sorted iteration, or range queries, a trie adds complexity and memory overhead with no benefit. Use a hash table.
- Balancing is not applicable. Unlike BSTs, tries cannot be "balanced." A pathological input (very long keys with no shared prefixes) produces a deep, sparse tree that wastes memory. Compressed tries mitigate this but add implementation complexity.
- Concurrent modification is complex. Lock-free tries require careful design (e.g., the Concurrent Trie by Prokopec et al.). Naive per-node locking creates contention on popular prefixes. For write-heavy workloads, concurrent hash maps are simpler to implement correctly.
- Serialization is non-trivial. A trie's pointer-based structure does not serialize naturally to disk. Production systems that need persistent tries use memory-mapped arrays or specialized formats (FST in Lucene). This is more engineering effort than a flat key-value store.
Interview cheat sheet
- When autocomplete comes up: say "Use a trie with precomputed top-K results at each node. Query time is O(k) where k is the prefix length. Build the top-K lists offline during index construction."
- When IP routing is discussed: say "Longest Prefix Match via binary trie. Each bit is a trie level. Production routers use multi-bit tries or TCAM to reduce depth and cache misses."
- When asked about prefix search: say "Hash tables cannot do prefix search without scanning all keys (O(N)). Tries do it in O(k + R) where R is the result count. This is the fundamental difference."
- When comparing with hash tables: say "Tries win for prefix queries, sorted iteration, and range queries. Hash tables win for pure exact-match O(1) lookups. Choose based on your query pattern."
- When memory is a concern: say "Use a compressed trie (radix tree) to merge single-child chains. Or use an Adaptive Radix Tree (ART) that picks node size based on child count."
- When DNS resolution is discussed: say "DNS is a trie. Domain labels (com, example, www) form a hierarchical trie. Resolvers traverse it right-to-left (TLD first)."
- When someone proposes regex for prefix matching: say "A trie is essentially a finite automaton for prefix matching. It is faster and more memory-efficient than compiling a regex per query."
- When sorted output is needed: say "A DFS traversal of a trie produces keys in lexicographic order. No sorting step needed. Hash tables cannot provide this without an external sort."
Quick recap
- A trie stores strings character by character, sharing nodes for common prefixes. All operations (insert, lookup, prefix search) are O(k) where k is the key length, independent of the total number of stored strings.
- Autocomplete systems precompute top-K suggestions at each trie node during index build, enabling O(k) prefix query time without subtree traversal.
- IP routing uses binary tries for Longest Prefix Match, with multi-bit strides and path compression to reduce depth and cache misses in production routers.
- Compressed tries (Patricia tries, radix trees) merge single-child chains, reducing node count by 60-80% for sparse datasets.
- Tries win over hash tables for prefix queries, sorted iteration, and range queries. Hash tables win for pure exact-match O(1) lookups.
- Memory is the main drawback: naive tries can use 10x more memory than hash tables for the same keys. Compressed variants and adaptive node representations (ART) mitigate this significantly.
Related concepts
- Inverted index - Lucene's term dictionary is an FST (a compressed trie variant). Understanding tries explains how term lookup achieves O(k) performance in full-text search engines.
- Bloom filters - Bloom filters answer "does this key exist?" probabilistically in O(1). Tries answer the same question deterministically in O(k), plus support prefix queries. Use Bloom filters to guard expensive trie lookups.
- Databases - B-trees and tries solve overlapping problems (sorted key lookup, range queries). B-trees are disk-optimized with high fan-out; tries are memory-optimized for prefix-heavy workloads.