Trie (prefix tree)
How tries provide O(k) key lookup, insertion, and prefix search, powering autocomplete, IP routing tables, DNS resolvers, and dictionary implementations.
The problem
Your search bar has 50 million previously searched queries. A user types "app" and expects instant autocomplete suggestions. You try the obvious approach: store all queries in a hash map and, on each keystroke, iterate through all 50 million keys checking if they start with the typed prefix. That is O(N) per keystroke, and at 50 million keys, each keypress takes hundreds of milliseconds. The autocomplete feels sluggish.
You could pre-sort the queries and binary search for the prefix, getting O(log N) to find the first match plus O(R) to collect R results. That helps, but you still need to re-search from scratch on every keystroke, and the sorted array has no structural awareness of shared prefixes. "apple", "application", and "app store" all share the "app" prefix, but the sorted array treats them as unrelated entries.
You need a data structure where looking up all strings that share a prefix costs O(k) (where k is the prefix length), not O(N) or O(log N): one that models prefix sharing as its fundamental structure. This is what a trie does.
What it is
A trie (prefix tree) is a tree data structure where each node represents a single character in a string, and paths from root to marked nodes form complete keys. All keys that share a common prefix share the same nodes for that prefix.
Think of it like a library filing system where books are organized by call number. "QA76.73" and "QA76.76" share the same shelf path up to "QA76.7", then branch into different slots. You do not search the entire library to find all books starting with "QA76". You walk directly to that shelf and look at what is there. The structure itself encodes the prefix relationship.
How it works
Each path from root to a node marked as end-of-word is a complete key. The critical insight: all words sharing a prefix share the same nodes for that prefix. "app", "apple", and "apply" all traverse the same root-a-p-p path before branching. This means prefix search is not a scan; it is a direct tree traversal.
Core operations
All operations are O(k) where k is the key length, independent of how many strings are stored in the trie. This is fundamentally different from hash tables (O(1) average but O(N) prefix search) and balanced BSTs (O(k log N) for string keys).
Lookup: traverse character by character from root. If you reach the end of the key and the node is marked as end-of-word, the key exists. O(k).
Insert: traverse to where the path diverges, create new nodes for the remaining characters. O(k).
Prefix search: traverse to the prefix node, then collect all descendants that are end-of-word nodes. O(k + R), where R is the number of results.
```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks complete key

def insert(root, word):
    node = root
    for char in word:
        if char not in node.children:
            node.children[char] = TrieNode()
        node = node.children[char]
    node.is_end = True

def search(root, word):
    node = root
    for char in word:
        if char not in node.children:
            return False
        node = node.children[char]
    return node.is_end

def starts_with(root, prefix):
    node = root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    results = []

    def collect(node, current_word):
        if node.is_end:
            results.append(current_word)
        for char, child in node.children.items():
            collect(child, current_word + char)

    collect(node, prefix)
    return results
```
starts_with("ap") traverses root -> a -> p in 2 steps, then collects all descendants: ["app", "apple", "apply", "apt"]. The work to find the prefix node is O(k); the work to collect results is O(R), where R is the number of matching strings, giving O(k + R) total.
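To make this concrete, here is the same structure exercised end to end. The definitions are repeated (in slightly condensed form) so the snippet runs standalone:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # char -> TrieNode
        self.is_end = False  # marks complete key

def insert(root, word):
    node = root
    for char in word:
        node = node.children.setdefault(char, TrieNode())
    node.is_end = True

def starts_with(root, prefix):
    node = root
    for char in prefix:
        if char not in node.children:
            return []
        node = node.children[char]
    results = []

    def collect(n, word):
        if n.is_end:
            results.append(word)
        for ch, child in n.children.items():
            collect(child, word + ch)

    collect(node, prefix)
    return results

root = TrieNode()
for w in ["app", "apple", "apply", "apt", "banana"]:
    insert(root, w)

print(starts_with(root, "ap"))   # ['app', 'apple', 'apply', 'apt']
print(starts_with(root, "ban"))  # ['banana']
```

Note that "banana" never appears in the "ap" results: the traversal simply never enters its subtree, which is the whole point.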
Autocomplete with top-K precomputation
Search autocomplete (Google, GitHub search, Slack) uses tries with frequency-weighted nodes. The naive approach of traversing the entire subtree on each keystroke is too slow for production. Instead, production systems precompute the top-K suggestions at every node during index build time.
Each node stores a precomputed list of the most popular completions reachable from that prefix. When a user types "app", traversal reaches the "app" node in 3 steps and immediately returns the stored top-K list. No subtree traversal required.
The tradeoff: storage increases because every node carries a top-K list. For K=10 suggestions with average 20-byte strings, that is ~200 bytes per node. With millions of nodes, this adds up. But the query-time gain (O(k) instead of O(k + entire subtree)) is worth it for interactive autocomplete where latency budgets are under 50ms.
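A build-time sketch of this idea. The names (`completions`, `build_trie`) and the in-memory frequency map are illustrative, not taken from any particular production system, which would shard this across servers and build it offline:

```python
import heapq

class Node:
    def __init__(self):
        self.children = {}
        self.completions = []  # (freq, query) pairs reachable from this prefix

def build_trie(freq_by_query, k=10):
    """Offline index build: every node on a query's path records the query,
    then trims its list to the k most frequent completions."""
    root = Node()
    for query, freq in freq_by_query.items():
        node = root
        for ch in query:
            node = node.children.setdefault(ch, Node())
            node.completions.append((freq, query))
    # Trim each node's list to top-k so query time pays no collection cost.
    stack = [root]
    while stack:
        node = stack.pop()
        node.completions = heapq.nlargest(k, node.completions)
        stack.extend(node.children.values())
    return root

def suggest(root, prefix):
    """O(k) query: walk to the prefix node and return its stored list."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    return [query for _, query in node.completions]
```

With `build_trie({"apple": 900, "app store": 500, "apply": 300, "apt": 100}, k=2)`, `suggest(root, "app")` returns `["apple", "app store"]` directly from the precomputed list, with no subtree walk.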
Interview tip: mention the top-K precomputation
When autocomplete comes up in a system design interview, do not just say "use a trie." Say "use a trie with precomputed top-K results at each node, built offline during index construction. This gives O(k) query time without subtree traversal." That level of detail separates strong answers from generic ones.
IP routing: longest prefix match
Network routers need to find the most specific routing rule for an incoming packet's destination IP. This is called Longest Prefix Match (LPM), and it is one of the most performance-critical operations in networking.
IP routing tries operate on individual bits rather than characters. Each bit of the IP address determines left (0) or right (1) at each node. CIDR notation (/24, /16, /8) specifies the prefix length where a rule matches.
As the packet for 10.1.1.5 traverses the trie, it passes through matching nodes at /8 (Interface A), /16 (Interface B), and /24 (Interface C). The longest match (/24) wins, so the packet routes to Interface C. The lookup takes at most 32 bit comparisons (here 24, the depth of the deepest match), regardless of how many routes exist in the table.
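A bit-level sketch of this lookup, assuming IPv4 and a one-bit-per-level binary trie (production routers use the compressed and multi-bit variants discussed next):

```python
class BitNode:
    def __init__(self):
        self.children = [None, None]  # follow bit 0 or bit 1 at each level
        self.next_hop = None          # set where a route's prefix ends

def ip_to_bits(ip):
    """Dotted-quad IPv4 -> 32 bits, most significant first."""
    n = 0
    for part in ip.split("."):
        n = (n << 8) | int(part)
    return [(n >> (31 - i)) & 1 for i in range(32)]

def add_route(root, cidr, next_hop):
    ip, prefix_len = cidr.split("/")
    node = root
    for bit in ip_to_bits(ip)[: int(prefix_len)]:
        if node.children[bit] is None:
            node.children[bit] = BitNode()
        node = node.children[bit]
    node.next_hop = next_hop

def longest_prefix_match(root, ip):
    """Walk as deep as the trie allows, remembering the last route seen."""
    node, best = root, None
    for bit in ip_to_bits(ip):
        node = node.children[bit]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop  # deeper match overrides shallower ones
    return best
```

Loading 10.0.0.0/8 -> A, 10.1.0.0/16 -> B, and 10.1.1.0/24 -> C reproduces the example: 10.1.1.5 resolves to C, 10.1.2.9 falls back to B, and 10.9.9.9 to A.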
Production routers use optimized variants: Patricia tries (collapse single-child chains), multi-bit tries (examine 4-8 bits per step instead of 1), and TCAM hardware (ternary content-addressable memory for parallel prefix matching at wire speed). The Linux kernel's FIB (Forwarding Information Base) uses an LC-trie (Level-Compressed trie), which combines path compression with level compression for cache-friendly lookups.
Binary tries for IP routing become deep (up to 128 levels for IPv6). Naive implementations cause cache misses at every level because each node may be in a different cache line. Production routers use multi-bit tries (stride of 4-8 bits) to reduce depth at the cost of wider nodes, trading memory for fewer cache misses per lookup.
Memory optimization: compressed tries
Standard tries waste memory for sparse paths. If a node has only one child, that node exists solely to extend the path by one character. Compressed tries (radix trees, Patricia tries) merge chains of single-child nodes into a single node storing the entire substring.
For the word "banana" alone, a standard trie uses 6 nodes. A compressed trie uses 1 node storing the full string. When multiple words share a prefix, the compression applies to the non-shared suffixes:
| Trie type | "banana" + "band" | Node count |
|---|---|---|
| Standard | root-b-a-n-a-n-a, root-b-a-n-d | 8 nodes |
| Compressed | root-"ban"-["ana", "d"] | 4 nodes |
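The compressed layout in the table can be sketched with nested dicts whose keys are whole edge labels. This is a lookup-only toy: a real radix tree also stores an explicit end-of-key flag and splits edges on insert:

```python
# Hand-built compressed trie for "banana" and "band", matching the table:
# root -> "ban" -> {"ana", "d"}. An empty dict marks a leaf (complete key).
radix = {"ban": {"ana": {}, "d": {}}}

def radix_search(node, word):
    """Consume a whole edge label per step instead of one character."""
    if word == "":
        return node == {}  # real implementations use an is_end flag instead
    for label, child in node.items():
        if word.startswith(label):
            return radix_search(child, word[len(label):])
    return False
```

`radix_search(radix, "banana")` consumes "ban" then "ana" in two steps rather than six character-level hops; "ban" alone is rejected because no key ends there.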
Trie node representations
The children map in each trie node determines memory usage and lookup speed:
| Representation | Lookup | Memory per node | Best when |
|---|---|---|---|
| Array[A] | O(1) | A pointers (2 KB for A=256) | Small alphabet or dense nodes (e.g. A=2 for binary tries) |
| HashMap | O(1) avg | ~100-200 bytes overhead | Medium alphabet, sparse nodes |
| Sorted array | O(log A) | Compact, A * pointer_size | Memory-constrained, infrequent lookups |
| Bitmap + compact array | O(1) | Optimal for sparse | Production-grade (ART, HAT-trie) |
For tries over byte-string keys (alphabet size 256), using a fixed array per node wastes enormous memory because most nodes have only a few children. Hash maps add overhead per node. The Adaptive Radix Tree (ART) solves this by choosing the node representation based on how many children exist: Node4, Node16, Node48, or Node256. This gives near-optimal memory at every density level.
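A toy illustration of the adaptive idea. The real ART uses fixed Node4/16/48/256 layouts with SIMD-friendly search; the `PROMOTE_AT` threshold and class name here are invented for illustration, and only two representations are shown:

```python
import bisect

class AdaptiveNode:
    """Switches child representation by occupancy, loosely in ART's spirit."""
    PROMOTE_AT = 16  # illustrative threshold, not ART's actual layout rule

    def __init__(self):
        self.sparse = []   # sorted (key_byte, child) pairs while small
        self.dense = None  # 256-slot array once the node fills up

    def get(self, key_byte):
        if self.dense is not None:
            return self.dense[key_byte]
        i = bisect.bisect_left(self.sparse, (key_byte,))
        if i < len(self.sparse) and self.sparse[i][0] == key_byte:
            return self.sparse[i][1]
        return None

    def put(self, key_byte, child):
        if self.dense is not None:
            self.dense[key_byte] = child
            return
        i = bisect.bisect_left(self.sparse, (key_byte,))
        if i < len(self.sparse) and self.sparse[i][0] == key_byte:
            self.sparse[i] = (key_byte, child)
        else:
            self.sparse.insert(i, (key_byte, child))
        if len(self.sparse) > self.PROMOTE_AT:
            # Promote: copy the sorted pairs into a direct-indexed array.
            self.dense = [None] * 256
            for b, c in self.sparse:
                self.dense[b] = c
            self.sparse = []
```

A nearly empty node pays only a few tuples of memory; a node with dozens of children gets O(1) indexed lookup. ART additionally demotes nodes on deletion and compresses paths, which this sketch omits.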
Linux kernel's routing table uses an LC-trie (Level-Compressed trie). Redis uses radix trees internally for its Streams data structure. Go's net/http default router uses a custom radix tree for URL path matching.
Hash table vs trie: when trie wins
| Operation | Hash table | Trie |
|---|---|---|
| Exact lookup | O(1) average | O(k) |
| Prefix search | O(N) (scan all keys) | O(k + results) |
| Lexicographic ordering | Not supported | Natural (DFS traversal) |
| Shared prefix memory | None (each key stored fully) | Shared nodes for common prefixes |
| Worst case | O(N) collisions | O(k) always |
| Delete | O(1) average | O(k), may need pruning |
| Range query | Not supported | O(k + range size) via DFS |
Use a hash table when you need pure exact-match lookups and do not care about prefix relationships. Use a trie when you need prefix queries, sorted iteration, longest-prefix matching, or when your keys share significant common prefixes.
I have seen engineers default to hash maps for every string lookup problem. That works fine until someone asks for autocomplete or prefix filtering, and the hash map requires a full scan. Picking the right data structure upfront saves a rewrite later.
Production usage
| System | Usage | Notable behavior |
|---|---|---|
| Google Search / Bing | Autocomplete suggestions | Distributed tries with top-K precomputed at each node. Sharded by prefix range across servers. |
| Linux kernel (FIB) | IP routing table (LC-trie) | Level-Compressed trie for cache-friendly longest-prefix-match. Handles full Internet routing table (~900K IPv4 prefixes). |
| DNS resolvers | Domain name lookup | Domain labels form a trie: "www.example.com" traverses com -> example -> www. Caching resolvers use trie structures for prefix matching. |
| Redis Streams | Internal radix tree for stream entry IDs | Uses a radix tree (compressed trie) to store and retrieve stream entries by their millisecond-sequence IDs. |
| Apache Lucene / Elasticsearch | Term dictionary (FST) | Finite State Transducer, a compressed trie variant, stores the term dictionary. Maps every indexed term to its posting list offset in O(k). |
| Go net/http | URL path routing | Default HTTP mux uses a radix tree for path matching. Frameworks like chi and echo use optimized radix trees for route matching. |
Limitations and when NOT to use it
- Memory-hungry for large alphabets. A standard trie node over Unicode (millions of possible characters) is impractical with array-based children. Even with hash map children, each node carries overhead (map header, hash buckets). For pure exact-match workloads, a hash table uses less memory.
- Cache-unfriendly traversal. Each character in the key requires following a pointer to a different node, likely in a different cache line. For short keys, a hash table (one hash computation, one or two memory accesses) is faster despite higher algorithmic complexity for prefix search.
- Not useful without prefix queries. If your workload is 100% exact-match lookups with no prefix search, sorted iteration, or range queries, a trie adds complexity and memory overhead with no benefit. Use a hash table.
- Balancing is not applicable. Unlike BSTs, tries cannot be "balanced." A pathological input (very long keys with no shared prefixes) produces a deep, sparse tree that wastes memory. Compressed tries mitigate this but add implementation complexity.
- Concurrent modification is complex. Lock-free tries require careful design (e.g., the Concurrent Trie by Prokopec et al.). Naive per-node locking creates contention on popular prefixes. For write-heavy workloads, concurrent hash maps are simpler to implement correctly.
- Serialization is non-trivial. A trie's pointer-based structure does not serialize naturally to disk. Production systems that need persistent tries use memory-mapped arrays or specialized formats (FST in Lucene). This is more engineering effort than a flat key-value store.
Interview cheat sheet
- When autocomplete comes up: say "Use a trie with precomputed top-K results at each node. Query time is O(k) where k is the prefix length. Build the top-K lists offline during index construction."
- When IP routing is discussed: say "Longest Prefix Match via binary trie. Each bit is a trie level. Production routers use multi-bit tries or TCAM to reduce depth and cache misses."
- When asked about prefix search: say "Hash tables cannot do prefix search without scanning all keys (O(N)). Tries do it in O(k + R) where R is the result count. This is the fundamental difference."
- When comparing with hash tables: say "Tries win for prefix queries, sorted iteration, and range queries. Hash tables win for pure exact-match O(1) lookups. Choose based on your query pattern."
- When memory is a concern: say "Use a compressed trie (radix tree) to merge single-child chains. Or use an Adaptive Radix Tree (ART) that picks node size based on child count."
- When DNS resolution is discussed: say "DNS is a trie. Domain labels (com, example, www) form a hierarchical trie. Resolvers traverse it right-to-left (TLD first)."
- When someone proposes regex for prefix matching: say "A trie is essentially a finite automaton for prefix matching. It is faster and more memory-efficient than compiling a regex per query."
- When sorted output is needed: say "A DFS traversal of a trie produces keys in lexicographic order. No sorting step needed. Hash tables cannot provide this without an external sort."
Quick recap
- A trie stores strings character by character, sharing nodes for common prefixes. All operations (insert, lookup, prefix search) are O(k) where k is the key length, independent of the total number of stored strings.
- Autocomplete systems precompute top-K suggestions at each trie node during index build, enabling O(k) prefix query time without subtree traversal.
- IP routing uses binary tries for Longest Prefix Match, with multi-bit strides and path compression to reduce depth and cache misses in production routers.
- Compressed tries (Patricia tries, radix trees) merge single-child chains, reducing node count by 60-80% for sparse datasets.
- Tries win over hash tables for prefix queries, sorted iteration, and range queries. Hash tables win for pure exact-match O(1) lookups.
- Memory is the main drawback: naive tries can use 10x more memory than hash tables for the same keys. Compressed variants and adaptive node representations (ART) mitigate this significantly.
Related concepts
- Inverted index - Lucene's term dictionary is an FST (a compressed trie variant). Understanding tries explains how term lookup achieves O(k) performance in full-text search engines.
- Bloom filters - Bloom filters answer "does this key exist?" probabilistically in O(1). Tries answer the same question deterministically in O(k), plus support prefix queries. Use Bloom filters to guard expensive trie lookups.
- Databases - B-trees and tries solve overlapping problems (sorted key lookup, range queries). B-trees are disk-optimized with high fan-out; tries are memory-optimized for prefix-heavy workloads.