Design a Search Autocomplete System
OOP design for a type-ahead search suggestion system using a Trie data structure. Covers prefix matching, ranking by frequency, top-K results, and real-time suggestion updates.
The Problem
Your search engine handles 10,000 queries per second. Every keystroke fires a request to fetch suggestions, and users expect results within 100 ms. The current implementation runs a SQL LIKE 'prefix%' query against a table of 500 million historical search terms. At peak traffic, the database connection pool is saturated and suggestions stop appearing entirely.
A Trie (prefix tree) solves this by organizing all search terms in a tree where each node represents a character. Traversing from root to any node gives you the prefix, and all terms sharing that prefix live in the same subtree. Lookups are O(L) where L is the prefix length, independent of how many total terms exist.
Design the core classes for a search autocomplete system that supports prefix-based lookup, frequency-ranked suggestions, top-K result caching at each Trie node, and pluggable ranking strategies.
Requirements
Clarifying Questions
Before jumping into class design, ask questions to pin down the scope. Cover four areas: core behavior, configuration, boundaries, and extensibility.
You: "How many suggestions should we return for each prefix?"
Interviewer: "Top 5 by default, but the caller should be able to configure this."
Good. A configurable K means we need to store it in configuration, not hardcode it.
You: "How do we rank suggestions? By search frequency, recency, or something else?"
Interviewer: "Frequency by default. But we want to support different ranking strategies in the future, like recency-based or personalized ranking."
Strategy pattern for ranking. The default sorts by frequency count, but we keep the door open for other approaches.
You: "Should the system handle case sensitivity? Is 'Apple' the same as 'apple'?"
Interviewer: "Case-insensitive. Normalize everything to lowercase."
Simple rule: lowercase all input before inserting or searching. This avoids duplicates and simplifies the Trie structure.
You: "How do search terms get into the system? Real-time as users search, or batch-loaded?"
Interviewer: "Both. Completed searches update frequencies in real-time. We also bulk-load historical data at startup."
Two ingestion paths: recordQuery() for live updates and a bulk-load constructor. The Trie needs to handle both.
You: "Do we need to handle special characters, spaces, or multi-word queries?"
Interviewer: "Yes. Search terms can contain spaces, numbers, and basic punctuation. Treat each character as a Trie node."
No alphabet restriction. The children map in each TrieNode uses Map<Character, TrieNode> rather than a fixed-size array.
You: "Should the system be thread-safe for concurrent reads and writes?"
Interviewer: "Yes. Reads happen on every keystroke, writes happen when searches complete. Concurrent access is expected."
ReadWriteLock on the Trie. Multiple threads can read simultaneously, but writes acquire an exclusive lock.
You: "Is there a maximum length for search terms?"
Interviewer: "Cap at 200 characters. Anything longer gets truncated."
Reasonable limit. It prevents degenerate inputs from creating extremely deep Trie branches.
Final Requirements
Functional Requirements:
- Search for top-K suggestions matching a given prefix using
search(prefix). - Record a completed search query to update frequency using
recordQuery(query). - Rank suggestions by frequency count (highest first).
- Support pluggable ranking strategies (frequency, recency, personalized).
- Pre-compute and cache top-K suggestions at each Trie node for fast retrieval.
- Normalize all input to lowercase for case-insensitive matching.
Non-Functional Requirements:
- Thread-safe for concurrent read and write operations.
- Sub-millisecond prefix lookups for cached results.
- Extensible for new ranking strategies without modifying existing code.
Out of Scope:
- Persistence / database integration
- Fuzzy matching or typo correction
- UI rendering
- Network layer / API endpoints
- Distributed system concerns (sharding across nodes)
Example Inputs and Outputs
Scenario 1: Basic prefix search
- Input: System contains "apple" (freq: 100), "application" (freq: 80), "app store" (freq: 60), "apex legends" (freq: 40), "appetizer" (freq: 20). Search prefix "app".
- Expected:
search("app")returns["apple", "application", "app store", "apex legends", "appetizer"](top 5 by frequency). - Why: Validates basic prefix matching and frequency-based ranking.
Scenario 2: Recording a query updates ranking
- Input: Same terms as above. Call
recordQuery("appetizer")100 times. - Expected:
search("app")now returns["appetizer", "apple", "application", "app store", "apex legends"]. "appetizer" moved to the top. - Why: Validates that frequency updates propagate to cached top-K results.
Scenario 3: No matches for prefix
- Input: Search prefix "xyz".
- Expected:
search("xyz")returns an empty list. - Why: Validates the system handles missing prefixes gracefully without exceptions.
Try It Yourself
Try it yourself
Before reading the solution, spend 15 minutes sketching the core entities. Think about how each TrieNode stores its children, how you would walk the tree to find all words with a given prefix, and where you would cache top-K results to avoid recomputing them on every keystroke. Compare your approach with the walkthrough below.
Step 1: Identify Core Entities
Start by asking: what are the main "things" in this problem? Look for nouns in your requirements. You need a tree structure to hold prefixes, nodes within that tree, something to represent a suggestion with its metadata, an orchestrator that ties search and updates together, a strategy for ranking, and a cache for hot lookups.
A common mistake is building one massive AutocompleteService class that manages the Trie, handles ranking, caches results, and validates input. That works for a prototype, but it makes ranking impossible to swap and tangles traversal logic with caching logic.
| Entity | Responsibility | Key attributes |
|---|---|---|
| AutocompleteService | Orchestrator. Accepts search() and recordQuery() calls, coordinates the Trie, cache, and ranking. | trie, cache, rankingStrategy, maxResults |
| Trie | The prefix tree. Owns insertion and prefix traversal. | root (TrieNode), readWriteLock |
| TrieNode | Single node in the Trie. Stores children, end-of-word flag, frequency, and cached top-K suggestions. | children, isEndOfWord, frequency, topSuggestions |
| Suggestion | Value object pairing a search term with its metadata (frequency, last searched timestamp). | term, frequency, lastSearchedAt |
| RankingStrategy | Strategy interface. Compares two Suggestions to determine order. | compare(a, b) |
| SuggestionCache | LRU cache mapping prefixes to pre-computed suggestion lists. Avoids Trie traversal for hot prefixes. | cache (LinkedHashMap), capacity |
Notice that TrieNode is separate from Trie. The node is a data structure holding character-level state. The Trie manages tree-level operations like insertion, traversal, and locking. Merging them would violate SRP because node-level concerns (children, frequency) and tree-level concerns (thread safety, full traversal) have different reasons to change.
Step 2: Define Relationships and Class Design
AutocompleteService (the orchestrator)
This is the entry point callers interact with. It coordinates the Trie for data storage, the cache for fast lookups, and the ranking strategy for ordering results.
Deriving state from requirements:
| Requirement | What AutocompleteService must track |
|---|---|
| "Search for top-K suggestions" | The Trie to search and the max results count |
| "Pluggable ranking strategies" | The current RankingStrategy |
| "Sub-millisecond for cached results" | The SuggestionCache for hot prefixes |
| "Record completed search queries" | Access to the Trie for frequency updates |
Deriving methods from needs:
| Need from requirements | Method |
|---|---|
| "Search for suggestions" | search(prefix) |
| "Record completed searches" | recordQuery(query) |
TrieNode (the building block)
Continue Reading with Premium
Unlock this article and every other in-depth system design guide on the platform with NotesFromSDE Premium.