642. Design Search Autocomplete System
hardAsked at Hugging FaceDesign a search autocomplete system that ranks completions by historical frequency. Hugging Face asks this because it directly mirrors their Hub model search infrastructure — prefix matching on model names with ranking by download counts requires exactly the Trie + frequency heap composition tested here.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Source citations
Public interview reports confirming this problem appears in Hugging Face loops.
- Glassdoor (2026-Q1)— Multiple Hugging Face SWE onsite reports cite Design Search Autocomplete as a high-signal hard system design + coding hybrid.
- Blind (2025-12)— Hugging Face threads identify this as a flagship hard problem for infrastructure and platform engineering roles, given its direct relevance to the Hub search experience.
Problem
Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). For each character they type except '#', you need to return the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed. The returned top 3 hot sentences should be sorted by hot degree (times the sentence has been typed before). If several sentences have the same hot degree, use ASCII-order as the tiebreaker. If fewer than 3 hot sentences exist, return as many as you have. When a user types '#', the system stores the sentence already typed plus add the hot degree of this sentence by 1 and return an empty list.
Constraints
n == sentences.lengthn == times.length1 <= n <= 1001 <= sentences[i].length <= 1001 <= times[i] <= 50c is a lowercase English letter, a blank space, or '#'.Each sentence ends with '#' and will not appear in sentences.The sentence already typed has a length in the range [1, 200].The number of calls to input will be in the range [1, 5000].
Examples
Example 1
AutocompleteSystem(["i love you","island","iroman","i love leetcode"],[5,3,2,2]); input('i'); input(' '); input('a'); input('#');[["i love you","island","i love leetcode"],["i love you","i love leetcode"],[],[]]Explanation: After 'i': top 3 sentences starting with 'i'. After 'i ': top 3 starting with 'i '. After 'i a': none match. After '#': stores 'i a' with frequency 1.
Approaches
1. Trie with frequency map at each node
Build a Trie where each node stores a frequency map of all sentences that pass through it. For each input character, traverse the Trie and return the top 3 from the current node's map.
- Time
- O(n*L) to build, O(L + k log k) per input where k = matching sentences
- Space
- O(n*L)
class TrieNode {
constructor() {
this.children = new Map();
this.sentences = new Map(); // sentence -> count
}
}
class AutocompleteSystem {
constructor(sentences, times) {
this.root = new TrieNode();
this.current = this.root;
this.prefix = '';
for (let i = 0; i < sentences.length; i++) {
this._insert(sentences[i], times[i]);
}
}
_insert(sentence, count) {
let node = this.root;
for (const ch of sentence) {
if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
node = node.children.get(ch);
node.sentences.set(sentence, (node.sentences.get(sentence) || 0) + count);
}
}
input(c) {
if (c === '#') {
this._insert(this.prefix, 1);
this.prefix = '';
this.current = this.root;
return [];
}
this.prefix += c;
if (this.current !== null && this.current.children.has(c)) {
this.current = this.current.children.get(c);
} else {
this.current = null;
}
if (this.current === null) return [];
const candidates = Array.from(this.current.sentences.entries());
candidates.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
return candidates.slice(0, 3).map(e => e[0]);
}
}Tradeoff: O(n*L) space storing sentence sets at every node — space-heavy but gives O(1) traversal per character and O(k log k) sort per query where k is the number of matching sentences. Acceptable for the given constraints.
2. Trie with terminal nodes + DFS prefix search
Build a Trie where only terminal nodes store (sentence, count) pairs. Each input character traverses to the next Trie node; on each non-# input, DFS from the current node to collect all completions, then sort and return top 3.
- Time
- O(n*L) build, O(p + s log s) per input where p = suffix subtree size, s = matching sentences
- Space
- O(n*L)
class TrieNode {
constructor() {
this.children = new Map();
this.sentence = null; // only set for terminal nodes
this.count = 0;
}
}
class AutocompleteSystem {
constructor(sentences, times) {
this.root = new TrieNode();
this.current = this.root;
this.prefix = '';
for (let i = 0; i < sentences.length; i++) {
this._insert(sentences[i], times[i]);
}
}
_insert(sentence, count) {
let node = this.root;
for (const ch of sentence) {
if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
node = node.children.get(ch);
}
node.sentence = sentence;
node.count += count;
}
_dfs(node, results) {
if (node.sentence) results.push([node.sentence, node.count]);
for (const child of node.children.values()) this._dfs(child, results);
}
input(c) {
if (c === '#') {
this._insert(this.prefix, 1);
this.prefix = '';
this.current = this.root;
return [];
}
this.prefix += c;
if (this.current && this.current.children.has(c)) {
this.current = this.current.children.get(c);
} else {
this.current = null;
}
if (!this.current) return [];
const results = [];
this._dfs(this.current, results);
results.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
return results.slice(0, 3).map(r => r[0]);
}
}Tradeoff: More memory-efficient (no sentence sets at every node) but DFS per query is slower for large Tries. The trade-off is space vs. per-query time. Discuss this explicitly with the interviewer.
Hugging Face-specific tips
Hugging Face is the company most likely to care about the practical design implications of this problem. Lead with: 'In production, Hugging Face Hub search stores model names with download counts — the exact (prefix, frequency) structure this problem models.' Discuss which node stores what data (sentence sets vs. terminal-only) and the trade-off explicitly. Ask the interviewer: 'Should I optimize for memory or for query latency?' This shows you think like a product infrastructure engineer, not just a puzzle-solver. Also handle the '#' input correctly — it both commits the current prefix to the Trie AND resets state.
Common mistakes
- Forgetting to reset current pointer and prefix string when '#' is received.
- Not adding the new sentence to the Trie on '#' input — without this, user queries never become suggestions.
- Using < instead of <= for the hot degree tiebreak — ASCII order should only apply when counts are equal.
- Not handling the case where the current Trie node becomes null mid-prefix — all subsequent non-# inputs should return [] until the next '#' reset.
Follow-up questions
An interviewer at Hugging Face may pivot to one of these next:
- How would you persist and load this Trie from disk for a production search system?
- How would you rank completions by recency in addition to frequency?
- Design a distributed autocomplete system that aggregates suggestions from multiple regional Trie replicas.
Solve it now
Free. No sign-up. Python and JavaScript run instantly in your browser.
FAQ
Why store sentences at each intermediate node rather than only at terminal nodes?
Storing at each node lets you answer a prefix query in O(1) per character without DFS. The trade-off is O(n*L) space vs. O(subtree size) DFS time per query. For read-heavy workloads with large dictionaries, pre-stored sentences are faster.
How do you handle spaces in the input?
Spaces are valid characters in the sentence — they are treated identically to letters as Trie edge characters. The only special character is '#'.
What is the tiebreak ordering?
Primary sort: descending hot degree (higher frequency first). Secondary sort: ascending ASCII/lexicographic order. In JS, use localeCompare for reliable string comparison.
Practice these live with InterviewChamp.AI
Drill Design Search Autocomplete System and other Hugging Face interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.
Practice these live with InterviewChamp.AI →