Skip to main content

642. Design Search Autocomplete System

hardAsked at Hugging Face

Design a search autocomplete system that ranks completions by historical frequency. Hugging Face asks this because it directly mirrors their Hub model search infrastructure — prefix matching on model names with ranking by download counts requires exactly the Trie + frequency heap composition tested here.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Source citations

Public interview reports confirming this problem appears in Hugging Face loops.

  • Glassdoor (2026-Q1)Multiple Hugging Face SWE onsite reports cite Design Search Autocomplete as a high-signal hard system design + coding hybrid.
  • Blind (2025-12)Hugging Face threads identify this as a flagship hard problem for infrastructure and platform engineering roles, given its direct relevance to the Hub search experience.

Problem

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). For each character they type except '#', you need to return the top 3 historical hot sentences that have the same prefix as the part of the sentence already typed. The returned top 3 hot sentences should be sorted by hot degree (times the sentence has been typed before). If several sentences have the same hot degree, use ASCII-order as the tiebreaker. If fewer than 3 hot sentences exist, return as many as you have. When a user types '#', the system stores the sentence already typed plus add the hot degree of this sentence by 1 and return an empty list.

Constraints

  • n == sentences.length
  • n == times.length
  • 1 <= n <= 100
  • 1 <= sentences[i].length <= 100
  • 1 <= times[i] <= 50
  • c is a lowercase English letter, a blank space, or '#'.
  • Each sentence ends with '#' and will not appear in sentences.
  • The sentence already typed has a length in the range [1, 200].
  • The number of calls to input will be in the range [1, 5000].

Examples

Example 1

Input
AutocompleteSystem(["i love you","island","iroman","i love leetcode"],[5,3,2,2]); input('i'); input(' '); input('a'); input('#');
Output
[["i love you","island","i love leetcode"],["i love you","i love leetcode"],[],[]]

Explanation: After 'i': top 3 sentences starting with 'i'. After 'i ': top 3 starting with 'i '. After 'i a': none match. After '#': stores 'i a' with frequency 1.

Approaches

1. Trie with frequency map at each node

Build a Trie where each node stores a frequency map of all sentences that pass through it. For each input character, traverse the Trie and return the top 3 from the current node's map.

Time
O(n*L) to build, O(L + k log k) per input where k = matching sentences
Space
O(n*L)
class TrieNode {
  constructor() {
    this.children = new Map();
    this.sentences = new Map(); // sentence -> count
  }
}
class AutocompleteSystem {
  constructor(sentences, times) {
    this.root = new TrieNode();
    this.current = this.root;
    this.prefix = '';
    for (let i = 0; i < sentences.length; i++) {
      this._insert(sentences[i], times[i]);
    }
  }
  _insert(sentence, count) {
    let node = this.root;
    for (const ch of sentence) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
      node.sentences.set(sentence, (node.sentences.get(sentence) || 0) + count);
    }
  }
  input(c) {
    if (c === '#') {
      this._insert(this.prefix, 1);
      this.prefix = '';
      this.current = this.root;
      return [];
    }
    this.prefix += c;
    if (this.current !== null && this.current.children.has(c)) {
      this.current = this.current.children.get(c);
    } else {
      this.current = null;
    }
    if (this.current === null) return [];
    const candidates = Array.from(this.current.sentences.entries());
    candidates.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
    return candidates.slice(0, 3).map(e => e[0]);
  }
}

Tradeoff: O(n*L) space storing sentence sets at every node — space-heavy but gives O(1) traversal per character and O(k log k) sort per query where k is the number of matching sentences. Acceptable for the given constraints.

2. Trie with terminal nodes + DFS prefix search

Build a Trie where only terminal nodes store (sentence, count) pairs. Each input character traverses to the next Trie node; on each non-# input, DFS from the current node to collect all completions, then sort and return top 3.

Time
O(n*L) build, O(p + s log s) per input where p = suffix subtree size, s = matching sentences
Space
O(n*L)
class TrieNode {
  constructor() {
    this.children = new Map();
    this.sentence = null; // only set for terminal nodes
    this.count = 0;
  }
}
class AutocompleteSystem {
  constructor(sentences, times) {
    this.root = new TrieNode();
    this.current = this.root;
    this.prefix = '';
    for (let i = 0; i < sentences.length; i++) {
      this._insert(sentences[i], times[i]);
    }
  }
  _insert(sentence, count) {
    let node = this.root;
    for (const ch of sentence) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
    }
    node.sentence = sentence;
    node.count += count;
  }
  _dfs(node, results) {
    if (node.sentence) results.push([node.sentence, node.count]);
    for (const child of node.children.values()) this._dfs(child, results);
  }
  input(c) {
    if (c === '#') {
      this._insert(this.prefix, 1);
      this.prefix = '';
      this.current = this.root;
      return [];
    }
    this.prefix += c;
    if (this.current && this.current.children.has(c)) {
      this.current = this.current.children.get(c);
    } else {
      this.current = null;
    }
    if (!this.current) return [];
    const results = [];
    this._dfs(this.current, results);
    results.sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
    return results.slice(0, 3).map(r => r[0]);
  }
}

Tradeoff: More memory-efficient (no sentence sets at every node) but DFS per query is slower for large Tries. The trade-off is space vs. per-query time. Discuss this explicitly with the interviewer.

Hugging Face-specific tips

Hugging Face is the company most likely to care about the practical design implications of this problem. Lead with: 'In production, Hugging Face Hub search stores model names with download counts — the exact (prefix, frequency) structure this problem models.' Discuss which node stores what data (sentence sets vs. terminal-only) and the trade-off explicitly. Ask the interviewer: 'Should I optimize for memory or for query latency?' This shows you think like a product infrastructure engineer, not just a puzzle-solver. Also handle the '#' input correctly — it both commits the current prefix to the Trie AND resets state.

Common mistakes

  • Forgetting to reset current pointer and prefix string when '#' is received.
  • Not adding the new sentence to the Trie on '#' input — without this, user queries never become suggestions.
  • Using < instead of <= for the hot degree tiebreak — ASCII order should only apply when counts are equal.
  • Not handling the case where the current Trie node becomes null mid-prefix — all subsequent non-# inputs should return [] until the next '#' reset.

Follow-up questions

An interviewer at Hugging Face may pivot to one of these next:

  • How would you persist and load this Trie from disk for a production search system?
  • How would you rank completions by recency in addition to frequency?
  • Design a distributed autocomplete system that aggregates suggestions from multiple regional Trie replicas.

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.

Output

Press Run or Cmd+Enter to execute

FAQ

Why store sentences at each intermediate node rather than only at terminal nodes?

Storing at each node lets you answer a prefix query in O(1) per character without DFS. The trade-off is O(n*L) space vs. O(subtree size) DFS time per query. For read-heavy workloads with large dictionaries, pre-stored sentences are faster.

How do you handle spaces in the input?

Spaces are valid characters in the sentence — they are treated identically to letters as Trie edge characters. The only special character is '#'.

What is the tiebreak ordering?

Primary sort: descending hot degree (higher frequency first). Secondary sort: ascending ASCII/lexicographic order. In JS, use localeCompare for reliable string comparison.

Practice these live with InterviewChamp.AI

Drill Design Search Autocomplete System and other Hugging Face interview questions under real-loop conditions with instant feedback on your reasoning, complexity claims, and code.

Practice these live with InterviewChamp.AI →