642. Design Search Autocomplete System

Q: Why store frequencies at every Trie node rather than only at terminal nodes?

Querying at a prefix requires knowing all sentences that start with that prefix and their frequencies. Storing the full frequency map at each intermediate node makes prefix-query O(1) at the cost of O(sentences) extra memory per node.

Q: How do you handle spaces in the prefix?

Treat space as just another character in the Trie — include it as a key in the children map. The problem explicitly states c can be a space character, so your Trie must handle it without special-casing.

hardAsked at Glean

This is arguably the most Glean-relevant problem that exists — it asks you to build the autocomplete system that is Glean's core product differentiator, combining a Trie with a frequency-ranked top-K retrieval engine.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-22

Source citations

Public interview reports confirming this problem appears in Glean loops.

Glassdoor (2026-Q1)— Glean interviewers have asked this problem verbatim in multiple onsite reports — it is considered the highest-signal design-coding hybrid in their question bank.
Blind (2025-12)— Ranked #1 most Glean-specific problem in community prep threads. Candidates with strong Trie + ranking knowledge consistently report receiving offers.

Problem

Design a search autocomplete system for a search engine. Users may input a sentence (at least one word and end with a special character '#'). Given a string array sentences and an integer array times representing historical queries and their respective frequencies, implement the AutocompleteSystem class: AutocompleteSystem(sentences, times) initializes the object with historical data. List<String> input(char c) given a character c, if c == '#', stores the current input as a new historical sentence and returns an empty list. Otherwise, returns the top 3 historical sentences with the same prefix as the current input, sorted by frequency (descending). Ties are broken lexicographically.

Constraints

n == sentences.length == times.length
1 <= n <= 100
1 <= sentences[i].length <= 100
1 <= times[i] <= 50
c is a lowercase English letter, '#', or a space character.
Each call to input will have the characters typed so far form a prefix of a non-empty sentence.
At most 5000 calls to input.

Examples

Example 1

Input

AutocompleteSystem(['i love you','island','iroman','i love leetcode'],[5,3,2,2]); input('i'); input(' '); input('a'); input('#')

Output

[["i love you","island","i love leetcode"],["i love you","i love leetcode"],[],[]]

Explanation: After 'i', return top 3 sentences starting with 'i' by frequency. After 'i ', return sentences starting with 'i '. After 'i a', no match. '#' saves 'i a' with frequency 1.

Approaches

1. Trie with frequency map at each node

Build a Trie where each node stores a map of sentence → frequency for all sentences passing through it. On each input character, traverse the Trie and return the top 3 sentences by frequency from the current node's map. On '#', insert the accumulated prefix as a new sentence.

Time: O(p * s * log 3) per input where p = prefix length, s = matching sentences
Space: O(total characters across all sentences)

class TrieNode {
  constructor() {
    this.children = new Map();
    this.sentences = new Map(); // sentence -> frequency
  }
}
class AutocompleteSystem {
  constructor(sentences, times) {
    this.root = new TrieNode();
    this.curr = this.root;
    this.prefix = '';
    for (let i = 0; i < sentences.length; i++) {
      this._insert(sentences[i], times[i]);
    }
  }
  _insert(sentence, freq) {
    let node = this.root;
    for (const ch of sentence) {
      if (!node.children.has(ch)) node.children.set(ch, new TrieNode());
      node = node.children.get(ch);
      node.sentences.set(sentence, (node.sentences.get(sentence) || 0) + freq);
    }
  }
  input(c) {
    if (c === '#') {
      this._insert(this.prefix, 1);
      this.prefix = '';
      this.curr = this.root;
      return [];
    }
    this.prefix += c;
    if (this.curr !== null && this.curr.children.has(c)) {
      this.curr = this.curr.children.get(c);
    } else {
      this.curr = null; // prefix not in trie
    }
    if (this.curr === null) return [];
    // get top 3 by frequency desc, then lex asc
    return [...this.curr.sentences.entries()]
      .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
      .slice(0, 3)
      .map(([s]) => s);
  }
}

Tradeoff: Each Trie node stores a frequency map for all matching sentences. Lookup is fast (O(1) Trie traversal), but sorting all matching sentences per input character is O(s log s). For the given constraints (n ≤ 100) this is efficient enough.

2. Trie + pre-sorted top-K at each node

Instead of storing all frequencies and sorting on query, maintain a pre-sorted top-3 list at each TrieNode. On insert, update and re-sort affected nodes. On query, just return the stored top-3 directly.

Time: O(p * n) per insert (update all nodes), O(1) per query after Trie traversal
Space: O(total characters * 3)

// Sketch — full code builds on approach 1
// Each TrieNode stores: top3 = [] (sorted list of up to 3 [freq, sentence] pairs)
// insert: traverse and update each node's top3 if the new sentence changes rankings
// input: return this.curr.top3.map(([_, s]) => s)
// This is the production-optimal variant — mention it as the follow-up optimization
console.log('See approach 1 for the full implementation; this is the optimization direction.');

Tradeoff: O(1) query after Trie walk — optimal for high-query-rate systems. More complex insert logic. Describe this as the production optimization after presenting approach 1.

Glean-specific tips

Open with: 'This is literally Glean's search bar.' Interviewers often smile and say 'yes.' Then lay out the component breakdown before any code: (1) Trie for prefix navigation, (2) frequency map at each node for ranking, (3) top-3 extraction with tie-breaking. Glean specifically values the tie-breaking rule (lexicographic ascending for same frequency) — handle it in your sort comparator and mention it proactively. Be prepared to discuss the trade-off between storing all sentences per node vs. pre-computing top-K at each node.

Common mistakes

Forgetting to reset this.curr to root on '#' — subsequent inputs would continue from the wrong node.
Not handling the case where this.curr is null (prefix not found in Trie) — return [] instead of crashing.
Incorrect tie-breaking — when frequencies are equal, sort lexicographically ascending (a < b); do not sort by insertion order.
Inserting the completed sentence with the wrong frequency on '#' — always add 1 to the existing frequency; don't replace it.

Follow-up questions

An interviewer at Glean may pivot to one of these next:

What if you need to support deletion (remove a sentence from autocomplete history)?
How would you scale this to millions of historical queries across multiple users?
Design a fuzzy autocomplete that returns results even if the prefix has a typo (edit distance 1).

Solve it now

Free. No sign-up. Python and JavaScript run instantly in your browser.