Is Design Search Autocomplete a common interview question?

Yes — it's a frequent senior+ system design at Google and Amazon, and the canonical 'show me you can design a data-structure-heavy system' question. Less common at Meta and snap, more common at search-product companies.

Do I need to know the trie variant by name (radix, Patricia, DAWG)?

Naming 'compressed trie' or 'radix tree' is enough. Drawing the exact compression algorithm is bonus signal; mostly interviewers want you to know why a plain trie wastes memory and to mention an alternative.

How long is the Design Autocomplete round?

45–60 minutes. Senior rounds extend it with personalization, multilingual support, and content safety. Source: Glassdoor Google L5/L6 reports 2023–2024.

Should I cover spell-correction in Design Autocomplete?

If time permits in the last 10 minutes. Spell-correction adds an edit-distance lookup layer (e.g., a BK-tree) that runs in parallel with the trie lookup; merge results. Not required for hire, but signals depth.

System Design Questions

Design Search Autocomplete — System Design Interview Guide

Design Search Autocomplete is a system-design interview that asks you to build the suggestion engine behind a search box: as the user types each character, return the top 10 most likely query completions in under 100ms. The hard part is the prefix data structure and updating it from a stream of new queries.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-19

Reported in interviews at

Google
Amazon
Meta
Microsoft
LinkedIn

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

For each typed prefix, return the top N (e.g., 10) most likely completions
Rank completions by popularity (frequency of past queries with that prefix)
Update the suggestion data continuously from new user queries
Personalize suggestions when user context (history, location) is available
Filter out inappropriate, illegal, or recently-spiked spam queries

Non-functional requirements

Latency: <100ms p99 from keypress to suggestions visible (network + compute combined)
Read-heavy: ~10K typed-keystroke queries/sec/region peak
Freshness: trending queries should appear in suggestions within an hour of starting to trend
Scale: ~5B searches/day globally for a major engine, ~10B keystrokes/day generating suggestion requests

Capacity estimation

Public scale for a major search engine (2023): ~5B queries/day. Each query generates ~10 autocomplete requests as the user types (avg query length 20 chars, with debouncing requests fire roughly every 2 chars). Total autocomplete request volume: ~50B/day = ~580K requests/sec average, peak ~2M/sec globally. Per-region peak is much lower — sharding by region gives each region ~50K-200K requests/sec to handle.

Unique query corpus: ~100M distinct queries account for >95% of all search traffic (Pareto distribution); the long tail of ~1B+ rare queries collectively accounts for <5%. The data structure only needs to serve the top 1-10M queries per region efficiently.

Storage for the prefix trie: ~10M entries × average 30 bytes (query string, frequency, rank metadata) = ~300 MB compressed. Easily fits in RAM per node. With replication and per-region copies (US, EU, APAC, etc.), total RAM across the fleet is ~5-10 GB. Query log for offline aggregation: ~5B queries × 200 bytes = ~1 TB/day, written to a streaming log for analytics.

High-level design

Two pipelines: serving (read path) and aggregation (write path). They are intentionally decoupled.

Serving path: client sends typed prefix to the autocomplete service. The service holds an in-memory prefix trie (or compressed equivalent — see deep dive) where each terminal node has a precomputed sorted list of top-N completions. Lookup: traverse the trie for the prefix, return the top-N at the prefix node. This is O(prefix length) for traversal and O(1) to read the precomputed top-N — both bounded and fast. The serving node is replicated per region for redundancy and latency; clients hit the nearest replica through a region-aware load balancer.

Aggregation path: every search query (or even every typed keystroke) emits an event to a streaming log. A batch aggregation pipeline runs on a rolling window — typically hourly — and computes (prefix, query, count) tuples for the top 1M queries. The aggregation produces a new version of the trie data, which is then loaded by the serving nodes in a rolling deploy.

The trie is rebuilt from scratch every hour (or every few minutes for trending-sensitive engines) rather than mutated in place. This keeps the serving nodes' data structure simple and immutable per version — readers don't need locks. The new version is published to an in-memory data plane that serving nodes pull and atomically swap into place.

Deep dive — the hard problem

Two deep dives: the data structure and the trending-update problem.

Data structure: a classical trie is the textbook answer but is memory-hungry — naïvely, each node holds 26+ pointers. Real systems use either a compressed trie (radix tree / Patricia trie — collapse single-child chains into a single node holding a substring) or a DAWG (directed acyclic word graph — share common suffixes across queries). For the top-1M query corpus, a compressed trie typically uses ~5× less memory than a basic trie. Each terminal node holds the top-10 completions precomputed; the precompute step picks the 10 highest-frequency descendants of that prefix node. Storing precomputed top-N at every node (not just leaves) is what makes lookup O(prefix length) — without precomputed lists, you'd need to traverse all descendants at query time.

Updates and freshness: rebuilding the trie hourly is fine for stable popularity but misses trending events (breaking news where a query spikes from zero to a million in 10 minutes). Two solutions. (1) Decoupled hot-trending overlay: a tiny secondary structure tracks queries that spiked in the last 10 minutes; serving nodes consult both and merge results. (2) Real-time aggregation: a streaming aggregation pipeline computes top-K with sketch data structures (Count-Min Sketch + heavy-hitters) on the live query stream; serving nodes pull updates every few minutes. (1) is simpler; (2) is what large engines actually run.

Third tradeoff: personalization vs caching. Pure personalized suggestions can't share cache entries across users — every user's results differ. Production systems split into a 'global suggestions' tier (cached, shared) and a 'personal re-ranking' tier (cheap, post-process the global top-50 with user features to pick a personalized top-10). Discuss this tradeoff: full personalization at autocomplete latency is expensive; the global + rerank pattern is the standard answer.

Finally, content safety. Autocomplete shouldn't suggest hate speech, illegal content, or product-defaming spam. Solution: maintain a blocklist applied at trie-build time (filter out blacklisted queries before they reach the trie) plus an ML safety classifier scoring marginal suggestions. The blocklist is the floor; the classifier handles the long tail. Mention both layers.

Common mistakes

Building a basic trie without acknowledging memory cost at 10M+ entries — the compressed trie is the production answer
Computing top-N at query time by traversing all descendants — kills latency on common prefixes ('a', 'th')
Designing the aggregation pipeline as synchronous with serving — couples freshness to serving stability
Forgetting trending — interviewers will ask 'what about breaking news queries'
Skipping content-safety filtering — autocomplete is a known liability surface, and missing it is a hire-reduce signal

Likely follow-up questions

How would you support autocomplete in 50 languages including non-Latin scripts (Chinese, Arabic)?
How would you implement typo tolerance ('teh' → 'the')?
What changes if you have to support 1B users typing simultaneously?
How would you A/B test a new ranking model in autocomplete without harming the user experience?
How would you support entity completion ('barack' → 'barack obama' with rich snippet) vs raw text?

Related system design scenarios

Frequently asked questions

Is Design Search Autocomplete a common interview question?: Yes — it's a frequent senior+ system design at Google and Amazon, and the canonical 'show me you can design a data-structure-heavy system' question. Less common at Meta and snap, more common at search-product companies.
Do I need to know the trie variant by name (radix, Patricia, DAWG)?: Naming 'compressed trie' or 'radix tree' is enough. Drawing the exact compression algorithm is bonus signal; mostly interviewers want you to know why a plain trie wastes memory and to mention an alternative.
How long is the Design Autocomplete round?: 45–60 minutes. Senior rounds extend it with personalization, multilingual support, and content safety. Source: Glassdoor Google L5/L6 reports 2023–2024.
Should I cover spell-correction in Design Autocomplete?: If time permits in the last 10 minutes. Spell-correction adds an edit-distance lookup layer (e.g., a BK-tree) that runs in parallel with the trie lookup; merge results. Not required for hire, but signals depth.