Design Quora — System Design Interview Guide
Design Quora is a system-design interview that asks you to build a question-and-answer platform: users post questions, others write long-form answers, the community votes, and an algorithmic feed surfaces interesting questions. The hard part is ranking high-quality answers under sparse and noisy signals.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Reported in interviews at
- Quora
- Meta
- Stack Overflow
Sourced from Glassdoor, Levels.fyi, and Blind interview reports.
Functional requirements
- Post a question (title plus optional body and topic tags)
- Write a long-form answer to any open question; support rich formatting and images
- Upvote or downvote questions and answers
- Follow topics, users, or specific questions for feed updates
- Browse a personalized feed mixing followed-topic questions with algorithmic suggestions
- Search questions and answers by keyword or topic
Non-functional requirements
- Scale: ~400M MAU, ~50M questions, ~250M answers, ~10B upvotes lifetime
- Read-heavy: ~1000:1 reads to writes (answer views >> answer writes)
- Answer-view latency: <300ms p99 to load a question page with top answers
- Search latency: <500ms p99 for keyword queries
- Availability: 99.95%; eventual consistency on vote counts is acceptable
Capacity estimation
Public scale anchors: ~400M MAU, ~50M questions accumulated, ~250M answers. Daily new questions ~50K/day, daily new answers ~200K/day = ~2-3 writes/sec average. Daily upvotes ~10M = ~120/sec average, ~600/sec peak. Daily answer views ~500M = ~6K/sec average, ~50K/sec peak.
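The per-second figures above follow from dividing the daily volumes by 86,400 seconds. A minimal sketch of that arithmetic, using only the daily numbers quoted in this section (the dictionary labels and the helper name `per_second` are illustrative):

```python
SECONDS_PER_DAY = 86_400

def per_second(daily_count: float) -> float:
    """Average events per second for a given daily volume."""
    return daily_count / SECONDS_PER_DAY

# Daily volumes taken from the capacity estimate above.
daily = {
    "writes (questions + answers)": 50_000 + 200_000,
    "upvotes": 10_000_000,
    "answer views": 500_000_000,
}

rates = {name: per_second(count) for name, count in daily.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:,.1f}/sec average")
```

The quoted peaks (~600 upvotes/sec, ~50K views/sec) correspond to roughly 5x and 8x the averages, which is the kind of multiplier worth stating out loud in the interview.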
Storage: question rows ~500 bytes (title, body, topics, owner, timestamps, vote counts), so 50M questions come to ~25 GB. Answer rows average ~3 KB (long-form content + metadata). 250M answers × 3 KB = ~750 GB raw, ~3 TB with replication and search indexes. Votes are stored individually rather than only as counters: each vote is an edge (user_id, content_id, +1 or -1) in a sharded store; 10B lifetime votes × 32 bytes = ~320 GB.
Search index: full-text over question titles, bodies, and answer text. Index size ~3x the raw text, so ~750 GB of answer text implies roughly 2-2.5 TB of index before replication. The index is sharded by topic prefix for query-time pruning. Click and engagement telemetry: ~5B events/day at peak (views, upvotes, dwell-time pings) flows into a streaming log and is retained for the ranking pipeline.
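The storage totals can be checked the same way. This sketch assumes decimal units (1 KB = 1,000 bytes) and uses only the row sizes and counts given above; the variable names are illustrative.

```python
KB = 1_000
GB = 1_000_000_000

# Row counts and per-row sizes from the estimate above.
questions_bytes = 50_000_000 * 500        # question rows at ~500 bytes
answers_bytes = 250_000_000 * 3 * KB      # answer rows at ~3 KB
votes_bytes = 10_000_000_000 * 32         # vote edges at ~32 bytes

for label, size in [("questions", questions_bytes),
                    ("answers", answers_bytes),
                    ("votes", votes_bytes)]:
    print(f"{label}: {size / GB:,.0f} GB raw")
```

Answer text dominates, which is why the answer table and its index drive the sharding decisions.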
High-level design
Four services: content, votes, feed, and search. Plus a topic-graph service that maps questions to topics and topics to followers.
Content service stores questions and answers in a sharded relational store. Questions are sharded by question_id; each row holds the question fields plus a denormalized vote_count counter. Answers are also sharded by question_id, so they are co-located with the parent question and loading a question page is a single-shard read. Each answer row stores answer_id, content, author_id, vote_count, and ranking_score columns. Long answer bodies are stored inline up to a size limit; very long answers can spill to object storage with a pointer.
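One way to picture the co-location rule is a routing function that only ever looks at question_id. This is a sketch under assumptions: the shard count, the hash choice, and the names `NUM_SHARDS`, `shard_for`, and `Answer` are hypothetical, not a description of any production schema.

```python
import hashlib
from dataclasses import dataclass

NUM_SHARDS = 64  # hypothetical shard count, chosen only for illustration

def shard_for(question_id: int) -> int:
    """Map a question_id to a shard with a stable hash."""
    digest = hashlib.sha256(str(question_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

@dataclass
class Answer:
    answer_id: int
    question_id: int
    author_id: int
    content: str
    vote_count: int = 0
    ranking_score: float = 0.0

    @property
    def shard(self) -> int:
        # Routed by the parent question, not by answer_id, so a question
        # and all of its answers land on the same shard.
        return shard_for(self.question_id)
```

Because answers never route on their own ID, a question-page load touches one shard; the cost is that a question with an unusually large number of answers concentrates load on that shard.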
Votes service is a separate sharded store keyed by (content_id, user_id) so 'has this user voted on this content' is O(1). Vote events trigger asynchronous updates to the denormalized vote_count on the content row. We don't keep the vote_count strictly accurate; it's an eventually-consistent rollup refreshed every few seconds for hot content.
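A toy in-memory version of this split between vote edges and the rolled-up counter might look like the following. It is a sketch only: the class `VoteStore`, its method names, and the explicit `flush` call are assumptions standing in for a sharded store and a background rollup job.

```python
from collections import defaultdict

class VoteStore:
    """In-memory stand-in for the votes service (illustrative only)."""

    def __init__(self) -> None:
        self.votes = {}                     # (content_id, user_id) -> +1 or -1
        self.pending = defaultdict(int)     # content_id -> delta not yet rolled up
        self.vote_count = defaultdict(int)  # eventually-consistent rollup

    def cast(self, content_id: str, user_id: str, value: int) -> None:
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        key = (content_id, user_id)
        previous = self.votes.get(key, 0)
        self.votes[key] = value
        # Record only the change, so repeating a vote is a no-op and
        # flipping a vote moves the count by two.
        self.pending[content_id] += value - previous

    def has_voted(self, content_id: str, user_id: str) -> bool:
        return (content_id, user_id) in self.votes

    def flush(self) -> None:
        """Apply buffered deltas; in production this would run every few seconds."""
        for content_id, delta in self.pending.items():
            self.vote_count[content_id] += delta
        self.pending.clear()
```

Buffering deltas means a viral answer produces one counter update per flush interval instead of one row write per vote, which is the point of keeping the counter eventually consistent.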
Feed service builds personalized timelines. Three sources contribute candidates: questions from topics the user follows (newest-first), questions from users the user follows, and algorithmic recommendations (high-engagement questions in topics adjacent to the user's interests). A ranker re-orders the combined candidate list using viewer-specific features. Feed is materialized in an in-memory per-user cache and refreshed every few minutes.
Search service maintains an inverted index over titles, bodies, and answers. Queries return ranked document IDs; a re-ranker uses topic relevance, answer quality scores, and personalization to produce the final ordering.
Topic graph maps questions to topics (often multiple — a question can sit in 'Machine Learning', 'Python', 'Statistics') and tracks topic followers. This is the join surface for feed and recommendation paths.
Deep dive — the hard problem
Two deep dives: answer ranking and personalized feed under sparse signals, followed by a shorter note on duplicate questions.
Answer ranking is the hardest problem. Each popular question accumulates dozens to hundreds of answers; only the top 3-5 are shown above the fold, and that ordering decides which answers get engagement, which compounds — top-ranked answers get more views, more upvotes, and stay on top. A naive 'highest upvote count' ranking has a feedback loop: the first decent answer locks in the top slot regardless of later, better answers.
Production systems use a score combining multiple signals: total upvotes, upvote velocity (recent rate), author reputation, answer length and structure, time decay, and editorial signals (a 'high quality' classifier scoring answer content for substance, citations, formatting). The score is computed offline per answer every few minutes and stored as a ranking_score column. When a question page loads, the answer list is sorted by ranking_score, not raw vote count.
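A multi-signal score of this kind can be sketched as a weighted sum with time decay on accumulated votes. Everything numeric here is an assumption for illustration: the weights, the 30-day half-life, the log damping, and the field names are hypothetical, and a real system would typically learn them.

```python
import math
from dataclasses import dataclass

@dataclass
class AnswerSignals:
    upvotes: int              # lifetime upvotes
    recent_upvotes: int       # upvotes in a recent window (velocity proxy)
    author_reputation: float  # assumed normalized to 0..1
    quality: float            # assumed classifier output in 0..1
    age_hours: float

# Hypothetical weights and half-life.
W_VOTES, W_VELOCITY, W_REPUTATION, W_QUALITY = 1.0, 2.0, 1.5, 3.0
HALF_LIFE_HOURS = 24 * 30

def ranking_score(s: AnswerSignals) -> float:
    # Old vote piles fade: accumulated upvotes lose half their weight
    # every HALF_LIFE_HOURS.
    decay = 0.5 ** (s.age_hours / HALF_LIFE_HOURS)
    engagement = (W_VOTES * math.log1p(s.upvotes) * decay
                  + W_VELOCITY * math.log1p(s.recent_upvotes))
    content = W_REPUTATION * s.author_reputation + W_QUALITY * s.quality
    return engagement + content

old = AnswerSignals(upvotes=500, recent_upvotes=0, author_reputation=0.3,
                    quality=0.4, age_hours=24 * 365)
new = AnswerSignals(upvotes=40, recent_upvotes=30, author_reputation=0.6,
                    quality=0.9, age_hours=48)
ranked = sorted([old, new], key=ranking_score, reverse=True)
```

With these assumed weights, the newer, better-written answer ranks above the year-old answer despite having far fewer lifetime upvotes, which is the behavior raw counts cannot produce.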
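A Wilson confidence interval on the upvote rate (used by Reddit's 'best' sort) handles the early-answer problem better than raw counts. It asks 'what is the lower bound of the true upvote rate given the observed votes', so ranking depends on the upvote ratio and on how much evidence supports it rather than on the raw total. At a 95% confidence level (z ≈ 1.96), a newer answer with 30/30 upvotes (lower bound ≈ 0.89) outranks an older answer with 100/120 (≈ 0.76), while an answer with only 5/5 (≈ 0.57) has to earn more votes first. Mention this pattern explicitly.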
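For reference, the Wilson lower bound is short enough to write out. This sketch assumes z = 1.96 (about 95% confidence) as the default; the exact crossover between a small perfect record and a larger imperfect one depends on the z chosen.

```python
import math

def wilson_lower_bound(upvotes: int, total: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the true upvote rate."""
    if total == 0:
        return 0.0
    p = upvotes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt(p * (1 - p) / total + z2 / (4 * total * total))
    return (centre - margin) / (1 + z2 / total)

for up, total in [(5, 5), (30, 30), (100, 120)]:
    print(f"{up}/{total}: {wilson_lower_bound(up, total):.3f}")
```

At this z, 30/30 scores higher than 100/120 even though it has fewer votes, while 5/5 scores lower than both because five votes are too little evidence. That is the property that breaks the first-answer lock-in without letting a handful of early votes dominate.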
Personalized feed under sparse signals: a typical user explicitly follows 5-20 topics but only writes 1-2 answers per year. The signal for what they actually want to read is far broader than what they declared interest in. The recommendation engine compensates by inferring implicit topics from view-history, dwell-time, and upvote patterns — a user who upvotes physics questions on a profile labeled 'Marketing' is implicitly interested in physics.
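Implicit-topic inference can be approximated by weighting engagement events per topic and keeping the strongest topics the user has not declared. The event weights and the function name `infer_topics` below are hypothetical; a production system would more likely learn these from data.

```python
from collections import Counter

# Hypothetical weights: an upvote is treated as a stronger signal than a view.
EVENT_WEIGHTS = {"view": 1.0, "dwell": 2.0, "upvote": 5.0}

def infer_topics(events, declared, top_k=3):
    """Return up to top_k undeclared topics, strongest engagement first.

    events: iterable of (event_type, topic) pairs from the user's history.
    declared: set of topics the user explicitly follows.
    """
    scores = Counter()
    for event_type, topic in events:
        scores[topic] += EVENT_WEIGHTS.get(event_type, 0.0)
    implicit = [topic for topic, _ in scores.most_common() if topic not in declared]
    return implicit[:top_k]

events = [("upvote", "Physics"), ("upvote", "Physics"),
          ("view", "Marketing"), ("dwell", "Cooking")]
print(infer_topics(events, declared={"Marketing"}))
```

Here the user who declared only 'Marketing' gets 'Physics' as the leading inferred topic, matching the example in the paragraph above.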
The feed is built in three phases. Phase 1 (candidate generation): pull questions from explicitly-followed topics (high precision, low recall), from inferred-interest topics (moderate precision, higher recall), and from a trending pool (questions with rapidly-rising engagement across all topics). Phase 2 (scoring): a ranker model scores each candidate per viewer. Phase 3 (diversification): post-process to ensure no single topic dominates and no single author appears more than once. Mention all three phases; the diversification step is often forgotten, and it is what separates a good feed from single-topic spam.
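The three phases can be sketched end to end in a few lines. This is an illustration under assumptions: the `Question` type, the `build_feed` signature, and the per-topic cap of 2 are hypothetical, and the scoring function is passed in rather than modeled.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Question:
    question_id: int
    topic: str
    author_id: int

def build_feed(sources: Iterable[Iterable[Question]],
               score: Callable[[Question], float],
               limit: int = 10,
               max_per_topic: int = 2) -> list[Question]:
    # Phase 1: candidate generation. Merge all sources and drop duplicates.
    candidates = {q.question_id: q for source in sources for q in source}

    # Phase 2: scoring. Highest viewer-specific score first.
    ranked = sorted(candidates.values(), key=score, reverse=True)

    # Phase 3: diversification. Cap each topic and allow each author once.
    topic_counts = Counter()
    seen_authors = set()
    feed = []
    for q in ranked:
        if topic_counts[q.topic] >= max_per_topic or q.author_id in seen_authors:
            continue
        feed.append(q)
        topic_counts[q.topic] += 1
        seen_authors.add(q.author_id)
        if len(feed) == limit:
            break
    return feed
```

The greedy pass in phase 3 is the simplest possible diversifier; it trades a little ranking quality for variety, which is usually the right trade for a feed.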
A third topic worth covering: handling duplicate questions. Quora is notorious for the same question being asked in slightly different wording. A duplicate-detection pipeline runs on question creation: compute a question embedding, query the vector index for near-duplicates, and either merge into the existing question (preserving answer continuity) or flag for moderation. Mention this; it's a common follow-up.
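The merge-or-flag decision reduces to a nearest-neighbor lookup plus two thresholds. In this sketch the embeddings are toy vectors, the linear scan stands in for an approximate-nearest-neighbor vector index, and both thresholds and the name `check_duplicate` are hypothetical.

```python
import math

# Hypothetical thresholds; real values would be tuned on labeled pairs.
MERGE_THRESHOLD = 0.95
REVIEW_THRESHOLD = 0.85

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def check_duplicate(new_embedding, index):
    """Decide what to do with a new question.

    index: dict of question_id -> embedding (brute-force stand-in for a
    vector index). Returns (action, matched_question_id).
    """
    best_id, best_sim = None, 0.0
    for question_id, embedding in index.items():
        sim = cosine_similarity(new_embedding, embedding)
        if sim > best_sim:
            best_id, best_sim = question_id, sim
    if best_sim >= MERGE_THRESHOLD:
        return ("merge", best_id)
    if best_sim >= REVIEW_THRESHOLD:
        return ("flag", best_id)
    return ("create", None)

index = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
print(check_duplicate([0.99, 0.05, 0.0], index))
```

Keeping a middle band that routes to moderation, rather than a single cutoff, limits the damage from false merges, which are harder to undo than false negatives.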
Common mistakes
- Ranking answers purely by upvote count — the first-answer feedback loop locks in a suboptimal ordering
- Treating votes as synchronous writes to the content row — every upvote becomes a hot-row contention problem at scale
- Skipping topic graph and modeling the feed as a single global list — kills personalization and increases low-quality results
- Ignoring duplicate-question detection — interviewers consistently ask about it
- Designing search without a re-ranker — generic text-relevance returns poorly-written answers above well-written ones
Likely follow-up questions
- How would your design handle a 100K-upvote answer that goes viral overnight?
- What changes if you have to support real-time collaborative answer editing (multiple users on the same answer)?
- How would you implement a 'request an answer' notification flow that pings experts in a topic without spamming them?
- How would you detect and demote answers that are AI-generated low-quality content?
- How would you support multilingual Q&A where the same question can be answered in different languages?
Frequently asked questions
- How long is the Design Quora system-design round?
- 45-60 minutes typically. The interview emphasizes ranking and feed personalization; spending too much time on the basic CRUD operations is a known anti-pattern.
- Is Design Quora the same as Design Stack Overflow?
- Surface-similar but different emphasis. Stack Overflow centers on per-question canonical-answer ranking and reputation tied to specific tags; Quora centers on multi-topic feed personalization across a broader content surface. The answer-ranking problem is shared; the feed and topic-graph problems are Quora-specific.
- Do I need to know the Wilson score formula?
- Naming it and explaining the intuition (lower-bound on true rate given observed count) is enough. Deriving the formula is overkill.
- What is the single most important concept to know for Design Quora?
- Answer ranking under engagement feedback loops. Almost every senior signal hinges on whether the candidate recognizes that pure-count ranking is broken and proposes a multi-signal score with time decay.