Design Reddit — System Design Interview Guide

Design Reddit is a system-design interview question that asks you to build a community-driven aggregator: users post links and text into subreddits, others vote and reply in deeply nested comment threads, and ranking surfaces the best content per subreddit and globally. The hard parts are the comment tree and the ranking pipeline.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Reddit
  • Meta
  • LinkedIn
  • Pinterest

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Submit a post (text, link, or image) to a chosen subreddit
  • Upvote or downvote a post or a comment; vote affects ranking
  • Comment on a post; reply to a comment, producing a nested tree
  • View a ranked feed per subreddit (Hot, New, Top, Controversial) and a personalized home feed
  • Subscribe to subreddits; the home feed mixes subscribed subreddits' top posts

Non-functional requirements

  • Feed read latency: <300ms p99 for a subreddit's Hot page
  • Vote eventual consistency: a vote is visible to the voter immediately; global score within ~10s
  • Scale: ~500M MAU, ~100K posts/day, ~3M comments/day, ~100M votes/day, peak read QPS ~1M
  • Availability: 99.99% for reads; degraded ranking is acceptable, full feed outage is not

Capacity estimation

Public 2024 Reddit scale: ~500M monthly active users, ~100K-150K posts/day, ~3M comments/day, ~100M+ votes/day, ~3M subreddits with ~150K active. Read-to-write ratio is high: ~1B feed reads/day = 12K reads/sec average, peak ~1M reads/sec at NA evening. Writes are tiny by comparison — ~100K posts/day = ~1 post/sec average, ~3M comments/day = ~35 comments/sec average.

Storage: a post is ~2 KB metadata + body, a comment ~0.5 KB, a vote ~50 bytes. Annual post bytes: 100K × 365 × 2 KB = ~73 GB/year (trivially small). Comments: 3M × 365 × 0.5 KB = ~550 GB/year. Votes are the volume: ~36.5B votes/year × 50 bytes = ~1.8 TB/year. Hot in-memory caches dominate sizing far more than disk storage: the feed pages, top comments, and recent votes all live in cache.

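The shape that matters: feed reads outweigh post writes by ~10,000:1 and comment writes by ~300:1. Votes are the one high-volume write (~100M/day, roughly a tenth of read volume), but each is a tiny counter update. Designing primarily for write throughput is a beginner mistake; the system is essentially a giant precomputed-feed cache with a small write engine behind it.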
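The arithmetic in this section can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch that only restates the daily volumes and record sizes quoted above; the helper names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365

# Daily volumes and per-record sizes, taken from the estimates above.
feed_reads_per_day = 1_000_000_000
posts_per_day = 100_000
comments_per_day = 3_000_000
votes_per_day = 100_000_000

post_bytes = 2_000     # ~2 KB
comment_bytes = 500    # ~0.5 KB
vote_bytes = 50


def per_second(per_day: int) -> float:
    """Average events per second for a daily volume."""
    return per_day / SECONDS_PER_DAY


def yearly_gb(per_day: int, record_bytes: int) -> float:
    """Raw bytes written per year, in decimal gigabytes."""
    return per_day * DAYS_PER_YEAR * record_bytes / 1e9


print(f"feed reads/sec: {per_second(feed_reads_per_day):,.0f}")
print(f"posts/sec:      {per_second(posts_per_day):.1f}")
print(f"comments/sec:   {per_second(comments_per_day):.1f}")
print(f"posts GB/yr:    {yearly_gb(posts_per_day, post_bytes):,.0f}")
print(f"comments GB/yr: {yearly_gb(comments_per_day, comment_bytes):,.0f}")
print(f"votes GB/yr:    {yearly_gb(votes_per_day, vote_bytes):,.0f}")
print(f"reads per post written: {feed_reads_per_day // posts_per_day:,}")
```

Running the numbers in front of the interviewer matters less than showing that the read path, not the write path, sets the architecture.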

High-level design

Three core domains: posts/comments storage, voting + score, and ranked feeds. Each can scale independently.

Posts and comments live in a sharded relational store, sharded by subreddit_id (so all posts and comments for one subreddit colocate, enabling fast subreddit-feed reads). Comments form a tree per post — store with a parent_comment_id pointer and a materialized path (e.g. '/c123/c456/c789') for efficient subtree reads. A separate full-text search index ingests posts and comments via change-data-capture for keyword search.

Voting is a high-volume stream: a vote write goes to a vote-event log (append-only) plus an idempotent update to a per-target score row (post_id or comment_id → up_count, down_count, score). Idempotency comes from a (user_id, target_id) primary key on the vote row — a user voting twice on the same target overwrites; switching from up to down is a single update. The score is denormalized into the post or comment row for cheap reads.
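The idempotency rule can be sketched in memory. This is a minimal illustration, assuming plain dictionaries stand in for the vote table and the score rows; `VoteStore` and `cast` are hypothetical names, and a real system would do the same thing as a keyed upsert inside one transaction.

```python
from dataclasses import dataclass, field


@dataclass
class VoteStore:
    """In-memory stand-in for the vote table and the per-target score rows."""
    votes: dict = field(default_factory=dict)   # (user_id, target_id) -> +1 / -1
    ups: dict = field(default_factory=dict)     # target_id -> up_count
    downs: dict = field(default_factory=dict)   # target_id -> down_count

    def cast(self, user_id: str, target_id: str, direction: int) -> None:
        """Apply a vote; direction is +1 (up), -1 (down), or 0 (remove)."""
        if direction not in (-1, 0, 1):
            raise ValueError("direction must be -1, 0, or +1")
        key = (user_id, target_id)
        previous = self.votes.get(key, 0)
        if previous == direction:
            return  # network retry or duplicate click: nothing changes
        # Undo the previous vote's contribution, then apply the new one.
        if previous == 1:
            self.ups[target_id] -= 1
        elif previous == -1:
            self.downs[target_id] -= 1
        if direction == 1:
            self.ups[target_id] = self.ups.get(target_id, 0) + 1
        elif direction == -1:
            self.downs[target_id] = self.downs.get(target_id, 0) + 1
        if direction == 0:
            del self.votes[key]
        else:
            self.votes[key] = direction

    def score(self, target_id: str) -> int:
        return self.ups.get(target_id, 0) - self.downs.get(target_id, 0)


store = VoteStore()
store.cast("u1", "post:42", +1)
store.cast("u1", "post:42", +1)   # retried request, ignored
store.cast("u1", "post:42", -1)   # switch from up to down
print(store.score("post:42"))
```

Because the key is (user_id, target_id), replaying the same request any number of times leaves the counts unchanged, which is what makes at-least-once delivery from the vote-event log safe.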

Ranked feeds (Hot, Top, Controversial) are precomputed by a stream processor that consumes vote events and post-create events. For each subreddit, it maintains a sorted set of the top ~1000 posts by each ranking, refreshed on every vote with bounded latency (~5-10s). A subreddit-feed read is a single key lookup in an in-memory store. The personalized home feed is a fanout-on-read: merge the top-N from each of the user's subscribed subreddits at request time (typically <100 subreddits, fast merge). New (chronological) and Top (sorted by score within a window) read from different precomputed sorted sets in the same in-memory tier.
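The fanout-on-read merge for the home feed can be sketched as a k-way merge of already-sorted lists. This example assumes each subreddit's precomputed feed arrives as a list of (score, post_id) pairs sorted from highest to lowest score, and that raw scores are comparable across subreddits; `home_feed` is a hypothetical helper, and a production system might normalize scores per subreddit first.

```python
import heapq
from itertools import islice


def home_feed(subscribed_feeds: dict, limit: int) -> list:
    """Merge per-subreddit ranked lists into one home feed.

    subscribed_feeds maps subreddit name -> list of (score, post_id),
    each list already sorted by score, highest first.
    """
    merged = heapq.merge(
        *subscribed_feeds.values(),
        key=lambda entry: entry[0],
        reverse=True,  # inputs are in descending order
    )
    # The merge is lazy, so only the first `limit` entries are ever pulled.
    return [post_id for _, post_id in islice(merged, limit)]


feeds = {
    "r/python": [(9.0, "py1"), (3.0, "py2")],
    "r/golang": [(7.5, "go1"), (1.0, "go2")],
    "r/rust": [(8.0, "rs1")],
}
print(home_feed(feeds, 3))
```

With k subscribed subreddits and a page of N posts, the merge costs roughly O(N log k), which is why fewer than 100 subscriptions per user keeps this cheap at request time.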

Deep dive — the hard problem

Two deep dives: the comment tree and the Hot ranking algorithm.

Comment tree: a single post can have 10K+ comments forming a tree 20+ levels deep. Naïve approach (fetch all comments, build tree in app) breaks down on viral posts. Better: store each comment with a materialized path ('1.5.12.3' indicating the path from root) and a per-subtree score (sum of votes in subtree). Render in two phases. Phase 1: fetch the top N top-level comments by score and the top M direct replies per top-level. This is a single bounded query. Phase 2: render 'load more' for deeper or lower-scored branches with an explicit click, fetching only that subtree. This bounds the initial payload regardless of total comment count.
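The two-phase read can be sketched over materialized paths. This is an in-memory illustration, assuming dotted paths like '1.5.12' and a plain per-comment score; `Comment`, `initial_page`, and `load_more` are hypothetical names, and in a real store both phases would be indexed queries (a parent lookup and a path-prefix range scan) instead of list scans.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    path: str    # materialized path from the root, e.g. "1.5.12"
    score: int

    @property
    def parent_path(self) -> str:
        # "" for top-level comments, otherwise everything before the last dot.
        return self.path.rpartition(".")[0]


def top_children(comments, parent_path, limit):
    """Highest-scored direct children of parent_path ("" means the post itself)."""
    children = [c for c in comments if c.parent_path == parent_path]
    return sorted(children, key=lambda c: c.score, reverse=True)[:limit]


def initial_page(comments, n_top, m_replies):
    """Phase 1: top N top-level comments, each with its top M direct replies."""
    return [
        (top, top_children(comments, top.path, m_replies))
        for top in top_children(comments, "", n_top)
    ]


def load_more(comments, path):
    """Phase 2: fetch a single subtree on demand, selected by path prefix."""
    prefix = path + "."
    return [c for c in comments if c.path.startswith(prefix)]


thread = [
    Comment("1", 50), Comment("2", 10), Comment("3", 30),
    Comment("1.4", 5), Comment("1.5", 20), Comment("1.5.12", 7),
    Comment("3.6", 2),
]
for top, replies in initial_page(thread, n_top=2, m_replies=1):
    print(top.path, [r.path for r in replies])
print([c.path for c in load_more(thread, "1.5")])
```

The initial payload is capped at N + N×M comments no matter how large the thread grows; everything else is fetched only when a reader expands a branch.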

The ranking inside the tree is the next layer: within siblings, sort by Reddit's 'confidence sort' (the lower bound of the Wilson score interval on the up/down counts) so newly posted comments with one upvote don't outrank older comments with hundreds of upvotes. Storing the confidence-sort score on the comment row keeps the sort cheap. Discussing the Wilson interval explicitly is a strong signal.
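The lower bound of the Wilson score interval is short enough to write out. This is a sketch of the standard formula; the default z = 1.96 (a 95% interval) is an assumption chosen for illustration, not necessarily the constant Reddit uses, and `confidence` is a hypothetical function name.

```python
from math import sqrt


def confidence(ups: int, downs: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the upvote fraction."""
    n = ups + downs
    if n == 0:
        return 0.0
    p = ups / n                      # observed upvote fraction
    z2 = z * z
    centre = p + z2 / (2 * n)
    margin = z * sqrt((p * (1 - p) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


# One upvote is a 100% approval rate, but the small sample drags the bound down.
print(round(confidence(1, 0), 4))
print(round(confidence(300, 50), 4))
```

The bound answers "what upvote fraction can we be confident this comment has, given how few votes we have seen", so it rises with both approval rate and sample size.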

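Hot ranking: Reddit's Hot algorithm is famous and instructive. In the open-sourced version, score = sign(s) × log10(max(|s|, 1)) + t / 45000, where s = upvotes − downvotes and t is the post's submission time in seconds measured from a fixed epoch. The log dampens runaway-popular posts: each 10× increase in net votes adds only one point, the same boost as being submitted 12.5 hours (45,000 seconds) later. The time term favors newer posts, so older posts sink as fresh ones arrive without any stored score needing to decay. The sign term makes net-downvoted posts lose rank as their vote magnitude grows. The key property: the score is computable from (net votes, submission time), with no per-user signal and no dependence on the current time, so all posts in a subreddit can be ranked by Hot from a single sorted-set update per vote. When a vote arrives, recompute the post's Hot score and update its position in the subreddit's Hot sorted set in the in-memory ranking store. Each update is O(log N).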
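A sketch of the Hot score, following the form of Reddit's open-sourced ranking code, in which the sign multiplies the log term and the time term is the submission time measured from a fixed epoch. The dictionary standing in for the sorted set and the names `hot` and `on_vote` are assumptions for illustration; a real deployment would use a sorted-set structure with O(log N) updates.

```python
from math import log10

# Fixed reference timestamp (seconds) used by the open-sourced hot sort.
REDDIT_EPOCH = 1_134_028_003


def hot(ups: int, downs: int, created_utc: float) -> float:
    """Hot score from net votes and submission time only."""
    s = ups - downs
    order = log10(max(abs(s), 1))
    sign = (s > 0) - (s < 0)            # +1, 0, or -1
    seconds = created_utc - REDDIT_EPOCH
    return round(sign * order + seconds / 45000, 7)


# Stand-in for one subreddit's Hot sorted set: post_id -> score.
hot_set: dict = {}


def on_vote(post_id: str, ups: int, downs: int, created_utc: float) -> None:
    """Recompute one post's score when a vote arrives; no other post is touched."""
    hot_set[post_id] = hot(ups, downs, created_utc)


t0 = 1_700_000_000
on_vote("old_popular", 100, 0, t0)
on_vote("new_modest", 10, 0, t0 + 45_000)   # 12.5 hours newer, 10x fewer votes
print(hot_set)
```

The two posts in the usage example land on the same score, which shows the trade the formula makes: a tenfold vote advantage is worth exactly 12.5 hours of recency.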

A related tradeoff: vote brigading and fraud. A coordinated upvote campaign can manipulate Hot rank within minutes. Defenses: rate-limit votes per user per subreddit per hour, weight votes by account age and karma (so sock-puppet accounts count for less), and apply 'vote fuzz' (small random noise added to displayed counts) to make scraping less useful. Mention rate limiting plus account-weighted votes as the standard answer.
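The two standard defenses can be sketched together. Everything here is hypothetical: the sliding-window limiter, the 30-day and 100-karma ramps, and the names `VoteRateLimiter` and `vote_weight` are illustrative assumptions, not Reddit's actual rules or thresholds.

```python
from collections import defaultdict, deque


class VoteRateLimiter:
    """Sliding-window cap on votes per (user, subreddit)."""

    def __init__(self, max_votes: int, window_seconds: int = 3600):
        self.max_votes = max_votes
        self.window = window_seconds
        self.history = defaultdict(deque)   # (user_id, subreddit) -> timestamps

    def allow(self, user_id: str, subreddit: str, now: float) -> bool:
        recent = self.history[(user_id, subreddit)]
        # Drop timestamps that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_votes:
            return False
        recent.append(now)
        return True


def vote_weight(account_age_days: float, karma: int) -> float:
    """Hypothetical weighting: ramps from 0 to 1 over 30 days and 100 karma."""
    age_factor = min(account_age_days / 30, 1.0)
    karma_factor = min(max(karma, 0) / 100, 1.0)
    return age_factor * karma_factor


limiter = VoteRateLimiter(max_votes=2)
print([limiter.allow("u1", "r/news", t) for t in (0, 10, 20, 3600)])
print(vote_weight(account_age_days=1, karma=0))
print(vote_weight(account_age_days=365, karma=5000))
```

Under this weighting, 10K day-old accounts with no karma contribute nothing to the ranking score, which is the point to make when the interviewer raises sock puppets.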

Common mistakes

  • Treating the comment tree as a flat list — misses the load-more recursion pattern and the per-subtree scoring
  • Computing Hot rank live on every feed read instead of incrementally updating a sorted set on each vote
  • Forgetting vote idempotency — double votes from network retries inflate scores
  • Sharding posts by post_id instead of by subreddit_id — kills subreddit-feed read locality
  • Skipping fraud and brigading defenses — interviewer asks 'what if I have 10K sock accounts' within the first 10 minutes

Likely follow-up questions

  • How would you implement live vote-count updates so users see the count tick up in real time?
  • What changes if a single subreddit grows to 50M subscribers (r/AskReddit scale)?
  • How would you handle a sudden viral post pulling 1M votes in 10 minutes?
  • How would you build a 'best of all time' feed that ranks across all subreddits, all years?
  • How would you implement comment editing while preserving the original for moderation context?

Frequently asked questions

How long is the Design Reddit interview round?
45-60 minutes. The expectation at L5/E5+ is that you cover comment tree, voting, AND Hot ranking. Skipping any of the three is a clear no-hire signal. Source: Glassdoor Reddit 2022-2024 reports.
Do I need to know the exact Hot formula?
Knowing the shape (log of net votes plus a term linear in submission time, with the sign of the net score applied to the log term) is enough. Reciting the constants is overkill. The signal is understanding why log dampens popular posts and why the time term lets new posts compete.
Should I cover the home feed (personalized across subreddits) or just subreddit feeds?
Cover both, but spend 80% on the per-subreddit feed because that's where the ranking algorithm lives. The home feed is fanout-on-read across subscribed subreddits — explain once, move on.
Is Design Reddit easier than Design Twitter?
Comparable. Twitter is fanout-on-write to follower timelines; Reddit is sorted-set ranking per subreddit. Reddit's comment tree is a unique surface that Twitter doesn't have. Most interviewers consider Reddit slightly harder because the ranking algorithm + tree pagination both need to be solved cleanly.