Why is a relational database a bad fit for the rank store?

Ranking by score against 100M rows requires either a sorted index scan (read every row) or a precomputed rank column that has to be recomputed on every write. Neither hits sub-100ms p99. In-memory sorted sets give you O(log N) for both directions of the query.

Do I need to discuss anti-cheat in detail?

One paragraph: validate game-server-signed payloads, rate-limit per player, run anomaly detection on score jumps. Drilling into the cheat-detection ML model is out of scope for this question; the architecture point is that the rank-store can never trust the client directly.

Is friends-only leaderboard a separate scaling problem?

Yes — it's the hard part of this question for some interviewers. Materializing one board per user is too expensive; computing on read is too slow for popular accounts. The standard answer is lazy materialization plus aggressive caching.

What is the senior signal for Design a Leaderboard?

Two: (1) you propose a sorted-set data structure with subtree-size tracking for O(log N) rank queries, not a plain BST; (2) you treat the score-history log as the source of truth and the rank-store as a derived, recomputable view.

System Design Questions

Design a Real-Time Leaderboard — System Design Interview Guide

Design a Real-Time Leaderboard is a system-design interview that asks you to build a ranked scoreboard for a game or social product: millions of players post scores, and any player can ask 'what's my rank?' or 'show me the top 100' with near-real-time freshness. The hard part is the rank-by-score query at scale — naive sorting is impossibly expensive — plus write-amplification on hot players and storage cost for the long tail of inactive accounts.

By Sam K., Founder, InterviewChamp.AI · Last verified 2026-05-25

Reported in interviews at

Meta
Microsoft
Amazon
Roblox
Activision

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

Submit a score for a player (absolute score or delta)
Get the top N players for a board (global, regional, friends-only)
Get the rank and score for a specific player
Get a window of players around a given player (rank-50 above, rank-50 below)
Support multiple leaderboard scopes: all-time, weekly, daily, per-region, per-game-mode

Non-functional requirements

Scale: 100M monthly active players, ~1B score submissions/day, ~5K writes/sec average, ~50K writes/sec peak (tournament hour)
Read latency: <100ms p99 for top-N and per-player rank queries
Write latency: <500ms p99 from score submission to rank being visible
Freshness: ranks visible within 1-2 seconds; not strictly real-time for non-top-1000
Availability: 99.95% — leaderboard reads must survive write-path degradation
Storage: per-player metadata ~200 bytes; per-score-history entry ~50 bytes (optional retention)

Capacity estimation

Public assumptions: 100M MAU, ~10M DAU during normal play, ~50M DAU during a major event. Each active player posts ~10 scores/day on average, peaking at ~100 scores/day during a tournament. Write QPS: 50M × 10 / 86,400 = ~5,800/sec average. Peak during a tournament cutoff hour: 50M × 100 / 3,600 = ~1.4M/sec — far above the steady-state 5K/sec figure. Real interview answer: design for 50K/sec sustained with bursting to 500K/sec for short windows.

Read QPS: top-N is the dominant read pattern (every player checks the leaderboard ~5x/day). 50M DAU × 5 reads / 86,400 = ~2,900/sec average for top-N alone. Per-player rank queries from the game client run once per match-end: 50M × 10 matches/day / 86,400 = ~5,800/sec.

Storage: 100M players × 200 bytes metadata = 20 GB — trivially fits one node. Per-score history (optional): 1B scores/day × 50 bytes × 90 days retention = 4.5 TB. Most leaderboards only need current-score-per-player-per-board, not full history: 100M players × 50 boards (all-time, weekly, daily, regional × game-mode) × 50 bytes = 250 GB.

Write amplification on hot players: a top-100 streamer might post 1,000 score updates in an hour during a livestream. If every score triggers a rank-board write, the sorted set sees 1,000 updates on a single hot key. Production systems debounce — accept the score in the write path but defer the rank-board update by 100-500ms, coalescing back-to-back updates on the same key.

Long tail: 80% of all-time leaderboard entries belong to players who haven't logged in for 30+ days. These rarely-queried entries dominate storage cost. Tiered storage moves stale boards (week N-12 weekly, for instance) to cheaper cold storage after a defined freshness window expires.

High-level design

Three core services: score-ingest, rank-store, query.

Score-ingest accepts a submission (player_id, board_id, score, timestamp), validates it (anti-cheat), and writes to two destinations: a durable score-history log in an append-only store (for audit, replay, retroactive cheating reversals) and the in-memory rank-store. Ingest is sharded by board_id so a single hot board doesn't take down the whole tier.

Rank-store holds the actual sorted set per leaderboard scope. The canonical data structure is a skip-list or balanced tree keyed by score, with a hash-map sidecar mapping player_id to score for O(1) per-player lookups. This dual structure lets you serve top-N in O(N) and per-player rank in O(log N) — exactly what the access pattern demands. Each board is a single in-memory data structure; shard across nodes by board_id. For very-high-fanout boards (the global all-time board), partition by score range and merge at read time, or replicate the entire board across read-only followers.

Query service handles the read API. Top-N is a single sorted-set range query. Per-player rank is hash-map lookup for the score, then sorted-set rank-by-score query. Windowed query (rank-50 around player X) is two range queries: 50 above and 50 below the player's score. All three operations are sub-millisecond on a well-sized in-memory tier.

Freshness vs. consistency: scores are written to history first, then asynchronously applied to the rank-store. The lag is sub-second under normal load. Players who need to see their own score reflected immediately get a write-through path: the ingest service returns only after the rank-store has been updated for that player. Other players see the new rank within ~1 second through standard async propagation.

Anti-cheat lives at ingest: validate that the score is consistent with game-server-attested play (signed payload from the game server, not the client), rate-limit per player, and run anomaly detection (a player's score jumped 10x in an hour — flag for review). Mention this as an operational concern; interviewers reward candidates who raise it.

Deep dive — the hard problem

Two deep dives: the sorted-set data structure tradeoffs, and tiered storage for stale leaderboards.

Sorted-set choice — three real options.

Option A: skip-list. Probabilistic O(log N) insert, delete, rank-by-score, score-by-rank. Used by most in-memory data stores for sorted sets. Stable performance, simple implementation, easy to replicate. Best default for a leaderboard at <100M entries per board.

Option B: balanced BST (red-black or AVL). Deterministic O(log N) for the same operations. Slightly more complex code, marginal memory overhead. Pick this if you need worst-case guarantees over average-case.

Option C: B-tree on disk. Necessary if a single board exceeds available RAM — but 100M entries × 50 bytes = 5 GB fits comfortably in memory on a single node. The B-tree option appears only at extreme scales (100B+ entries) or when you need persistent durability without a separate write-ahead log. For interview scope, mention it and move on.

A key implementation detail: the rank-by-score query (give me the rank of score X) requires the data structure to track subtree sizes (or skip-list span counts). Plain BSTs don't — you'd need an order-statistic tree. Mentioning this signals you understand the difference between 'sorted container' and 'sorted container with rank queries' — a senior-bar distinction.

Tiered storage for stale leaderboards: weekly boards from 12 weeks ago, regional boards for inactive regions, daily boards from yesterday — these are read once a year but cost the same as hot boards in RAM. Production tactic: after a defined freshness window (24 hours for daily, 14 days for weekly, 90 days for monthly), serialize the entire board to compressed object storage and evict from RAM. On a query against a stale board, fetch from object storage, hold in a short-TTL cache (15 minutes), serve. Cold reads are slower (~500ms) but acceptable for a query against an old board nobody actually cares about.

Write-amplification on hot scores: a tournament cutoff hour produces 500K writes/sec where 10% of those land on the top-1000 board. If every write triggers a sorted-set update + a rank-recompute, the top-1000 board's hot keys become a bottleneck. Two mitigations.

Mitigation 1: debounce. Accept the score immediately into the history log, but defer the rank-store update by 100-500ms per player. Within that window, coalesce multiple updates for the same player to a single rank-store write. The player still sees their final score reflected within a second.

Mitigation 2: write-combining at the rank-store. The rank-store maintains a small batching buffer per shard; every 50ms it flushes accumulated updates as a single batch. This trades a tiny freshness penalty for a 10-50x reduction in per-update overhead.

Third tradeoff: friends-only leaderboards. A naive design materializes a friends-board per user (lookup user's friends → sort by score), which costs O(F log F) per read where F is friend count. Top friends (high follower counts) become hot. Solution: precompute friends-boards lazily — first read computes and caches; subsequent reads serve from cache with a 30-second TTL. New score submissions invalidate only the boards of players who have the submitter as a friend.

Fourth: retroactive cheating reversals. A player is found to have cheated 3 weeks ago. Action: remove their scores from the affected boards, recompute ranks for everyone who was below them. Cost: O(K) where K is the number of boards affected, plus a single rank-restore pass on each. The score-history log makes this possible — it's the source of truth; the rank-store is a derived view.

Common mistakes

Storing scores in a row-oriented database and running ORDER BY at query time — the sort is impossibly expensive at 100M rows even with an index
Using a plain BST or hash-map without subtree-size tracking — you can't answer rank-by-score in O(log N)
Designing one giant sorted-set across all players globally without partitioning by board scope — kills cache locality
Forgetting write-amplification — a tournament hour produces 100x normal write QPS that the rank-store has to absorb
Treating the rank-store as the source of truth — when scores are wrong, you have no history to replay from

Likely follow-up questions

How would you design a leaderboard for a game with 1B players where a single board exceeds available RAM?
How would you support 'percentile rank' queries (am I in the top 1%?) at sub-100ms latency?
What changes if scores can decay over time (e.g., chess Elo decay after inactivity)?
How would you handle a leaderboard for a competitive sport where ties must be broken by tiebreaker fields (most recent submission, most kills, etc.)?
How would you let a player undo their last score submission within a 30-second grace window?

Related system design scenarios

Frequently asked questions

Why is a relational database a bad fit for the rank store?: Ranking by score against 100M rows requires either a sorted index scan (read every row) or a precomputed rank column that has to be recomputed on every write. Neither hits sub-100ms p99. In-memory sorted sets give you O(log N) for both directions of the query.
Do I need to discuss anti-cheat in detail?: One paragraph: validate game-server-signed payloads, rate-limit per player, run anomaly detection on score jumps. Drilling into the cheat-detection ML model is out of scope for this question; the architecture point is that the rank-store can never trust the client directly.
Is friends-only leaderboard a separate scaling problem?: Yes — it's the hard part of this question for some interviewers. Materializing one board per user is too expensive; computing on read is too slow for popular accounts. The standard answer is lazy materialization plus aggressive caching.
What is the senior signal for Design a Leaderboard?: Two: (1) you propose a sorted-set data structure with subtree-size tracking for O(log N) rank queries, not a plain BST; (2) you treat the score-history log as the source of truth and the rank-store as a derived, recomputable view.