Design a Real-Time Gaming Matchmaker — System Design Interview Guide
Design a Real-Time Gaming Matchmaker is a system-design interview that asks you to pair skill-rated online players into balanced matches: a player clicks 'find game', and within 10-30 seconds they're in a lobby with 9 other players of similar skill, on a server in their region, with acceptable network latency. The hard part is the queue dynamics — too strict on skill matching and queues stall; too loose and matches are one-sided — plus the rating computation that has to be incremental and tamper-resistant.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Reported in interviews at
- Microsoft
- Amazon
- Roblox
- Activision
- Riot Games
Sourced from Glassdoor, Levels.fyi, and Blind interview reports.
Functional requirements
- Player enters matchmaking queue with game mode, region preference, party composition
- System produces a balanced lobby (e.g., 10 players, 5-vs-5) within a target wait time
- Compute and update each player's skill rating after each match
- Allow parties of friends to queue together (the party's combined skill must still produce a balanced match)
- Support multiple game modes, each with its own queue and rating
- Allow players to cancel queue or leave a lobby pre-game
Non-functional requirements
- Scale: 10M concurrent online players at peak, ~1M concurrent in queue, 500K matches/hour
- Queue latency: median wait time <30 seconds; p95 <90 seconds; p99 <3 minutes
- Match-quality: at least 80% of matches within 100 rating-point skill spread
- Network latency: server selected within 80ms RTT of all 10 players for >95% of matches
- Availability: 99.9% on matchmaking; players tolerate brief queue freezes more than mid-match drops
Capacity estimation
Public assumptions for a popular online competitive game: ~50M MAU, ~10M peak concurrent, ~1M concurrently in matchmaking. Match rate: 1M / 10 players-per-match / 30-second-avg-wait = ~3,300 matches starting per second peak; over an hour, ~500K matches.
Queue state: 1M players × ~200 bytes per queue entry (player_id, rating, region, party_id, joined_at, preferences) = 200 MB. Easily fits in memory across a queue tier.
Match history: 500K matches/hour × 24 hours = 12M matches/day × ~5 KB per match record (player ids, scores, duration, server, replay pointer) = 60 GB/day, ~22 TB/year. The replay binary lives in object storage (separate from the relational match-record store); a typical replay is 10-50 MB.
Rating writes: 500K matches/hour × 10 players = 5M rating updates/hour = ~1,400/sec. Each update is a small row write keyed by player_id, sharded by player_id. Trivial.
Region partitioning: assume 6 regions (NA-East, NA-West, EU-West, EU-Central, Asia-East, OCE). Each region runs its own queue tier locally. Cross-region queuing happens only after wait exceeds a threshold and we accept higher network latency to find a match. Per-region peak: 1M / 6 = ~170K in queue per region.
Queue dynamics: wait time is governed by player density at a rating band. At a popular rating (say, the median, 1500), the band [1450, 1550] sees thousands of players join per minute — matches form in seconds. At the tail (rating 2700+, top 0.1%), the band might see one player join every 30 seconds — wait times balloon to minutes. Production systems handle this by widening the skill window over time: start at ±50 rating, widen by +25 every 15 seconds until a match is found or max-spread is hit.
High-level design
Four core services: queue, matcher, rating, match-coordinator.
Queue service maintains the live in-memory queue per region per game mode. It's not a FIFO — it's a multi-dimensional index keyed by (rating, region, party_size, mode). Each player entry holds metadata: arrival time, skill rating, party_id (if grouped), region, preferences. The queue is sharded by region and replicated within a region for fault tolerance; a queue node failure should not eject players mid-search.
Matcher runs the actual pairing algorithm. It pulls candidate players from the queue, applies the matching rules (skill window, party-composition constraints, region constraint), and produces a lobby. The matcher does NOT scan the entire queue per cycle — it operates on rating bands. Each band has its own search loop; players in adjacent bands are considered only when their wait time exceeds the widening threshold. The matcher emits lobby-formed events to the match-coordinator.
Rating service computes and stores each player's skill rating (commonly an Elo-derived rating or a Bayesian rating that tracks both a skill estimate and an uncertainty value). After each match, it ingests the result, computes the rating update for each of the 10 players, writes back to the player-rating store (sharded by player_id, eventually-consistent). Rating updates are based on (expected outcome | rating_difference) vs. (actual outcome); a player who loses to a much lower-rated opponent loses more rating than one who loses to a much higher-rated opponent.
Match-coordinator handles the lifecycle from lobby-formed to match-ended. It selects a game server (a stateless game-server-fleet manager handles allocation), confirms all 10 players are still online and ready, opens the connection, listens for match-result events, then triggers the rating update. If a player disconnects pre-match, the coordinator reverts the lobby and re-queues the remaining 9 (or fails the whole lobby if the disconnect was the host).
Party handling: a party of friends enters the queue as a single unit. The party's effective rating is a function of member ratings (commonly the average, or a weighted blend that penalizes high-spread parties to prevent 'smurfing' — a high-rated player carrying a low-rated friend). The matcher treats the party as a single queue entry with that effective rating, and ensures the opposing party (or a balanced collection of solo+small-parties on the other team) hits the same effective rating.
Deep dive — the hard problem
Two deep dives: the skill-rating model, and queue dynamics with region partitioning.
Skill rating model — three viable patterns.
Pattern A: Elo. The classic chess rating. Each player has a single number; after each game, the winner takes points from the loser proportional to the rating difference. Simple, well-understood, easy to debug. Tradeoff: doesn't model uncertainty, so a brand-new player and a 1000-game player with the same rating are treated identically. New players churn through provisional ratings that lag their true skill.
Pattern B: Bayesian rating (mean + variance). Each player has a skill mean and an uncertainty variance. New players start with high variance, which decreases as games accumulate. The matcher prefers pairing players with overlapping skill distributions, not just identical means — a player at 1500±50 is a fine match for 1500±50, but a poor match for 1500±300 (the high-variance player could actually be 1200 or 1800). Production-grade and widely used.
Pattern C: per-mode skill plus team skill. The skill rating is multidimensional: rating in (mode_A, mode_B, ...) and a separate rating for team-based vs. solo play. Adds complexity but matches reality — a player who's great at 1-vs-1 may be mediocre in 5-vs-5. Pick this only if the product genuinely has very different modes; otherwise it adds queue fragmentation (each mode's queue is smaller, wait times grow).
Rating-update path is the critical surface: it must be incremental (O(1) per game per player) and tamper-resistant (clients cannot self-report results). Production systems take the match result from the game-server-attested replay log, not from the client. The rating-update is computed deterministically from (player_ratings_before, match_outcome) — a re-runnable function. If a cheating report leads to a retroactive disqualification, the rating service replays the update with the cheater removed; subsequent matches the cheater played get their updates rolled back and reapplied in order. Replayability matters more than absolute correctness on the first pass.
Queue dynamics: the central tension is skill spread vs. wait time. Tight skill matching produces balanced, fun matches but stalls queues; loose matching produces lopsided matches that drive churn. Production tactic: skill-window widening over time. Player joins at rating 1500. First 15 seconds: search [1450, 1550]. Next 15: [1425, 1575]. Next 15: [1400, 1600]. Cap at, say, [1300, 1700]. If still no match at the cap, broaden region constraints (allow cross-region matches with up to +50ms RTT) before broadening skill further.
Region partitioning: each region runs its own queue and matcher. 95%+ of matches form within-region with good network latency. The cross-region escape hatch kicks in for tail players (top 0.1% by rating in low-population regions). Implementation: each region's queue periodically publishes a 'wait-time pressure' metric per rating band; a cross-region matcher monitors these and proactively pairs players from two regions when one has surplus and the other has deficit. The opening latency hit is well-defined (each player gets a server with ~+50ms RTT vs. their preferred region) and accepted as the alternative to a 5-minute wait.
Third tradeoff: party imbalance. A party of (1800, 1800) vs. (1500, 2100) has the same average rating but very different match dynamics — the 2100 will dominate while the 1500 gets carried. Production matchers compute a 'party spread penalty' and refuse to pair high-spread parties against low-spread parties. Mentioning this signals you understand that average-of-ratings is an insufficient summary statistic.
Fourth: queue manipulation and smurfing. Players intentionally lose to drop rating and stomp lower-rated opponents. Detection: track win-rate vs. expected win-rate over a rolling window; a player who loses 20 games then wins 20 with a 90% rate is suspicious. Production response: rating decay on suspected accounts, shadow-queue them with other smurfs. Mention this as an operational concern; interviewers reward the candidate who raises it.
Common mistakes
- Treating the queue as a FIFO — matchmaking is a multi-dimensional search problem, not a linear one
- Forgetting party composition — a 5-friend party with a 600-rating spread is unfair against a balanced 5 solo players
- Skipping the widening-window dynamic — a static skill window stalls queues at the rating tails
- Trusting the client for match results — rating updates must come from the game server, not the player
- Designing one global queue without region partitioning — network latency would tank match quality
Likely follow-up questions
- How would you design a 'ranked' season system where ratings reset every 3 months and rewards are tier-gated?
- What changes if you have to match across multiple platforms (PC + console + mobile) with different input latencies?
- How would you support a 'role queue' where each lobby needs specific roles filled (1 tank, 2 healers, 2 damage)?
- How would you detect and shadow-ban smurfs without breaking legitimate alt-accounts?
- How would you handle a scenario where 20% of players in a region simultaneously disconnect due to an outage?
Practice Design a Real-Time Gaming Matchmaker live with an AI interviewer
Free, no sign-up required. Get real-time feedback on your design.
Practice these liveFrequently asked questions
- Why isn't pure Elo good enough for modern matchmakers?
- Elo treats every player as a single deterministic point on a number line, but new players' ratings are highly uncertain — early matches with them are essentially random. A Bayesian model with explicit uncertainty produces dramatically better early-game matches and converges faster to true skill.
- Do I need to deep-dive on the rating math?
- Mention the model family (Elo or Bayesian-with-uncertainty) and the key inputs (rating-difference, outcome, optionally margin-of-victory). Drilling into the update equation is out of scope unless the interviewer pushes — focus on the system properties: deterministic, replayable, server-attested.
- How is matchmaking different from a dating-app match (Design Tinder)?
- Dating matches are bilateral and unconstrained on count (one user matches with one other). Game matchmaking is N-way (10 players form a single lobby) and has strict balance constraints (the two teams must be near-equal in summed skill). The queue search is fundamentally different — N-way constraint satisfaction, not pairwise scoring.
- What is the most important concept for this question?
- Queue dynamics with widening windows. The naive design uses a fixed skill window and stalls; the production design adapts the window to player density at each rating band. Demonstrating this tradeoff signals senior-level understanding.