Skip to main content

System Design Questions

Design a Real-Time Gaming Matchmaker — System Design Interview Guide

Design a Real-Time Gaming Matchmaker is a system-design interview that asks you to pair skill-rated online players into balanced matches: a player clicks 'find game', and within 10-30 seconds they're in a lobby with 9 other players of similar skill, on a server in their region, with acceptable network latency. The hard part is the queue dynamics — too strict on skill matching and queues stall; too loose and matches are one-sided — plus the rating computation that has to be incremental and tamper-resistant.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Microsoft
  • Amazon
  • Roblox
  • Activision
  • Riot Games

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Player enters matchmaking queue with game mode, region preference, party composition
  • System produces a balanced lobby (e.g., 10 players, 5-vs-5) within a target wait time
  • Compute and update each player's skill rating after each match
  • Allow parties of friends to queue together (the party's combined skill must still produce a balanced match)
  • Support multiple game modes, each with its own queue and rating
  • Allow players to cancel queue or leave a lobby pre-game

Non-functional requirements

  • Scale: 10M concurrent online players at peak, ~1M concurrent in queue, 500K matches/hour
  • Queue latency: median wait time <30 seconds; p95 <90 seconds; p99 <3 minutes
  • Match-quality: at least 80% of matches within 100 rating-point skill spread
  • Network latency: server selected within 80ms RTT of all 10 players for >95% of matches
  • Availability: 99.9% on matchmaking; players tolerate brief queue freezes more than mid-match drops

Capacity estimation

Public assumptions for a popular online competitive game: ~50M MAU, ~10M peak concurrent, ~1M concurrently in matchmaking. Match rate: 1M / 10 players-per-match / 30-second-avg-wait = ~3,300 matches starting per second peak; over an hour, ~500K matches.

Queue state: 1M players × ~200 bytes per queue entry (player_id, rating, region, party_id, joined_at, preferences) = 200 MB. Easily fits in memory across a queue tier.

Match history: 500K matches/hour × 24 hours = 12M matches/day × ~5 KB per match record (player ids, scores, duration, server, replay pointer) = 60 GB/day, ~22 TB/year. The replay binary lives in object storage (separate from the relational match-record store); a typical replay is 10-50 MB.

Rating writes: 500K matches/hour × 10 players = 5M rating updates/hour = ~1,400/sec. Each update is a small row write keyed by player_id, sharded by player_id. Trivial.

Region partitioning: assume 6 regions (NA-East, NA-West, EU-West, EU-Central, Asia-East, OCE). Each region runs its own queue tier locally. Cross-region queuing happens only after wait exceeds a threshold and we accept higher network latency to find a match. Per-region peak: 1M / 6 = ~170K in queue per region.

Queue dynamics: wait time is governed by player density at a rating band. At a popular rating (say, the median, 1500), the band [1450, 1550] sees thousands of players join per minute — matches form in seconds. At the tail (rating 2700+, top 0.1%), the band might see one player join every 30 seconds — wait times balloon to minutes. Production systems handle this by widening the skill window over time: start at ±50 rating, widen by +25 every 15 seconds until a match is found or max-spread is hit.

High-level design

Four core services: queue, matcher, rating, match-coordinator.

Queue service maintains the live in-memory queue per region per game mode. It's not a FIFO — it's a multi-dimensional index keyed by (rating, region, party_size, mode). Each player entry holds metadata: arrival time, skill rating, party_id (if grouped), region, preferences. The queue is sharded by region and replicated within a region for fault tolerance; a queue node failure should not eject players mid-search.

Matcher runs the actual pairing algorithm. It pulls candidate players from the queue, applies the matching rules (skill window, party-composition constraints, region constraint), and produces a lobby. The matcher does NOT scan the entire queue per cycle — it operates on rating bands. Each band has its own search loop; players in adjacent bands are considered only when their wait time exceeds the widening threshold. The matcher emits lobby-formed events to the match-coordinator.

Rating service computes and stores each player's skill rating (commonly an Elo-derived rating or a Bayesian rating that tracks both a skill estimate and an uncertainty value). After each match, it ingests the result, computes the rating update for each of the 10 players, writes back to the player-rating store (sharded by player_id, eventually-consistent). Rating updates are based on (expected outcome | rating_difference) vs. (actual outcome); a player who loses to a much lower-rated opponent loses more rating than one who loses to a much higher-rated opponent.

Match-coordinator handles the lifecycle from lobby-formed to match-ended. It selects a game server (a stateless game-server-fleet manager handles allocation), confirms all 10 players are still online and ready, opens the connection, listens for match-result events, then triggers the rating update. If a player disconnects pre-match, the coordinator reverts the lobby and re-queues the remaining 9 (or fails the whole lobby if the disconnect was the host).

Party handling: a party of friends enters the queue as a single unit. The party's effective rating is a function of member ratings (commonly the average, or a weighted blend that penalizes high-spread parties to prevent 'smurfing' — a high-rated player carrying a low-rated friend). The matcher treats the party as a single queue entry with that effective rating, and ensures the opposing party (or a balanced collection of solo+small-parties on the other team) hits the same effective rating.

Deep dive — the hard problem

Two deep dives: the skill-rating model, and queue dynamics with region partitioning.

Skill rating model — three viable patterns.

Pattern A: Elo. The classic chess rating. Each player has a single number; after each game, the winner takes points from the loser proportional to the rating difference. Simple, well-understood, easy to debug. Tradeoff: doesn't model uncertainty, so a brand-new player and a 1000-game player with the same rating are treated identically. New players churn through provisional ratings that lag their true skill.

Pattern B: Bayesian rating (mean + variance). Each player has a skill mean and an uncertainty variance. New players start with high variance, which decreases as games accumulate. The matcher prefers pairing players with overlapping skill distributions, not just identical means — a player at 1500±50 is a fine match for 1500±50, but a poor match for 1500±300 (the high-variance player could actually be 1200 or 1800). Production-grade and widely used.

Pattern C: per-mode skill plus team skill. The skill rating is multidimensional: rating in (mode_A, mode_B, ...) and a separate rating for team-based vs. solo play. Adds complexity but matches reality — a player who's great at 1-vs-1 may be mediocre in 5-vs-5. Pick this only if the product genuinely has very different modes; otherwise it adds queue fragmentation (each mode's queue is smaller, wait times grow).

Rating-update path is the critical surface: it must be incremental (O(1) per game per player) and tamper-resistant (clients cannot self-report results). Production systems take the match result from the game-server-attested replay log, not from the client. The rating-update is computed deterministically from (player_ratings_before, match_outcome) — a re-runnable function. If a cheating report leads to a retroactive disqualification, the rating service replays the update with the cheater removed; subsequent matches the cheater played get their updates rolled back and reapplied in order. Replayability matters more than absolute correctness on the first pass.

Queue dynamics: the central tension is skill spread vs. wait time. Tight skill matching produces balanced, fun matches but stalls queues; loose matching produces lopsided matches that drive churn. Production tactic: skill-window widening over time. Player joins at rating 1500. First 15 seconds: search [1450, 1550]. Next 15: [1425, 1575]. Next 15: [1400, 1600]. Cap at, say, [1300, 1700]. If still no match at the cap, broaden region constraints (allow cross-region matches with up to +50ms RTT) before broadening skill further.

Region partitioning: each region runs its own queue and matcher. 95%+ of matches form within-region with good network latency. The cross-region escape hatch kicks in for tail players (top 0.1% by rating in low-population regions). Implementation: each region's queue periodically publishes a 'wait-time pressure' metric per rating band; a cross-region matcher monitors these and proactively pairs players from two regions when one has surplus and the other has deficit. The opening latency hit is well-defined (each player gets a server with ~+50ms RTT vs. their preferred region) and accepted as the alternative to a 5-minute wait.

Third tradeoff: party imbalance. A party of (1800, 1800) vs. (1500, 2100) has the same average rating but very different match dynamics — the 2100 will dominate while the 1500 gets carried. Production matchers compute a 'party spread penalty' and refuse to pair high-spread parties against low-spread parties. Mentioning this signals you understand that average-of-ratings is an insufficient summary statistic.

Fourth: queue manipulation and smurfing. Players intentionally lose to drop rating and stomp lower-rated opponents. Detection: track win-rate vs. expected win-rate over a rolling window; a player who loses 20 games then wins 20 with a 90% rate is suspicious. Production response: rating decay on suspected accounts, shadow-queue them with other smurfs. Mention this as an operational concern; interviewers reward the candidate who raises it.

Common mistakes

  • Treating the queue as a FIFO — matchmaking is a multi-dimensional search problem, not a linear one
  • Forgetting party composition — a 5-friend party with a 600-rating spread is unfair against a balanced 5 solo players
  • Skipping the widening-window dynamic — a static skill window stalls queues at the rating tails
  • Trusting the client for match results — rating updates must come from the game server, not the player
  • Designing one global queue without region partitioning — network latency would tank match quality

Likely follow-up questions

  • How would you design a 'ranked' season system where ratings reset every 3 months and rewards are tier-gated?
  • What changes if you have to match across multiple platforms (PC + console + mobile) with different input latencies?
  • How would you support a 'role queue' where each lobby needs specific roles filled (1 tank, 2 healers, 2 damage)?
  • How would you detect and shadow-ban smurfs without breaking legitimate alt-accounts?
  • How would you handle a scenario where 20% of players in a region simultaneously disconnect due to an outage?

Practice Design a Real-Time Gaming Matchmaker live with an AI interviewer

Free, no sign-up required. Get real-time feedback on your design.

Practice these live

Frequently asked questions

Why isn't pure Elo good enough for modern matchmakers?
Elo treats every player as a single deterministic point on a number line, but new players' ratings are highly uncertain — early matches with them are essentially random. A Bayesian model with explicit uncertainty produces dramatically better early-game matches and converges faster to true skill.
Do I need to deep-dive on the rating math?
Mention the model family (Elo or Bayesian-with-uncertainty) and the key inputs (rating-difference, outcome, optionally margin-of-victory). Drilling into the update equation is out of scope unless the interviewer pushes — focus on the system properties: deterministic, replayable, server-attested.
How is matchmaking different from a dating-app match (Design Tinder)?
Dating matches are bilateral and unconstrained on count (one user matches with one other). Game matchmaking is N-way (10 players form a single lobby) and has strict balance constraints (the two teams must be near-equal in summed skill). The queue search is fundamentally different — N-way constraint satisfaction, not pairwise scoring.
What is the most important concept for this question?
Queue dynamics with widening windows. The naive design uses a fixed skill window and stalls; the production design adapts the window to player density at each rating band. Demonstrating this tradeoff signals senior-level understanding.