Design a Real-Time Multiplayer Game Server — System Design Interview Guide
Design a Real-Time Multiplayer Game Server is a system-design interview question that asks you to build the server-side state authority for a fast-paced multiplayer game: 10-100 players in a match, 30-60 server ticks per second, sub-100ms input-to-response, and a guarantee that nobody can cheat by lying about their state. The hard part is reconciling network latency with responsiveness: players need to feel that their inputs register instantly, but the server must remain the single source of truth.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Reported in interviews at
- Microsoft
- Amazon
- Roblox
- Activision
- Epic Games
Sourced from Glassdoor, Levels.fyi, and Blind interview reports.
Functional requirements
- Authoritatively simulate the game state for a match (positions, hit detection, scoring)
- Receive player input commands and apply them to the simulation
- Broadcast state updates to all players in the match at a fixed tick rate
- Detect and reject invalid inputs (movement speed exceeded, impossible actions, anti-cheat triggers)
- Handle player joins, leaves, and reconnections within a match
- Persist match results (kills, scores, duration) at match end
Non-functional requirements
- Scale: 1M concurrent matches, 10-100 players per match (10M-100M concurrent players; about 50M at an average of 50 per match)
- Tick rate: 30-60 Hz simulation (one simulation step and state broadcast every ~17-33ms)
- Input-to-response latency: <100ms p95 perceived by the player (including network)
- Server-attested authority: client cannot alter authoritative state; the server is the only source of truth
- Bandwidth per player: <50 KB/sec downstream (delta-encoded state), <5 KB/sec upstream (inputs)
- Match-server availability during a match: 99.99% — a server crash should be recoverable within seconds
Capacity estimation
Match server fleet: 1M concurrent matches. Each match runs on a single game-server process (or VM/container). At an average of 1 vCPU + 1 GB RAM per match for typical action games, that's 1M vCPUs and 1 PB of RAM across the fleet. Realistic per-host density: a single physical host with 64 vCPUs runs ~40 matches (accounting for headroom and spikes). Fleet size: 25K hosts, geographically distributed across 6-10 regions for latency.
Network bandwidth per match: 30 Hz tick × 10 players × 5 KB state delta = 1.5 MB/sec downstream per match, plus 30 Hz × 10 players × 200 bytes input = 60 KB/sec upstream. Aggregate across 1M matches: 1.5 TB/sec downstream from the entire fleet. Distributed across 25K hosts, each host pushes ~60 MB/sec — manageable on a 10 Gbps NIC.
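The fleet and bandwidth arithmetic above can be checked with a few lines of Python. This is only a sanity check of the stated figures, assuming decimal units (1 KB = 1,000 bytes); the variable names are illustrative.

```python
# Back-of-envelope check of the fleet and bandwidth figures (decimal units).
MATCHES = 1_000_000
MATCHES_PER_HOST = 40
TICK_HZ = 30
PLAYERS = 10
DELTA_BYTES = 5_000   # state delta per player per tick (upper end of the range)
INPUT_BYTES = 200     # input payload per player per tick

hosts = MATCHES // MATCHES_PER_HOST                   # 25,000 hosts
down_per_match = TICK_HZ * PLAYERS * DELTA_BYTES      # 1.5 MB/sec
up_per_match = TICK_HZ * PLAYERS * INPUT_BYTES        # 60 KB/sec
fleet_down = down_per_match * MATCHES                 # 1.5 TB/sec
down_per_host = fleet_down // hosts                   # 60 MB/sec
down_per_host_gbps = down_per_host * 8 / 1e9          # about 0.48 Gbps

print(hosts, down_per_match, up_per_match, down_per_host, down_per_host_gbps)
```

At roughly 0.48 Gbps per host, the downstream load uses about 5% of a 10 Gbps NIC, which is where the "manageable" conclusion comes from.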
State delta: a full state snapshot of a 10-player match is ~50 KB; delta-encoded against the prior tick, the typical payload drops to 1-5 KB. At 30 Hz, that is the difference between ~1.5 MB/sec per player for full snapshots (impossible at scale) and 30-150 KB/sec per player for deltas (workable). Staying under the 50 KB/sec downstream budget requires deltas that average below ~1.7 KB per tick at 30 Hz.
Match duration: a typical action-game match lasts 5-30 minutes. Match-start churn = 1M / avg-15-min = ~67K new matches/min, or ~1.1K new matches/sec. Match-server orchestration must allocate, warm, and tear down servers at this rate.
State persistence: only the match result (winner, scores, kills, duration, ~5 KB) is persisted to long-term storage. The full simulation state lives entirely in memory and is discarded at match end (or written to a replay file in object storage for spectator/review use, ~50-100 MB compressed for a 20-minute match).
High-level design
Server-authoritative architecture. The match server holds the canonical world state. Clients send only inputs (move forward, fire, jump) — never positions. The server simulates the physics and game logic at a fixed tick rate and broadcasts the resulting state to all clients.
Four layers: match orchestrator, match server, client networking, anti-cheat.
Match orchestrator allocates a match server when a lobby is formed. It picks a host with capacity in the right region, reserves a slot, and returns the connection details (host, port, session token) to the lobby's players. After the match ends, the orchestrator returns the slot to the pool. The orchestrator runs a pool-management algorithm — pre-warm idle servers for fast match starts, scale the fleet up before peak hours, scale down after.
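A minimal sketch of the allocate-and-release cycle described above. Everything here is an assumption for illustration: the `Host`, `Allocation`, and `Orchestrator` names, the least-loaded placement rule, and the sequential port assignment are hypothetical, and pre-warming and autoscaling are left out.

```python
import secrets
from dataclasses import dataclass


@dataclass
class Host:
    address: str
    region: str
    free_slots: int
    next_port: int = 7000   # hypothetical: ports handed out sequentially


@dataclass
class Allocation:
    host: str
    port: int
    session_token: str


class Orchestrator:
    def __init__(self, hosts):
        self.hosts = list(hosts)

    def allocate(self, region):
        """Reserve one match slot in the region, or return None if it is full."""
        candidates = [h for h in self.hosts
                      if h.region == region and h.free_slots > 0]
        if not candidates:
            return None
        host = max(candidates, key=lambda h: h.free_slots)  # least-loaded host
        host.free_slots -= 1
        port = host.next_port
        host.next_port += 1
        return Allocation(host.address, port, secrets.token_hex(16))

    def release(self, allocation):
        """Return the slot to the pool after the match ends."""
        for host in self.hosts:
            if host.address == allocation.host:
                host.free_slots += 1
                return
```

A lobby service would call `allocate(region)` once the lobby is formed, pass the returned host, port, and token to the players, and call `release` when the match result has been persisted.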
Match server runs the simulation loop. Every tick (16-33ms): apply queued inputs from the last interval, advance the simulation (physics, AI, game logic), compute the state delta vs. the prior tick, broadcast deltas to all connected clients. The simulation is deterministic given the same input sequence — this lets you record inputs to a replay file and re-simulate later.
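The determinism property can be illustrated with a toy simulation. This sketch assumes a hypothetical world in which each player has an integer (x, y) position and each input is an integer move; the point is only that a pure step function with a fixed iteration order and no floating-point or random state reproduces the same result from a recorded input log.

```python
def step(state, inputs):
    """Advance one tick. Pure function of (state, inputs), integer math only."""
    new_state = dict(state)
    for player_id in sorted(inputs):   # fixed order keeps the result reproducible
        dx, dy = inputs[player_id]
        x, y = new_state.get(player_id, (0, 0))
        new_state[player_id] = (x + dx, y + dy)
    return new_state


def simulate(initial_state, input_log):
    """Replay a recorded list of per-tick inputs from an initial state."""
    state = initial_state
    for tick_inputs in input_log:
        state = step(state, tick_inputs)
    return state


# The log recorded during the live match doubles as the replay file.
input_log = [{"a": (1, 0), "b": (0, 1)}, {"a": (1, 0)}, {"b": (0, -2)}]
live_result = simulate({}, input_log)
replay_result = simulate({}, input_log)
print(live_result == replay_result)
```

In a real engine, determinism is harder to keep than this suggests: floating-point differences across platforms, unordered containers, and wall-clock reads all have to be controlled.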
Client networking sends inputs at the same tick rate (or faster) and receives state updates. To hide network latency, the client runs client-side prediction: when the player presses 'move forward', the client immediately moves the player locally without waiting for server confirmation. When the server's authoritative state arrives, the client reconciles — if the local prediction matches the server, no visible change; if not, the client snaps to the server state (or interpolates smoothly).
Lag compensation: for actions like 'fire weapon', the server has to reconcile the fact that the firer sees the target at the position it was 50-100ms ago. Production tactic: the server rewinds the target's position to where it was at the firer's timestamp (using a circular buffer of recent positions), checks the hit at that historical position, and applies damage if it lands. This makes the experience feel responsive to the firer while preserving server authority on the outcome.
Anti-cheat is multi-layered: server-side validation (every input is bounds-checked — no teleporting, no firing more shots than the weapon's fire-rate allows), statistical anomaly detection (player's accuracy is 99% across 10 matches — flag for review), and a separate client-side anti-cheat agent that scans the game process for known cheat-injection patterns. The architecture choice is to never trust the client for any state — only for inputs, which the server then validates.
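The server-side validation layer might look like the sketch below. It assumes the client sends a movement direction rather than a position, so the server alone decides how far the player moves; `MAX_SPEED`, `MIN_FIRE_INTERVAL`, and all type and function names are hypothetical values chosen for illustration.

```python
import math
from dataclasses import dataclass

MAX_SPEED = 7.0           # world units per second (hypothetical tuning value)
MIN_FIRE_INTERVAL = 0.1   # seconds between shots (hypothetical weapon stat)
TICK_DT = 1 / 60          # seconds simulated per tick


@dataclass
class Player:
    x: float = 0.0
    y: float = 0.0
    last_fire_time: float = -math.inf


@dataclass
class InputCommand:
    move_x: float   # desired direction, expected to lie within the unit circle
    move_y: float
    fire: bool = False


def apply_validated_input(player, cmd, now):
    """Apply one input; return (fired, violations) for the anti-cheat log."""
    violations = []
    length = math.hypot(cmd.move_x, cmd.move_y)
    if not math.isfinite(length):
        violations.append("non_finite_move")
        move_x = move_y = 0.0
    elif length > 1.0:
        # Clamp to unit length so no input can exceed the speed limit.
        violations.append("move_vector_too_long")
        move_x, move_y = cmd.move_x / length, cmd.move_y / length
    else:
        move_x, move_y = cmd.move_x, cmd.move_y
    player.x += move_x * MAX_SPEED * TICK_DT
    player.y += move_y * MAX_SPEED * TICK_DT

    fired = False
    if cmd.fire:
        if now - player.last_fire_time >= MIN_FIRE_INTERVAL:
            player.last_fire_time = now
            fired = True
        else:
            violations.append("fire_rate_exceeded")
    return fired, violations
```

Clamping rather than dropping the oversized move keeps honest players with noisy input devices playable, while the recorded violations can feed the statistical anomaly detection.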
Deep dive — the hard problem
Three core deep dives: server-authoritative state with client prediction, state delta encoding, and the tick-rate vs. latency tradeoff. Two further topics follow: lag compensation and reconnection.
Server-authoritative tick rate. The simulation loop runs at a fixed rate (commonly 30 or 60 Hz; competitive shooters push to 128 Hz). Higher tick rate = more responsive feel but more CPU per match and more bandwidth. Each tick the server: (1) drains the input queue from each connected client (inputs are timestamped by client), (2) applies inputs in tick order (with anti-cheat validation per input), (3) advances physics by one tick, (4) computes the new world state, (5) compares to prior tick and produces deltas, (6) sends the delta to each client. Total tick budget at 60 Hz: 16ms. Production servers commonly use 10-12ms for simulation and leave 4-6ms for networking and overhead.
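The fixed-rate pacing of that loop can be sketched as follows. This is a simplified illustration, not a production scheduler: `run_loop` and its parameters are hypothetical, the per-tick work (steps 1-6) is passed in as `tick_fn`, and the clock and sleep functions are injectable so the pacing logic can be tested without real time passing.

```python
import time


def run_loop(tick_fn, tick_hz, num_ticks, clock=time.monotonic, sleep=time.sleep):
    """Run tick_fn at a fixed rate and return how many ticks blew their budget."""
    dt = 1.0 / tick_hz
    start = clock()
    overruns = 0
    for tick in range(num_ticks):
        tick_fn(tick)   # drain inputs, validate, simulate, diff, send
        # Deadlines are computed from the start time, so sleep jitter
        # does not accumulate into long-term drift.
        deadline = start + (tick + 1) * dt
        remaining = deadline - clock()
        if remaining > 0:
            sleep(remaining)
        else:
            overruns += 1   # the tick took longer than its budget
    return overruns
```

Tracking overruns per match is one way to notice that a host is overcommitted before players start feeling it.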
Client-side prediction is the central mechanic that makes the game feel responsive. The client runs a local copy of the simulation logic (only the parts that affect the local player). When the player presses 'jump', the client jumps the player locally immediately. The server's authoritative state arrives ~100ms later confirming the jump; the client checks whether its prediction matched. If yes (the common case), there is no visible change. If no (the server saw the jump differently, perhaps because an obstacle was in the way), the client reconciles. Reconciliation is the hardest part: you can snap (jarring), interpolate (laggy), or roll back and re-simulate (the gold-standard approach for competitive games). Roll-back means the client keeps the last 5-10 ticks of inputs; when a correction arrives, it resets to the corrected tick's state and re-simulates forward by replaying the inputs the server has not yet acknowledged.
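A minimal sketch of prediction with roll-back reconciliation, assuming a hypothetical one-dimensional world where each input is an integer move and the server echoes back the sequence number of the last input it processed. The class and method names are illustrative only.

```python
from collections import deque


def apply_input(pos, move):
    """Shared movement rule; the client and server must run the same logic."""
    return pos + move


class PredictingClient:
    def __init__(self):
        self.pos = 0
        self.pending = deque()   # (seq, move) inputs the server has not acked
        self.next_seq = 0

    def press(self, move):
        """Record the input, predict its effect locally, return its sequence."""
        seq = self.next_seq
        self.next_seq += 1
        self.pending.append((seq, move))
        self.pos = apply_input(self.pos, move)   # move immediately, no waiting
        return seq                               # would be sent with the input

    def on_server_state(self, last_acked_seq, server_pos):
        """Roll back to the authoritative state and replay unacked inputs."""
        while self.pending and self.pending[0][0] <= last_acked_seq:
            self.pending.popleft()
        pos = server_pos
        for _, move in self.pending:
            pos = apply_input(pos, move)
        self.pos = pos
```

When the prediction was right, replaying the pending inputs on top of the server state lands on the position the client already shows, so nothing visibly changes. When the server disagrees, the correction is absorbed into the replay instead of snapping the player back to a stale position.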
State delta encoding. Sending the full world state every tick is bandwidth-impossible: a 10-player match at 30 Hz with a 50 KB state would be 1.5 MB/sec per player, or 15 MB/sec per match. The standard pattern: compute the delta against the prior acknowledged tick, encode only the fields that changed, send. Each entity (player, projectile, pickup) has versioned fields; the encoder skips unchanged fields. Typical compression: 50 KB full state → 1-5 KB delta. Two complications.
Complication 1: packet loss. If the delta for tick 100 is lost and the delta for tick 101 is encoded against tick 100, the client can't apply 101 because it never reconstructed 100. Production handles this with one of two strategies: (a) reliable ordered delivery on a custom UDP protocol with retransmits; (b) send deltas against the latest acknowledged tick per client, where the server tracks per-client ACKs and encodes each client's payload against the tick that client last confirmed.
Complication 2: state divergence across clients. Each client sees a slightly different version of the game world (because each receives slightly different deltas at slightly different latencies). The server's authoritative state reconciles this — every client periodically receives a full snapshot to correct any accumulated drift.
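Strategy (b) and the full-snapshot fallback can be sketched as below. This is an illustrative model, not a wire format: state is assumed to be a flat dict of field name to value, packets are plain dicts, and `DeltaEncoder` and `apply_delta` are hypothetical names.

```python
class DeltaEncoder:
    def __init__(self, history_ticks=64):
        self.history = {}    # tick -> full state snapshot
        self.history_ticks = history_ticks
        self.acked = {}      # client_id -> last tick that client confirmed

    def record(self, tick, state):
        """Store this tick's snapshot; assumes ticks arrive consecutively."""
        self.history[tick] = dict(state)
        self.history.pop(tick - self.history_ticks, None)

    def ack(self, client_id, tick):
        if tick > self.acked.get(client_id, -1):
            self.acked[client_id] = tick

    def encode_for(self, client_id, tick):
        """Encode `tick` against the client's last acked tick, else send full."""
        state = self.history[tick]
        base_tick = self.acked.get(client_id)
        base = self.history.get(base_tick) if base_tick is not None else None
        if base is None:   # new client, or its baseline has aged out
            return {"tick": tick, "base": None, "full": dict(state)}
        changed = {k: v for k, v in state.items()
                   if k not in base or base[k] != v}
        removed = [k for k in base if k not in state]
        return {"tick": tick, "base": base_tick,
                "changed": changed, "removed": removed}


def apply_delta(base_state, packet):
    """Client side: rebuild the new state from the baseline and a packet."""
    if packet["base"] is None:
        return dict(packet["full"])
    state = dict(base_state)
    state.update(packet["changed"])
    for key in packet["removed"]:
        state.pop(key, None)
    return state
```

Because every packet names its baseline tick, a lost packet costs nothing beyond a slightly larger next delta: the server keeps encoding against the last tick the client confirmed until a newer ACK arrives.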
Tick rate vs. latency tradeoff. At 30 Hz, the server's tick is 33ms, so input-to-response includes a worst-case 33ms wait for the next tick boundary plus 50-100ms of network latency: roughly 85-135ms of perceived input lag. At 128 Hz, the tick-boundary wait drops to under 8ms, which brings the total to roughly 60-110ms and under 100ms on good connections, a difference players notice and value in competitive shooters. The cost is roughly 4x the CPU per match of a 30 Hz server and ~2x more bandwidth (smaller deltas but more of them).
Lag compensation deep dive. For a shooter, the firer presses 'fire' when their crosshair is on the enemy. The firer's input arrives at the server 50ms later. By then, the enemy has moved. Naive server: check hit at the enemy's current position — miss, and the firer is furious. Lag-compensated server: rewind the enemy to where they were at the firer's timestamp (firer's input includes their own client time), check hit at that historical position, apply damage if it lands. This requires storing a position buffer (last 500ms of positions for every entity) and per-tick rewind logic. Tradeoff: the victim's experience is 'I died when I was already behind cover from my view' — but the alternative (no lag compensation) breaks competitive play.
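The rewind step can be sketched with a bounded position history. This is a simplified model under stated assumptions: timestamps are integer milliseconds on the server clock, the hit test is a circle around an aim point rather than a ray cast, and `PositionHistory` and `is_hit` are hypothetical names.

```python
import math
from collections import deque


class PositionHistory:
    """Recent (time_ms, x, y) samples for one entity, oldest first."""

    def __init__(self, max_age_ms=500):
        self.max_age_ms = max_age_ms
        self.samples = deque()

    def record(self, t_ms, x, y):
        self.samples.append((t_ms, x, y))
        while t_ms - self.samples[0][0] > self.max_age_ms:
            self.samples.popleft()

    def position_at(self, t_ms):
        """Interpolated position at t_ms, clamped to the buffered range."""
        if not self.samples:
            return None
        if t_ms <= self.samples[0][0]:
            return self.samples[0][1:]
        if t_ms >= self.samples[-1][0]:
            return self.samples[-1][1:]
        samples = list(self.samples)
        for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
            if t0 <= t_ms <= t1:
                if t1 == t0:
                    return (x1, y1)
                a = (t_ms - t0) / (t1 - t0)
                return (x0 + a * (x1 - x0), y0 + a * (y1 - y0))
        return samples[-1][1:]


def is_hit(history, now_ms, shot_time_ms, aim, radius):
    """Check the shot against where the target was at the firer's timestamp."""
    # Bound the rewind so a client cannot claim an arbitrarily old timestamp.
    rewind_to = min(max(shot_time_ms, now_ms - history.max_age_ms), now_ms)
    pos = history.position_at(rewind_to)
    if pos is None:
        return False
    return math.hypot(pos[0] - aim[0], pos[1] - aim[1]) <= radius
```

For a target moving 10 units every 100ms, a shot stamped at 350ms and aimed at x = 35 lands after the rewind, while the same aim point tested against the target's current position at 500ms misses. Clamping the rewind window matters for anti-cheat, since the timestamp is client-supplied.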
Reconnection. A player loses network for 5 seconds mid-match. The server must continue the simulation (so other players don't pause), then resync the returning player. Pattern: the server keeps the disconnected player's slot reserved for 30 seconds; on reconnect, it pushes a full state snapshot (not just deltas), then resumes normal delta updates.
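A sketch of the slot-reservation pattern, assuming a hypothetical `MatchSession` class that tracks connection state only. The 30-second grace period comes from the text above; the message shape returned on reconnect is an illustrative assumption.

```python
class MatchSession:
    def __init__(self, grace_seconds=30.0):
        self.grace_seconds = grace_seconds
        self.connected = set()
        self.disconnected_at = {}   # player_id -> time the connection dropped

    def join(self, player_id):
        self.connected.add(player_id)

    def on_disconnect(self, player_id, now):
        """Keep the slot reserved; the simulation keeps ticking for everyone."""
        if player_id in self.connected:
            self.connected.remove(player_id)
            self.disconnected_at[player_id] = now

    def expire(self, now):
        """Free slots whose grace period has elapsed; return dropped players."""
        dropped = [p for p, t in self.disconnected_at.items()
                   if now - t > self.grace_seconds]
        for player_id in dropped:
            del self.disconnected_at[player_id]
        return dropped

    def on_reconnect(self, player_id, now, world_state):
        """Return a full snapshot to send, or None if the slot is gone."""
        dropped_at = self.disconnected_at.pop(player_id, None)
        if dropped_at is None or now - dropped_at > self.grace_seconds:
            return None
        self.connected.add(player_id)
        # The client's delta baseline is stale, so start from a full snapshot.
        return {"type": "full_snapshot", "state": dict(world_state)}
```

The match loop would call `expire` once per tick or once per second, and resume normal delta updates for a returning player after the client acknowledges the snapshot.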
Common mistakes
- Allowing the client to send positions (instead of just inputs): an instant cheating vector
- Not implementing client-side prediction: every input feels like it's underwater (100ms+ visible lag)
- Sending full state every tick instead of deltas: bandwidth costs 10-50x more than necessary
- Skipping lag compensation in a shooter: feels broken to competitive players
- Treating the match server as stateless: it holds 100% of the in-flight state, so each match must stay pinned to one dedicated server process for its duration
Likely follow-up questions
- How would you support spectators who join mid-match and want to watch live?
- What changes if you have a battle royale (100 players in one large world) instead of a 5v5 match?
- How would you implement a 'rewind' feature where a player can review the last 30 seconds of their gameplay?
- How would you handle a match where one player has a 300ms ping due to a bad ISP?
- How would you migrate a match in-progress from one server to another (e.g., during a planned maintenance)?
Frequently asked questions
- Why must the server be authoritative? Can't the client just send positions?
- If the client sends positions, a cheating client sends fake positions. The entire competitive integrity of the game collapses. Server-authoritative state with client-only-inputs is the foundational anti-cheat property of every modern competitive game.
- Is 60 Hz tick rate enough?
- For casual games, yes. Competitive shooters push to 128 Hz to shave the tick-boundary delay and keep perceived input lag under 100ms. The tradeoff is CPU per match: 128 Hz costs roughly 2x the compute of 60 Hz.
- Do I need to discuss the netcode protocol (UDP vs. TCP)?
- One sentence: 'UDP with custom reliability and ordering layer per channel — TCP's head-of-line blocking would tank latency under packet loss.' The deep dive is the prediction-and-reconciliation logic, not the wire protocol.
- What is the senior signal for this question?
- Three: (1) you articulate the server-authoritative-with-client-prediction split clearly; (2) you discuss state delta encoding and per-client ACK tracking; (3) you mention lag compensation and its experiential tradeoff (the victim feels they died behind cover). Missing any of these is a hire-reduce signal.