Design Amazon Prime Video — System Design Interview Guide

Design Amazon Prime Video is a system-design interview that asks you to build a streaming platform with both video-on-demand and live event streaming: hundreds of millions of subscribers watch movies and TV on demand, while live sports (Thursday Night Football) reach 15M+ simultaneous viewers. The hard part is the live-streaming CDN architecture at peak event scale.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Amazon
  • Netflix
  • Disney
  • Google (YouTube)
  • Meta

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Browse the catalog (movies, shows, sports) with personalized recommendations
  • Stream video-on-demand at multiple resolutions (480p, 720p, 1080p, 4K) with adaptive bitrate
  • Stream live events (sports, premieres) to large simultaneous audiences
  • Resume playback across devices (continue where you left off)
  • Download content for offline viewing on mobile
  • Multi-profile per account; parental controls

Non-functional requirements

  • Scale: ~200M Prime subscribers, ~150M monthly streamers, peak live concurrency of ~15M for a top NFL game
  • Video startup time: <2 sec p95 from play tap to first frame visible
  • Streaming reliability: <0.1% rebuffer ratio across the session
  • Live latency: <30 sec from real-world event to viewer screen (industry-standard) or <5 sec for low-latency live
  • Availability: 99.99%; outages during a live event are unacceptable

Capacity estimation

Public scale anchors: Prime Video has ~200M Prime subscribers globally. Peak live concurrency: this guide assumes Thursday Night Football peaks at ~15M simultaneous viewers, an estimate anchored on publicly reported 2023-2024 Prime Video TNF audience averages. Major event premieres can hit similar peaks.

Bandwidth math at peak: 15M concurrent live viewers × 5 Mbps (1080p average bitrate) = 75 Tbps egress at peak. No single data center or origin cluster can serve that much traffic, so the CDN must distribute the load across thousands of edge nodes globally.

VOD bandwidth at peak (prime time global): ~50M concurrent VOD viewers × 5 Mbps = 250 Tbps. VOD demand is spread across titles and time (viewers don't all request the same chunk at the same moment), so there is no synchronized spike on a single object, and popular titles can be served almost entirely from edge caches.
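
The live and VOD egress figures above are each a single multiplication. A minimal Python sketch of that arithmetic follows; the function name `egress_tbps` is illustrative, and the 5 Mbps average bitrate is the assumption stated in the text.

```python
def egress_tbps(concurrent_viewers: int, avg_bitrate_mbps: float) -> float:
    """Aggregate egress in Tbps for viewers streaming at an average bitrate."""
    total_mbps = concurrent_viewers * avg_bitrate_mbps
    return total_mbps / 1_000_000  # 1 Tbps = 1,000,000 Mbps


live_peak = egress_tbps(15_000_000, 5)  # peak live event
vod_peak = egress_tbps(50_000_000, 5)   # prime-time VOD
print(f"live: {live_peak:.0f} Tbps, VOD: {vod_peak:.0f} Tbps")
```

In an interview, stating the inputs (viewers, average bitrate) matters more than the exact result, since the interviewer can then challenge either assumption.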

Storage: VOD catalog is ~50K titles × multiple resolutions × multiple language tracks. A 2-hour movie at 4K + 1080p + 720p + 480p with 10 language audio tracks is ~30-50 GB. Catalog total: ~50K × 40 GB average = 2 PB. With audit copies, backup, and multi-region replication: ~10 PB.
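
The storage estimate can be checked the same way. In this sketch the helper name `catalog_storage_pb` is hypothetical, and the copy factor of 5 is an assumption inferred from the ratio between the ~2 PB and ~10 PB figures above, not a documented replication policy.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 1,000,000 GB


def catalog_storage_pb(titles: int, avg_gb_per_title: float, copies: int = 1) -> float:
    """Total catalog size in PB for a given number of stored copies."""
    return titles * avg_gb_per_title * copies / GB_PER_PB


primary = catalog_storage_pb(50_000, 40)             # one copy of the full ladder
replicated = catalog_storage_pb(50_000, 40, copies=5)  # assumed 5x for backup and replication
print(f"primary: {primary:.0f} PB, replicated: {replicated:.0f} PB")
```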

Live event ingest: a 3-hour football game produces ~30 GB of source mezzanine video that's transcoded into the streaming ladder (multiple bitrates) and chunked for delivery. Ingest is small; egress is the dominant cost.

Metadata: catalog metadata, user playback state, watch history. Watch history is the largest by row count — billions of (user_id, title_id, position) records. Catalog metadata is small.

High-level design

Five major layers: catalog and discovery, video processing pipeline, content delivery network, playback state, and live ingest.

Catalog and discovery: a sharded relational store holds title metadata (title, cast, description, episode list, available regions). A recommendation service ranks titles per user using watch history and content features; results are cached per user and refreshed nightly. Personalized homepage rows are precomputed offline and served from an in-memory cache.

Video processing pipeline: when content is ingested, it runs through transcoding to produce a 'bitrate ladder': the same content at 480p, 720p, 1080p, and 4K, with multiple bitrates per resolution. Each variant is chunked into ~2-6 second segments, typically a few MB each, and packaged for HLS or DASH delivery. The processing pipeline writes all chunks to object storage and emits a manifest file listing every chunk.
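
To make the ladder and manifest concrete, here is a small sketch that renders an HLS multivariant (master) playlist pointing at one media playlist per rendition. The `Rendition` type, the bitrates, and the `<name>/playlist.m3u8` paths are all hypothetical; only the `#EXTM3U` and `#EXT-X-STREAM-INF` tags with `BANDWIDTH` and `RESOLUTION` attributes come from the HLS specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    width: int
    height: int
    bandwidth_bps: int  # peak bitrate advertised to the player


# Hypothetical ladder; the bitrates are illustrative, not real production values.
LADDER = [
    Rendition("480p", 854, 480, 1_500_000),
    Rendition("720p", 1280, 720, 3_000_000),
    Rendition("1080p", 1920, 1080, 5_000_000),
    Rendition("2160p", 3840, 2160, 16_000_000),
]


def master_playlist(ladder: list[Rendition]) -> str:
    """Render a playlist that lists one variant stream per rendition."""
    lines = ["#EXTM3U"]
    for r in ladder:
        lines.append(
            f"#EXT-X-STREAM-INF:BANDWIDTH={r.bandwidth_bps},"
            f"RESOLUTION={r.width}x{r.height}"
        )
        lines.append(f"{r.name}/playlist.m3u8")  # assumed path layout
    return "\n".join(lines) + "\n"


playlist = master_playlist(LADDER)
print(playlist)
```

The player reads this file once, then fetches the per-rendition playlist that lists the actual chunks for whichever bitrate it selects.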

Content delivery network (CDN): chunks are served from a global edge network. For VOD, popular titles are pre-pushed to edge caches in every region; long-tail titles are pulled on demand from origin to the edge. The CDN's cache hit rate is the dominant cost lever — 95%+ hit rate is the operational target.
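
Why the hit rate dominates cost is easiest to see with numbers. The sketch below assumes the hit rate is measured in bytes and reuses the ~250 Tbps VOD peak from the capacity section; `upstream_fill_tbps` is an illustrative name.

```python
def upstream_fill_tbps(edge_egress_tbps: float, byte_hit_rate: float) -> float:
    """Traffic the edge must pull from upstream tiers to cover cache misses."""
    if not 0.0 <= byte_hit_rate <= 1.0:
        raise ValueError("byte_hit_rate must be between 0 and 1")
    return edge_egress_tbps * (1.0 - byte_hit_rate)


for rate in (0.90, 0.95, 0.99):
    fill = upstream_fill_tbps(250, rate)
    print(f"hit rate {rate:.0%}: {fill:.1f} Tbps pulled from upstream")
```

Under these assumptions, moving from a 90% to a 95% hit rate halves the traffic that the shield and origin tiers must carry.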

Playback state: a user's current playback position, last-watched title, watch history, and resume-points live in a sharded key-value store. Each play, pause, and skip emits a state-update event; positions are persisted every few seconds. The store is fronted by an in-memory tier because state-read happens at every player startup and at every resume.
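
A minimal sketch of the resume-point logic follows, using an in-memory dictionary as a stand-in for the sharded key-value store. The class and field names are hypothetical, and the last-write-wins rule keyed on a client timestamp is an assumption chosen for illustration; the source does not specify a conflict policy.

```python
from dataclasses import dataclass


@dataclass
class ResumePoint:
    position_sec: float
    updated_at_ms: int  # timestamp of the client event that produced this position


class PlaybackStateStore:
    """In-memory stand-in for the sharded key-value store."""

    def __init__(self) -> None:
        self._state: dict[tuple[str, str], ResumePoint] = {}

    def record(self, profile_id: str, title_id: str,
               position_sec: float, updated_at_ms: int) -> bool:
        """Store a position unless a newer (or equally new) update already exists."""
        key = (profile_id, title_id)
        current = self._state.get(key)
        if current is not None and current.updated_at_ms >= updated_at_ms:
            return False  # stale or duplicate event, e.g. a delayed update from another device
        self._state[key] = ResumePoint(position_sec, updated_at_ms)
        return True

    def resume_position(self, profile_id: str, title_id: str) -> float:
        """Position to start from; 0.0 if the title was never played."""
        point = self._state.get((profile_id, title_id))
        return point.position_sec if point is not None else 0.0


store = PlaybackStateStore()
store.record("profile-1", "title-9", 1200.0, updated_at_ms=1_000)
store.record("profile-1", "title-9", 900.0, updated_at_ms=500)  # late arrival, ignored
print(store.resume_position("profile-1", "title-9"))
```

Keying on (profile, title) rather than account keeps multi-profile households from overwriting each other's positions.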

Live ingest: the source video feed (from a broadcast truck, satellite uplink, or remote contribution feed) enters a regional ingest pod that transcodes in real time to the bitrate ladder. Chunks flow into the CDN edge with a small target delay (typically 10-30 seconds for legacy HLS, <5 sec for low-latency variants). The live path is much more time-sensitive than the VOD path — origin processing must keep pace with the live stream in real time, with no margin to fall behind.

Deep dive — the hard problem

Two deep dives: the live-streaming CDN architecture at peak event scale, and adaptive bitrate streaming.

Live CDN at peak: a 15M-concurrent live event requires fundamentally different distribution than VOD. In VOD, ten viewers watching the same movie at different times hit the cache independently — the cache serves all of them from a single fetch. In a live event, ten million viewers are watching the same chunk at the same time. Without careful architecture this would melt origin servers.

The standard answer is a hierarchical CDN with strict request-coalescing. Edge servers serve viewers within their region; if a chunk is not in the edge cache, the edge requests it from a regional shield layer; if it is not in the shield, the shield requests it from origin. Request-coalescing at each layer ensures that no matter how many viewers ask for a chunk concurrently, only one upstream request is made per chunk per cache. With three layers (edge → shield → origin), the origin sees roughly one request per chunk variant from each shield, regardless of viewer concurrency.
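
Request-coalescing is sometimes called "single-flight": the first miss for a chunk starts one upstream fetch, and every concurrent request for the same chunk waits on that fetch instead of starting its own. The sketch below models it with `asyncio`. The `CoalescingCache` class, the `simulate` helper, and the chunk ID are hypothetical, and the sketch deliberately omits eviction, TTLs, and error handling.

```python
import asyncio
from typing import Awaitable, Callable

Fetch = Callable[[str], Awaitable[bytes]]


class CoalescingCache:
    """One cache tier: serves hits locally and coalesces concurrent misses."""

    def __init__(self, upstream: Fetch) -> None:
        self._upstream = upstream
        self._cache: dict[str, bytes] = {}
        self._in_flight: dict[str, asyncio.Task] = {}

    async def get(self, chunk_id: str) -> bytes:
        if chunk_id in self._cache:
            return self._cache[chunk_id]
        task = self._in_flight.get(chunk_id)
        if task is None:
            # First miss for this chunk: start the single upstream fetch.
            task = asyncio.create_task(self._fill(chunk_id))
            self._in_flight[chunk_id] = task
        # Every concurrent caller awaits the same task.
        return await task

    async def _fill(self, chunk_id: str) -> bytes:
        try:
            data = await self._upstream(chunk_id)
            self._cache[chunk_id] = data
            return data
        finally:
            self._in_flight.pop(chunk_id, None)


async def simulate(viewers_per_edge: int, edges: int) -> int:
    """Return how many requests reach origin for one chunk."""
    origin_requests = 0

    async def origin(chunk_id: str) -> bytes:
        nonlocal origin_requests
        origin_requests += 1
        await asyncio.sleep(0.01)  # simulated origin latency
        return chunk_id.encode()

    shield = CoalescingCache(origin)
    edge_tiers = [CoalescingCache(shield.get) for _ in range(edges)]
    requests = [edge.get("seg-000123")
                for edge in edge_tiers for _ in range(viewers_per_edge)]
    await asyncio.gather(*requests)
    return origin_requests


print(asyncio.run(simulate(viewers_per_edge=1000, edges=5)))
```

With 5,000 simulated viewers across five edges behind one shield, the simulated origin is hit once. Each edge makes one request to the shield, and the shield collapses those into a single origin fetch.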

Geographic distribution: edge servers are placed in hundreds of POPs globally; for a US-focused live event, traffic is concentrated in US POPs with shield layers in major metros. For global events the architecture spans continents with regional shields.

Chunk freshness and live-edge: in live streaming the player must always be near the 'live edge' (the most recent chunk produced by the encoder). Players hold a small buffer (typically 3-5 chunks) to absorb network jitter, which means playback trails the live edge by roughly that many chunk durations. The encoder writes chunks to origin; origin propagates to shield; shield propagates to edge; edge serves to players. The end-to-end pipeline latency determines how far behind the real world the live edge actually is. Production tunes this to ~10-15 seconds for legacy HLS, ~3-5 seconds for low-latency HLS or DASH.
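
A rough latency budget shows where those seconds go. The model below is a simplification (encode and packaging delay, plus CDN propagation, plus the player buffer), and every input value is a hypothetical chosen to land inside the ranges quoted above, not a measured figure.

```python
def live_latency_sec(chunk_sec: float, buffered_chunks: int,
                     encode_sec: float, propagation_sec: float) -> float:
    """Approximate event-to-screen delay for a chunked live stream."""
    player_buffer_sec = chunk_sec * buffered_chunks
    return encode_sec + propagation_sec + player_buffer_sec


# Assumed inputs: 4 s chunks for legacy HLS, 1 s partial chunks for low latency.
legacy = live_latency_sec(chunk_sec=4.0, buffered_chunks=3,
                          encode_sec=2.0, propagation_sec=1.0)
low_latency = live_latency_sec(chunk_sec=1.0, buffered_chunks=3,
                               encode_sec=1.0, propagation_sec=0.5)
print(f"legacy: {legacy} s, low-latency: {low_latency} s")
```

In this model the player buffer is the largest term, which is why shortening the chunk duration reduces latency more than speeding up the encoder does.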

Adaptive bitrate (ABR): the player monitors network throughput and switches between bitrates in the ladder dynamically. If bandwidth drops, the player requests the next chunk at a lower resolution; if bandwidth recovers, it upshifts. The ABR algorithm runs entirely on the client. Server-side architecture only ensures all bitrates are available for every chunk. This is why every chunk is encoded in 4-8 bitrate variants — the player must be able to switch instantly without waiting for a re-encode.
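
One common client-side heuristic is throughput-based selection: estimate bandwidth from recent chunk downloads and pick the highest rung that fits under a safety margin. The sketch below is one such heuristic, not the algorithm any particular player uses; the ladder values, the `pick_bitrate` name, and the 0.8 safety factor are assumptions.

```python
LADDER_KBPS = [1_500, 3_000, 5_000, 8_000, 16_000]  # hypothetical ladder, ascending


def pick_bitrate(throughput_samples_kbps: list[float],
                 ladder_kbps: list[int] = LADDER_KBPS,
                 safety: float = 0.8) -> int:
    """Choose the bitrate for the next chunk from recent throughput samples."""
    samples = [s for s in throughput_samples_kbps if s > 0]
    if not samples:
        return ladder_kbps[0]  # no measurements yet: start at the lowest rung
    # The harmonic mean is pulled toward slow samples, so one fast
    # download does not trigger an upshift on its own.
    harmonic_mean = len(samples) / sum(1.0 / s for s in samples)
    budget = harmonic_mean * safety
    affordable = [b for b in ladder_kbps if b <= budget]
    return affordable[-1] if affordable else ladder_kbps[0]


print(pick_bitrate([8_000, 8_000, 8_000]))   # steady connection
print(pick_bitrate([8_000, 1_000, 1_000]))   # bandwidth just dropped
```

Production players usually combine throughput with buffer occupancy, so that a full buffer tolerates a brief dip without downshifting.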

Server-side signals still matter at scale: edge nodes report regional bandwidth conditions to a service that biases the player's initial bitrate choice toward what's likely to work. A viewer in a region with degraded ISP performance might start at 720p instead of 1080p to avoid an early rebuffer.

A further tradeoff is the VOD catalog cache strategy. Production CDNs use tiered placement for VOD: hot tier (chunks pushed to edge for popular titles), warm tier (regional caches for moderately popular titles), cold tier (origin only, fetched on demand). The cache push schedule runs nightly based on the next day's predicted viewing: premieres are pre-positioned everywhere, and the deep catalog is pulled on first request.
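
The nightly placement decision can be sketched as a threshold rule over predicted demand. The thresholds and the `assign_tier` name below are hypothetical; a real system would tune them per region and per edge capacity.

```python
def assign_tier(predicted_daily_views: int,
                hot_threshold: int = 100_000,
                warm_threshold: int = 1_000) -> str:
    """Map a title's predicted next-day views to a cache tier."""
    if predicted_daily_views >= hot_threshold:
        return "hot"   # pre-push chunks to edge caches
    if predicted_daily_views >= warm_threshold:
        return "warm"  # keep in regional caches
    return "cold"      # origin only, pulled on first request


# Hypothetical forecast for three titles.
forecast = {"new-premiere": 2_000_000, "popular-series": 40_000, "deep-catalog": 12}
placement = {title: assign_tier(views) for title, views in forecast.items()}
print(placement)
```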

Another tradeoff is DRM and content protection. Streaming services apply DRM (Widevine, FairPlay, PlayReady) per platform. Decryption keys are issued per session by a license server; chunks are encrypted at rest. Mention that DRM exists but don't drill in unless asked, because the interview signal is in the CDN and ABR architecture.

Common mistakes

  • Treating live streaming the same as VOD — live's simultaneous concurrency pattern is fundamentally different and breaks naive cache architectures
  • Forgetting request-coalescing — without it, the first chunk of a viral live event triggers 15M concurrent origin requests
  • Designing a single global CDN tier instead of hierarchical edge → shield → origin — origin saturates immediately
  • Skipping the bitrate ladder and ABR discussion — modern video streaming is built around adaptive selection per chunk
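  • Storing video chunks in the same store as metadata: chunks are large immutable blobs suited to object storage, while metadata is small, frequently updated rows suited to a database, so a single store serves neither well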

Likely follow-up questions

  • How would your design handle a 50M-concurrent live event (Super Bowl scale)?
  • What changes if you have to support sub-1-second latency for live sports (interactive live commentary)?
  • How would you support DVR-style time-shifting on a live stream (pause and rewind a live event)?
  • How would you handle a regional ISP outage that affects 5% of your viewers mid-event?
  • How would you implement personalized ad insertion in a live stream so each viewer sees different ads at break points?

Frequently asked questions

How long is the Design Prime Video interview?
60 minutes typical. The interview expects coverage of both VOD and live streaming, with at least one deep-dive into the CDN architecture or ABR.
How does Design Prime Video differ from Design Netflix?
Netflix is VOD-dominant; Prime Video has both VOD and a major live-streaming surface (Thursday Night Football, premieres). The live event architecture is the key Prime Video-specific topic — hierarchical CDN with request-coalescing for simultaneous concurrency.
Do I need to know specific streaming protocols (HLS, DASH, LL-HLS)?
Naming HLS or DASH as the segmenting/manifest protocol is enough. Saying 'we'd use Low-Latency HLS for sub-5-second live' is bonus signal. Drawing the manifest format is overkill.
What is the single most important concept for Design Prime Video?
Hierarchical CDN with request-coalescing for live concurrency. The 15M-viewer-simultaneous-chunk-fetch is the architecture's hardest test; getting it right is the senior+ signal.