Design Netflix — System Design Interview Guide

Design Netflix is a system-design interview question that asks you to build a global video-on-demand service: 250M+ subscribers stream video at up to 4K, the catalog is searchable and personalized, and playback must start in under 2 seconds anywhere on Earth. The hard part is content delivery at petabyte scale.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Netflix
  • Amazon
  • Google
  • Meta
  • Disney

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Browse a catalog of movies and shows with metadata (title, cast, genre, thumbnails)
  • Search the catalog by keyword, genre, or actor
  • Stream a video at adaptive bitrates (240p–4K) with sub-2-second startup
  • Resume playback at the exact position on any device
  • Personalized homepage rows ('Because you watched X')

Non-functional requirements

  • Startup latency: <2 seconds from tap-to-play (p95)
  • Availability: 99.99%; degraded quality acceptable, full outage is not
  • Scale: 250M subscribers, ~150M concurrent streams at peak (evening prime time), ~750 Tbps egress at peak
  • Cost-efficiency: the streaming bytes are the dominant infrastructure cost — egress must be minimized

Capacity estimation

Publicly reported 2024 scale: ~280M subscribers and ~3 hours of average viewing per subscriber per day, or ~800M viewing hours/day. At a blended ~5 Mbps average bitrate (a mix of 1080p and 4K), daily egress is 800M hours × 3600 sec × 5 Mb / 8 = ~1.8 EB of video per day. Peak concurrent streams are ~150M; at 5 Mbps blended, peak egress is ~750 Tbps. That is far more than a handful of centralized origins could push over transit links at acceptable cost, which is why a custom CDN is mandatory.
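The egress arithmetic above can be reproduced with a small back-of-envelope script. This is a sketch: the inputs are the rough assumptions quoted in this section rather than measured figures, and the function names are hypothetical.

```python
SECONDS_PER_HOUR = 3600
BITS_PER_BYTE = 8


def daily_egress_bytes(viewing_hours_per_day: float, avg_bitrate_mbps: float) -> float:
    """Total bytes streamed per day at a blended average bitrate."""
    bits = viewing_hours_per_day * SECONDS_PER_HOUR * avg_bitrate_mbps * 1e6
    return bits / BITS_PER_BYTE


def peak_egress_tbps(concurrent_streams: float, avg_bitrate_mbps: float) -> float:
    """Aggregate peak throughput in terabits per second."""
    return concurrent_streams * avg_bitrate_mbps * 1e6 / 1e12


if __name__ == "__main__":
    # Inputs: ~800M viewing hours/day, ~150M peak streams, ~5 Mbps blended bitrate.
    print(f"daily egress: {daily_egress_bytes(800e6, 5) / 1e18:.2f} EB")  # 1.80 EB
    print(f"peak egress: {peak_egress_tbps(150e6, 5):.0f} Tbps")          # 750 Tbps
```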

Catalog metadata is tiny by comparison: ~20K titles × ~10 KB metadata = 200 MB, plus a search index of a few GB. Subscriber and playback-position data: 280M subscribers × ~1 KB profile, plus ~100 bytes per active title, comes to ~300 GB. Video files dominate storage: ~20K titles × ~50 GB average per master file = ~1 PB of masters. Encoding each title into ~30 variants across resolutions, codecs (H.264, H.265, AV1), and audio tracks pushes the encoded catalog to several PB. Encoded files are pre-positioned to edge caches; the origin holds the masters.
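The storage line items can be checked the same way. A minimal sketch using decimal units (1 KB = 1,000 bytes) and the per-title and per-subscriber sizes assumed above; the encoded-variant total is left out because a per-variant size is not given here, and `storage_estimate` is a hypothetical name.

```python
KB, MB, GB, PB = 1e3, 1e6, 1e9, 1e15  # decimal units


def storage_estimate(titles: int, subscribers: float) -> dict[str, float]:
    """Rough byte counts for the line items estimated above."""
    return {
        "catalog_metadata": titles * 10 * KB,         # ~10 KB of metadata per title
        "subscriber_profiles": subscribers * 1 * KB,  # ~1 KB profile per subscriber
        "video_masters": titles * 50 * GB,            # ~50 GB master file per title
    }


if __name__ == "__main__":
    est = storage_estimate(titles=20_000, subscribers=280e6)
    print(f"metadata: {est['catalog_metadata'] / MB:.0f} MB")     # 200 MB
    print(f"profiles: {est['subscriber_profiles'] / GB:.0f} GB")  # 280 GB
    print(f"masters:  {est['video_masters'] / PB:.1f} PB")        # 1.0 PB
```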

High-level design

Two clearly separated planes: the control plane (catalog, search, recommendations, account, billing; request/response APIs) and the data plane (video bytes, pre-positioned to edge caches near the viewer). The control plane is a classic stateless-app + sharded-store + cache stack: clients call an API gateway for browse and metadata requests, which are served from in-memory caches in front of sharded relational and document stores. Personalization queries hit a separate recommendation service that returns precomputed row lists keyed by user_id.
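A minimal sketch of the cache-aside read path in the control plane, assuming plain Python dicts stand in for both the in-memory cache and the sharded store; `ShardedCatalog` and its methods are hypothetical names. Shard selection uses a stable hash of the title ID (SHA-256 rather than Python's built-in `hash()`, which is randomized per process for strings) so every app server routes a key to the same shard.

```python
import hashlib
from typing import Optional


class ShardedCatalog:
    """Cache-aside reads over N store shards (dicts stand in for real stores)."""

    def __init__(self, num_shards: int) -> None:
        self.shards: list[dict[str, dict]] = [{} for _ in range(num_shards)]
        self.cache: dict[str, dict] = {}  # stand-in for the in-memory cache tier

    def _shard_for(self, title_id: str) -> dict[str, dict]:
        # Stable hash so every app server maps a key to the same shard.
        digest = hashlib.sha256(title_id.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:8], "big") % len(self.shards)]

    def put(self, title_id: str, metadata: dict) -> None:
        self._shard_for(title_id)[title_id] = metadata
        self.cache.pop(title_id, None)  # invalidate so the next read sees the write

    def get(self, title_id: str) -> Optional[dict]:
        if title_id in self.cache:  # cache hit
            return self.cache[title_id]
        record = self._shard_for(title_id).get(title_id)  # miss: read the owning shard
        if record is not None:
            self.cache[title_id] = record  # populate for subsequent readers
        return record


if __name__ == "__main__":
    catalog = ShardedCatalog(num_shards=4)
    catalog.put("title-42", {"title": "Example Show", "genre": "drama"})
    print(catalog.get("title-42"))
    print("title-42" in catalog.cache)  # True
```

A production stack would add TTLs, a distributed cache tier, and consistent hashing so that adding shards does not remap most keys.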

The data plane is the interesting part. Each title is pre-encoded offline into ~30 variants: multiple resolutions (240p, 360p, 480p, 720p, 1080p, 4K), multiple codecs (H.264 for legacy devices, H.265 and AV1 for modern ones with better compression), multiple audio tracks. Each variant is segmented into ~10-second chunks; manifests (DASH or HLS) describe the available chunks. Encoded chunks are then pre-positioned (pushed during off-peak hours) into a custom CDN: thousands of cache appliances colocated inside ISP networks. When a viewer presses play, the client fetches the manifest from the nearest cache, then begins fetching chunks; the player monitors network throughput and CPU and adapts the chunk bitrate request up or down — this is Adaptive Bitrate Streaming (ABR), the foundation of smooth playback.
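To make the variant/chunk/manifest relationship concrete, here is a sketch of a simplified in-memory manifest. It is not the DASH MPD or HLS playlist format; the `Variant` type, the `/{title}/{variant}/{index}.m4s` URL scheme, and the helper names are hypothetical, and the chunk length assumes the ~10 seconds described above.

```python
import math
from dataclasses import dataclass

CHUNK_SECONDS = 10  # ~10-second chunks, as described above


@dataclass(frozen=True)
class Variant:
    resolution: str  # e.g. "1080p"
    codec: str       # e.g. "av1"
    bitrate_kbps: int


def build_manifest(title_id: str, duration_s: float, variants: list[Variant]) -> dict:
    """Map each variant to its bitrate and ordered list of chunk URLs."""
    n_chunks = math.ceil(duration_s / CHUNK_SECONDS)  # last chunk may be partial
    manifest = {}
    for v in variants:
        key = f"{v.resolution}-{v.codec}"
        manifest[key] = {
            "bitrate_kbps": v.bitrate_kbps,
            "chunks": [f"/{title_id}/{key}/{i:05d}.m4s" for i in range(n_chunks)],
        }
    return manifest


def first_chunk(position_s: float) -> int:
    """Index of the chunk containing position_s (used when resuming playback)."""
    return int(position_s // CHUNK_SECONDS)


if __name__ == "__main__":
    ladder = [Variant("720p", "h264", 3000), Variant("1080p", "av1", 5000)]
    m = build_manifest("title-42", duration_s=95, variants=ladder)
    print(len(m["720p-h264"]["chunks"]))              # 10
    print(m["1080p-av1"]["chunks"][first_chunk(25)])  # /title-42/1080p-av1/00002.m4s
```

Because every variant shares the same chunk boundaries, the player can switch bitrate at any chunk index without re-fetching what it already has.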

Playback-position tracking is a small piece of telemetry: the client posts (user_id, title_id, position_sec, device_id) every ~30 seconds to a write-heavy in-memory store fronting a durable log. Resume-on-another-device reads the latest position keyed by (user_id, title_id), and those reads can typically tolerate a few seconds of lag.
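A sketch of the position store, assuming last-writer-wins on a client-supplied event timestamp; the text does not specify conflict handling, so that policy is an assumption. A list stands in for the durable log and a dict for the in-memory store, and `PlaybackPositions` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Position:
    position_sec: float
    device_id: str
    updated_at: float  # client event time (epoch seconds)


class PlaybackPositions:
    """Latest position per (user_id, title_id); last writer wins by event time."""

    def __init__(self) -> None:
        self.log: list[tuple] = []  # stand-in for the durable, append-only log
        self._latest: dict[tuple[str, str], Position] = {}  # stand-in for the in-memory store

    def record(self, user_id: str, title_id: str, position_sec: float,
               device_id: str, updated_at: float) -> None:
        # Every heartbeat is appended to the log first, for durability and replay.
        self.log.append((user_id, title_id, position_sec, device_id, updated_at))
        key = (user_id, title_id)
        current = self._latest.get(key)
        # Ignore late-arriving heartbeats older than the position already held.
        if current is None or updated_at >= current.updated_at:
            self._latest[key] = Position(position_sec, device_id, updated_at)

    def resume_position(self, user_id: str, title_id: str) -> float:
        pos = self._latest.get((user_id, title_id))
        return pos.position_sec if pos else 0.0


if __name__ == "__main__":
    store = PlaybackPositions()
    store.record("u1", "t1", 1200.0, "tv", updated_at=1_000)
    store.record("u1", "t1", 900.0, "phone", updated_at=990)  # delayed, older heartbeat
    print(store.resume_position("u1", "t1"))  # 1200.0
```

Keying on event time rather than arrival time means a delayed heartbeat from one device cannot overwrite a newer position written by another.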

Deep dive — the hard problem

The deep dive is the custom CDN. Why not just use a generic CDN? Do the math: at ~750 Tbps of peak global egress, commercial CDN fees would far exceed the cost of building your own, and a generic CDN would lack the in-ISP presence needed for sub-2-second startup. The custom-CDN approach: deploy compact cache appliances (custom hardware running a simple HTTP cache) directly inside ISPs (Comcast, Vodafone, etc.). The ISP gets free traffic offload (subscribers' streaming traffic stays inside its network), and the service gets last-mile proximity to viewers. Content is pre-positioned to these appliances overnight based on regional popularity predictions; only ~20% of titles need to live on every appliance, and long-tail content falls back to a regional origin.
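Routing a playback request to those appliances can be sketched as a lookup: prefer a healthy appliance inside the viewer's ISP that already holds the title, otherwise fall back to the regional origin. The `Appliance` type, `pick_source`, and the names in the example are hypothetical, and a real steering service would also weigh appliance load and network distance.

```python
from dataclasses import dataclass, field


@dataclass
class Appliance:
    name: str
    isp: str
    healthy: bool = True
    titles: set[str] = field(default_factory=set)  # titles currently cached


def pick_source(title_id: str, client_isp: str,
                appliances: list[Appliance], regional_origin: str) -> str:
    """Return where the client should fetch chunks for title_id."""
    for a in appliances:
        # Only healthy, in-ISP appliances that already hold the title qualify.
        if a.healthy and a.isp == client_isp and title_id in a.titles:
            return a.name
    # Long-tail title or cache miss: serve from the regional origin.
    return regional_origin


if __name__ == "__main__":
    fleet = [
        Appliance("cache-comcast-1", "comcast", titles={"popular-show"}),
        Appliance("cache-vodafone-1", "vodafone", titles={"popular-show", "niche-doc"}),
    ]
    print(pick_source("popular-show", "comcast", fleet, "origin-us-east"))  # cache-comcast-1
    print(pick_source("niche-doc", "comcast", fleet, "origin-us-east"))     # origin-us-east
```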

ABR is the second deep-dive area. The player makes per-chunk decisions: monitor recent throughput, predict next chunk download time, choose the highest bitrate variant whose download will complete before the playback buffer drains. Algorithms range from rate-based (pick the bitrate the throughput estimate supports) to buffer-based (pick the bitrate that keeps buffer above a target) to learned (model-based decisions). Talking through this tradeoff — and noting that low-buffer states must downshift aggressively to avoid stalls, while high-buffer states can take risks to upshift to higher quality — signals you understand the user-perception side of the problem.
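A sketch of a simple hybrid rate/buffer heuristic for the per-chunk decision described above. It is an illustration, not Netflix's actual ABR algorithm; the thresholds (`reserve_s`, `safety`, and the 1.3× upshift headroom) are assumed values chosen for readability. The throughput estimate is the harmonic mean of recent samples, scaled down by a safety factor.

```python
def choose_bitrate(
    ladder_kbps: list[int],
    recent_throughput_kbps: list[float],
    buffer_s: float,
    chunk_s: float = 10.0,   # chunk duration in seconds
    reserve_s: float = 10.0, # buffer that must remain after the download
    safety: float = 0.85,    # discount on the throughput estimate
) -> int:
    """Pick the bitrate for the next chunk (hypothetical hybrid heuristic)."""
    ladder = sorted(ladder_kbps)
    if not recent_throughput_kbps:
        return ladder[0]  # no measurements yet: start at the bottom rung
    # Harmonic mean damps the effect of a single unusually fast sample.
    harmonic = len(recent_throughput_kbps) / sum(1 / t for t in recent_throughput_kbps)
    estimate = harmonic * safety
    # Thin buffer: never request above the estimate. Healthy buffer: allow a
    # modest upshift past it, spending buffer to reach higher quality.
    headroom = 1.3 if buffer_s >= 2 * reserve_s else 1.0
    choice = ladder[0]
    for rate in ladder:
        fetch_s = rate * chunk_s / estimate  # seconds to download one chunk
        # The chunk must arrive while at least reserve_s of buffer remains.
        if rate <= estimate * headroom and fetch_s <= buffer_s - reserve_s:
            choice = rate  # both checks get stricter as rate rises, so this ends at the highest passing rung
    return choice


if __name__ == "__main__":
    ladder = [300, 1000, 3000, 6000, 16000]  # kbps, hypothetical bitrate ladder
    samples = [5000.0, 5000.0]               # recent throughput measurements, kbps
    print(choose_bitrate(ladder, samples, buffer_s=0))   # 300
    print(choose_bitrate(ladder, samples, buffer_s=15))  # 1000
    print(choose_bitrate(ladder, samples, buffer_s=40))  # 3000
```

With an empty buffer at startup every rung fails the reserve check, so the player starts at the lowest bitrate and climbs as the buffer fills.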

Third deep-dive area: pre-positioning prediction accuracy. Predict wrong (push a niche title to every appliance while a popular title is left to fall back to origin) and origin egress spikes 10×. The solution: per-region popularity models updated daily, weighted by trending signals, with capacity reservations for new releases. Mention this to signal that you treat cost as a first-class design constraint at this scale.
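Nightly fill planning for one appliance can be framed as a knapsack problem. A greedy sketch, assuming per-title size and a regional view forecast are already available: rank titles by predicted views per GB, let new releases draw from a reserved slice of the disk, and fill the remaining capacity with everything else. `Title` and `plan_fill` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Title:
    title_id: str
    size_gb: float          # encoded variants this appliance should hold
    predicted_views: float  # regional demand forecast for the next day
    new_release: bool = False


def plan_fill(titles: list[Title], capacity_gb: float,
              new_release_reserve_gb: float) -> list[str]:
    """Greedy knapsack heuristic: highest predicted views per GB first."""
    general_gb = capacity_gb - new_release_reserve_gb
    used_reserve = used_general = 0.0
    chosen: list[str] = []
    for t in sorted(titles, key=lambda t: t.predicted_views / t.size_gb, reverse=True):
        if t.new_release and used_reserve + t.size_gb <= new_release_reserve_gb:
            used_reserve += t.size_gb  # new releases draw on the reserved slice first
            chosen.append(t.title_id)
        elif used_general + t.size_gb <= general_gb:
            # Other titles, and new releases that overflow the reserve,
            # compete for the general space.
            used_general += t.size_gb
            chosen.append(t.title_id)
    return chosen


if __name__ == "__main__":
    catalog = [
        Title("A", 10, 1000),
        Title("B", 10, 50),
        Title("C", 20, 200, new_release=True),
        Title("D", 10, 500),
        Title("E", 20, 400),
    ]
    print(plan_fill(catalog, capacity_gb=40, new_release_reserve_gb=0))   # ['A', 'D', 'E']
    print(plan_fill(catalog, capacity_gb=40, new_release_reserve_gb=20))  # ['A', 'D', 'C']
```

Without the reservation, the new release C is crowded out by catalog titles with a higher views-per-GB score; the reserved slice guarantees it a place.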

Common mistakes

  • Treating video streaming as a generic 'serve large files' problem — the dedicated CDN architecture is non-negotiable at this scale
  • Forgetting adaptive bitrate — assuming clients pick one bitrate at start and stick with it
  • Designing encoding as an inline step (encode on upload) instead of a separate batch pipeline that produces many variants per title
  • Conflating control plane and data plane into one architecture — they have totally different scaling profiles
  • Missing that egress bandwidth is the dominant cost and the main driver of infrastructure decisions

Likely follow-up questions

  • How would you handle a live stream (a single concert watched by 100M people simultaneously)?
  • What changes if you have to support offline downloads for mobile?
  • How would you A/B test a new video player without affecting reliability?
  • How would you handle the 'new release' problem where one title spikes to 50% of all viewing?
  • How would you support multi-region failover when one region's CDN tier loses power?

Frequently asked questions

Is Design Netflix the same as Design YouTube?
Closely related but distinct. Netflix is a curated catalog with pre-positioned content; YouTube is an open-upload platform that must encode content as it is ingested. Netflix optimizes for predictability and quality; YouTube optimizes for ingest scale and breadth.
Do I need to know specific video codecs for Design Netflix?
Naming H.264 (legacy compatibility), H.265, and AV1 (modern compression) is enough. Drawing the actual encoding pipeline beyond 'transcode → segment → manifest' is bonus signal but never required.
How long is the system-design round at Netflix specifically?
60 minutes is standard at Netflix. They emphasize tradeoff discussion over architecture diagrams and weigh 'why' more than 'what'. Source: Glassdoor Netflix Senior 2023–2024 reports.
Should I cover digital rights management (DRM) in Design Netflix?
Mention it as 'DRM-protected manifest and encrypted chunks; license server in the control plane'. Drilling into specific DRM systems (Widevine, PlayReady, FairPlay) is bonus signal but not expected unless you're interviewing for the playback team.