Design YouTube — System Design Interview Guide

Design YouTube is a system-design interview question that asks you to build a video-sharing platform: anyone uploads, anyone watches, and the recommendation feed surfaces relevant content from a firehose of roughly 500 hours of uploads per minute. The hard parts are upload ingest, multi-bitrate encoding at scale, and recommendations.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Google
  • Meta
  • Amazon
  • Netflix
  • TikTok

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Upload a video of arbitrary length (seconds to hours)
  • Watch a video at adaptive bitrates, with seek and playback controls
  • Search videos by title, description, tags, and channel
  • Get personalized recommendations on the home page and after-video
  • Comment, like/dislike, subscribe to a channel

Non-functional requirements

  • Playback startup: <2 seconds from tap-to-play (p95)
  • Upload throughput: 500 hours of video uploaded per minute globally
  • Watch concurrency: ~200M concurrent viewers at peak
  • Availability: 99.99%; degraded quality acceptable, full outage is not

Capacity estimation

Public scale (2023): ~2.7B monthly users and ~500 hours of video uploaded per minute, which is ~30K hours per hour or ~700K hours/day of new content. At an average raw-upload bitrate of ~2 Mbps, daily upload volume is 700K hours × 3600 s × 2 Mb/s ÷ 8 ≈ 630 TB/day of original uploads. Each upload is encoded into ~30 variants (resolutions × codecs × audio tracks), inflating storage by ~10× after encoding to ~6 PB/day of encoded artifacts.

Watch egress: ~1B hours of video watched per day. At a blended average bitrate of ~3 Mbps, egress is 1B hours × 3600 s × 3 Mb/s ÷ 8 ≈ 1.35 EB/day. Peak concurrency is ~200M viewers; at 3 Mbps each, peak egress is ~600 Tbps, achievable only via a global CDN with edge presence inside most major ISPs.

Metadata: ~10B total videos × ~2 KB of metadata each = ~20 TB, modest compared to media bytes. Comments: assuming ~1B videos draw new comments in a year, at an average of ~10 comments × ~200 bytes each, comment volume is ~2 TB/year and growing.
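The estimates above can be reproduced in a few lines. This is a back-of-envelope sketch using only the figures quoted in this section; the constant and function names are illustrative. It uses the unrounded 720K hours/day, so the raw-upload figure comes out at ~648 TB/day rather than the ~630 TB/day obtained from the rounded 700K.

```python
# Back-of-envelope capacity math using the figures quoted in this section.
UPLOAD_HOURS_PER_MIN = 500
RAW_UPLOAD_MBPS = 2        # assumed average bitrate of original uploads
ENCODE_INFLATION = 10      # storage multiplier after encoding the full ladder
WATCH_HOURS_PER_DAY = 1e9
WATCH_MBPS = 3             # blended average playback bitrate
PEAK_VIEWERS = 200e6


def hours_to_bytes(hours: float, mbps: float) -> float:
    """Bytes needed to store or ship `hours` of video at `mbps` megabits/second."""
    return hours * 3600 * mbps * 1e6 / 8


upload_hours_per_day = UPLOAD_HOURS_PER_MIN * 60 * 24
raw_bytes_per_day = hours_to_bytes(upload_hours_per_day, RAW_UPLOAD_MBPS)
encoded_bytes_per_day = raw_bytes_per_day * ENCODE_INFLATION
egress_bytes_per_day = hours_to_bytes(WATCH_HOURS_PER_DAY, WATCH_MBPS)
peak_egress_tbps = PEAK_VIEWERS * WATCH_MBPS / 1e6  # Mbps -> Tbps

print(f"uploads:  {upload_hours_per_day:,} hours/day")
print(f"raw:      {raw_bytes_per_day / 1e12:.0f} TB/day")
print(f"encoded:  {encoded_bytes_per_day / 1e15:.2f} PB/day")
print(f"egress:   {egress_bytes_per_day / 1e18:.2f} EB/day")
print(f"peak:     {peak_egress_tbps:.0f} Tbps")
```

In an interview the point is the order of magnitude, not the third digit: hundreds of terabytes in per day, about an exabyte out per day.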

High-level design

Three planes: control (metadata, search, recommendations), ingest (upload and encode), data (delivery via CDN).

Upload + ingest: the client requests a resumable upload URL; the upload service writes the original video to object storage as it arrives, in chunks (so a network interruption resumes from the last received chunk, not from byte zero). On upload completion, an event drops into an encoding queue. A fleet of encoding workers consumes the queue and transcodes each video into the standard ladder: ~6 resolutions (144p, 240p, 360p, 480p, 720p, 1080p, plus 4K for sources that warrant it) × ~3 codecs (H.264, H.265, AV1) plus separate audio tracks. Each variant is segmented into ~6-second chunks and uploaded to object storage; manifests (DASH/HLS) describe the available chunks. The video is then marked 'published' in metadata. Encoding is the heaviest compute load in the system: 500 hours/minute uploaded means hundreds of thousands of compute cores doing nothing but transcoding 24/7.
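The fan-out from one upload into many encode jobs can be sketched as follows. This is a minimal illustration, not a production planner: `EncodeJob`, `plan_encode_jobs`, and the rule of never encoding above the source resolution are assumptions made for the example, and audio tracks are left out.

```python
import math
from dataclasses import dataclass
from itertools import product

# Ladder and codecs as listed in the text; 2160 stands for 4K.
LADDER_HEIGHTS = [144, 240, 360, 480, 720, 1080, 2160]
CODECS = ["h264", "h265", "av1"]
SEGMENT_SECONDS = 6


@dataclass(frozen=True)
class EncodeJob:
    video_id: str
    height: int
    codec: str
    segment_count: int


def plan_encode_jobs(video_id: str, source_height: int, duration_s: float) -> list[EncodeJob]:
    """Return one job per (resolution, codec) pair the source can support."""
    # Assumption: skip rungs above the source resolution (no upscaling).
    heights = [h for h in LADDER_HEIGHTS if h <= source_height]
    segments = math.ceil(duration_s / SEGMENT_SECONDS)
    return [EncodeJob(video_id, h, c, segments) for h, c in product(heights, CODECS)]


jobs = plan_encode_jobs("v1", source_height=1080, duration_s=125)
print(len(jobs), jobs[0].segment_count)  # 18 jobs, 21 segments each
```

Each job is independent, which is what lets the worker fleet process a single upload in parallel across many machines.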

Watch path: the client hits the watch service for video metadata and the manifest URL. The manifest is fetched from a CDN edge cache; chunks are then fetched as the player needs them, with adaptive bitrate selection per chunk. The CDN is multi-tier: edge caches inside ISPs hold the most popular chunks, regional caches hold the long tail, and origin object storage holds the cold archive. A popular video is pushed to edge caches within minutes of becoming popular; a niche video is served from a regional cache or from origin.
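Per-chunk adaptive bitrate selection can be illustrated with a simple throughput-based rule. This is a sketch under stated assumptions: the bitrates in `LADDER_KBPS`, the 0.8 safety factor, and the low-buffer threshold are invented for the example, and real players use more elaborate heuristics.

```python
# Illustrative bitrates per rung (kbps); real ladders vary by codec and content.
LADDER_KBPS = {144: 200, 240: 400, 360: 800, 480: 1500, 720: 3000, 1080: 6000}


def pick_rendition(throughput_kbps: float, buffer_s: float,
                   safety: float = 0.8, low_buffer_s: float = 5.0) -> int:
    """Pick the tallest rendition whose bitrate fits the bandwidth budget."""
    budget = throughput_kbps * safety
    if buffer_s < low_buffer_s:
        # Buffer nearly empty: halve the budget to avoid a stall.
        budget /= 2
    affordable = [h for h, kbps in sorted(LADDER_KBPS.items()) if kbps <= budget]
    # Fall back to the lowest rung when nothing fits.
    return affordable[-1] if affordable else min(LADDER_KBPS)


print(pick_rendition(5000, buffer_s=20))  # 720
print(pick_rendition(5000, buffer_s=2))   # 480
print(pick_rendition(100, buffer_s=20))   # 144
```

The player runs a decision like this before each chunk request, which is why every rendition must be segmented on the same chunk boundaries.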

Control plane: search uses an inverted index over title, description, tags, and channel; ranking blends text relevance with engagement signals. Recommendations run as a two-stage pipeline (candidate generation, then ranking; see the deep dive). Comments and likes live in sharded relational stores keyed by video_id, so all comments for a video live on one shard.
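Routing comments and likes by video_id might look like the sketch below. The shard count and the `shard_for` helper are hypothetical; a stable hash from `hashlib` is used because Python's built-in `hash()` for strings is randomized per process.

```python
import hashlib

NUM_SHARDS = 64  # hypothetical shard count


def shard_for(video_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a video_id to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(video_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Every comment on the same video routes to the same shard.
assert shard_for("video-123") == shard_for("video-123")
print(shard_for("video-123"))
```

The tradeoff worth naming: reading a video's comments touches one shard, but a viral video concentrates its write load on that single shard.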

Deep dive — the hard problem

Three deep dives: ingest pipeline elasticity, CDN tier organization, and the recommendation pipeline, followed by a short note on idempotent upload.

Ingest elasticity: upload rate spikes 10× during regional prime times and after major events. The encoding queue is the elastic component: it absorbs bursts by accumulating depth, while the worker fleet auto-scales on queue length. Tradeoff: a longer queue means longer time-to-published; a shorter queue means wasteful idle workers. The production answer is to prioritize encoding by predicted view count. A creator with millions of subscribers gets their upload encoded ahead of an anonymous user, because head-of-line latency on that upload directly affects visible product quality. Present this priority-aware queue as your answer to the head-of-line problem.
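A priority-aware encoding queue and a queue-depth scaling rule can be sketched with the standard library. The class name, the `predicted_views` input, and the jobs-per-worker ratio are assumptions for illustration; a real system would sit on a distributed queue rather than an in-process heap.

```python
import heapq
import itertools


class EncodeQueue:
    """Pops the upload with the highest predicted view count first."""

    def __init__(self) -> None:
        self._heap: list[tuple[int, int, str]] = []
        self._seq = itertools.count()  # tie-breaker: FIFO among equal priorities

    def push(self, video_id: str, predicted_views: int) -> None:
        # heapq is a min-heap, so negate the priority.
        heapq.heappush(self._heap, (-predicted_views, next(self._seq), video_id))

    def pop(self) -> str:
        _, _, video_id = heapq.heappop(self._heap)
        return video_id

    def __len__(self) -> int:
        return len(self._heap)


def desired_workers(queue_depth: int, jobs_per_worker: int = 10,
                    min_workers: int = 1, max_workers: int = 1000) -> int:
    """Scale the fleet on queue depth, clamped to a floor and a ceiling."""
    needed = -(-queue_depth // jobs_per_worker)  # ceiling division
    return max(min_workers, min(max_workers, needed))


q = EncodeQueue()
q.push("anon-upload", predicted_views=50)
q.push("big-creator", predicted_views=5_000_000)
q.push("mid-creator", predicted_views=10_000)
print(q.pop(), desired_workers(len(q)))
```

A strict priority order can starve low-priority uploads during a long burst, so a real queue would likely add aging or a maximum wait time; that is a natural follow-up to raise yourself.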

CDN tier organization: the Pareto problem is severe, with the top 0.01% of videos serving ~30% of view hours. Caching strategy: edge caches near viewers explicitly hold this 'hot 0.01%'; regional caches hold the next 1%; the cold object-storage origin serves the rest. Eviction is LFU with a recency boost so a newly trending video can race up the tiers in hours. Predicting hotness for prewarming is partly engagement-signal-driven (likes/min, comment velocity) and partly creator-driven (channel subscriber count, viral score). Custom CDN appliances inside ISPs (same model as Design Netflix) keep last-mile bandwidth manageable; in regions without embedded ISP appliances, third-party CDNs absorb the overflow.

Recommendations: too rich to design end-to-end in 45 minutes, but the two-stage pattern is the expected answer. Stage 1 (candidate generation): from a corpus of billions of videos, narrow to ~500 candidates relevant to the user. This is done with cheap heuristics (subscribed channels, videos similar to those already watched, recent uploads, trending in the user's region) plus learned embedding retrieval: cosine similarity between a user vector and video vectors, with embeddings computed offline and served from a vector index. Stage 2 (ranking): score the 500 candidates with a heavyweight model that maps (user_features, video_features, context) to a predicted watch-time score weighted by feedback signals. Sort and return the top 20. Discuss the offline-train / online-serve split, mention the serving latency budget for the heavyweight model (~50ms), and acknowledge the cold-start problem (new users, new videos).
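The two-stage shape can be sketched in plain Python. Everything here is a toy stand-in: the brute-force cosine scan replaces a real vector index, `score_fn` replaces the heavyweight ranking model, and the function names and sample data are invented for illustration.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def generate_candidates(user_vec, video_vecs, subscribed_ids, k=500):
    """Stage 1: cheap heuristics plus nearest neighbors by embedding similarity."""
    nearest = heapq.nlargest(k, video_vecs, key=lambda vid: cosine(user_vec, video_vecs[vid]))
    # dict.fromkeys deduplicates while preserving order.
    return list(dict.fromkeys(list(subscribed_ids) + nearest))


def rank(candidates, score_fn, top_n=20):
    """Stage 2: score each candidate with the (expensive) model, keep the best."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_n]


video_vecs = {"cats": [1.0, 0.0], "dogs": [0.9, 0.1], "tax": [0.0, 1.0]}
predicted_watch_s = {"cats": 120, "dogs": 300, "tax": 10}  # stand-in model output

candidates = generate_candidates([1.0, 0.0], video_vecs, subscribed_ids=["tax"], k=2)
print(candidates)                                        # ['tax', 'cats', 'dogs']
print(rank(candidates, predicted_watch_s.get, top_n=2))  # ['dogs', 'cats']
```

The split matters because the expensive model only ever sees the few hundred candidates from stage 1, which is what makes a ~50ms ranking budget plausible.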

Finally, idempotent upload. A 4 GB upload over a flaky mobile connection might send the same chunk twice on retry; the upload service must deduplicate on the pair (upload session ID, chunk hash). Same chunk hash in the same session means the chunk is treated as already received. This is a small detail, but interviewers love it as a robustness signal.
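Chunk-level idempotency can be sketched as below. The `UploadService` class is hypothetical and keeps state in memory. One assumption goes beyond the text: the key also includes the chunk index, so two different chunks that happen to contain identical bytes in the same session are not mistaken for a retry.

```python
import hashlib


class UploadService:
    """In-memory sketch of idempotent, resumable chunk receipt."""

    def __init__(self) -> None:
        # (session_id, chunk_index) -> sha256 hex digest of the stored chunk
        self._received: dict[tuple[str, int], str] = {}

    def put_chunk(self, session_id: str, chunk_index: int, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        key = (session_id, chunk_index)
        seen = self._received.get(key)
        if seen == digest:
            return "duplicate"  # retry of a chunk we already hold: acknowledge, do nothing
        if seen is not None:
            raise ValueError(f"chunk {chunk_index} resent with different content")
        self._received[key] = digest  # a real service would also write the bytes to storage
        return "stored"

    def resume_from(self, session_id: str) -> int:
        """First chunk index not yet received; the client resumes here."""
        index = 0
        while (session_id, index) in self._received:
            index += 1
        return index


svc = UploadService()
print(svc.put_chunk("s1", 0, b"first chunk"))  # stored
print(svc.put_chunk("s1", 0, b"first chunk"))  # duplicate
print(svc.resume_from("s1"))                   # 1
```

Returning an acknowledgement for duplicates, rather than an error, is what makes client retries safe.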

Common mistakes

  • Designing single-resolution playback: modern viewers expect adaptive bitrate, and omitting it is a negative hiring signal
  • Encoding inline during upload (synchronous) instead of async on a queue: kills upload throughput and elasticity
  • Treating the recommendation system as one monolithic model: the two-stage candidate-then-rank split is the expected answer
  • Forgetting CDN tier structure: at exabyte-per-day egress, a single-tier CDN melts
  • Conflating control-plane and data-plane infrastructure: their scale differs by roughly 1000×, so they shouldn't share a design

Likely follow-up questions

  • How would your design support a live-stream concert watched by 50M concurrent users?
  • What changes if you have to add real-time chat during video playback?
  • How would you implement watch-time analytics that scale to 1B hours/day?
  • How would you support an offline-download feature for mobile users on metered data?
  • How would you detect and remove copyrighted content uploaded by users?

Frequently asked questions

Is Design YouTube the same as Design Netflix?
Related but distinct. Netflix is a curated catalog with pre-positioned content (~20K titles); YouTube is open-upload with ~700K hours of new content per day. Netflix optimizes for predictability; YouTube optimizes for ingest scale and discovery.
How long is the Design YouTube interview at Google?
60 minutes is standard; senior rounds (L6+) push to 75 minutes and expect detailed coverage of recommendations and CDN tiers. Source: Glassdoor Google L5/L6 reports 2023–2024.
Do I need to design the recommendation algorithm in detail?
The two-stage candidate + ranking split is the expected answer. Drilling into specific ranking model architectures is bonus signal but not required unless interviewing for the recommendation team.
What's the single most important deep dive in Design YouTube?
The encoding pipeline. Upload-to-published latency is a make-or-break product property, and the priority-aware queue + auto-scaling worker fleet is the cleanest 'I understand elasticity' signal.