Design Dropbox — System Design Interview Guide

Design Dropbox is a system-design interview that asks you to build a cloud-storage product: users sync files across devices, share folders with others, and resolve conflicts when two devices edit the same file. The hard parts are chunking, deduplication, and bidirectional sync.

By Alex Chen, Founder, InterviewChamp.AI · Last verified

Reported in interviews at

  • Dropbox
  • Google
  • Microsoft
  • Box
  • Apple

Sourced from Glassdoor, Levels.fyi, and Blind interview reports.

Functional requirements

  • Upload a file from a client device to the cloud
  • Sync files bidirectionally across multiple devices for the same user
  • Share a file or folder with another user or via public link
  • Resolve conflicts when the same file is edited on two devices offline
  • Restore previous versions of a file (version history)

Non-functional requirements

  • Sync latency: <5 seconds from save on device A to visible on device B (warm connection)
  • Durability: 11 nines — no file ever lost once acknowledged as uploaded
  • Scale: ~700M users, ~600 PB stored, ~60K file-change events/sec on average with peaks around ~150K
  • Bandwidth efficiency: a 1-byte edit to a 100 MB file should not re-upload 100 MB

Capacity estimation

Public scale (Dropbox 2021–2023): ~700M registered users, ~17M paid, ~600 PB of stored data. Daily file-change event volume: ~5B events/day (uploads, edits, deletes, shares) ≈ 60K events/sec average, peak ~150K. Most changes are small (rename, single-line edit); only ~20% are full uploads of new files.

Storage growth: ~1 PB/day of raw uploads (before dedup); over a year that's ~365 PB raw, or roughly 180-220 PB stored after dedup. Average file ~1 MB, with a long tail of large files (videos, archives) that dominates the byte total. With cross-user dedup (popular shared files like installers, common docs), realized storage is roughly 50-60% of raw uploaded bytes.

Metadata is the unsung scaling problem: 700M users × ~10K files/user average = 7 trillion file rows, before counting version history. The metadata store dwarfs many social products in row count even though byte volume lives in object storage.
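The event-rate and row-count arithmetic above can be checked in a few lines. This is only a back-of-envelope sketch; the inputs are the rounded figures quoted in this section, and the variable names are illustrative.

```python
# Back-of-envelope check of the capacity figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

events_per_day = 5_000_000_000          # ~5B file-change events/day
avg_events_per_sec = events_per_day / SECONDS_PER_DAY

users = 700_000_000                     # ~700M registered users
files_per_user = 10_000                 # ~10K files/user (rough average)
metadata_rows = users * files_per_user

print(f"average events/sec: {avg_events_per_sec:,.0f}")   # 57,870, i.e. roughly 60K
print(f"metadata rows: {metadata_rows:.1e}")              # 7.0e+12, i.e. 7 trillion
```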

High-level design

Separate the bytes from the metadata. Bytes (file contents) live in object storage as chunks; metadata (file tree, owner, sharing, version history) lives in a sharded relational store.

Clients run a sync agent that watches the local filesystem. On a local change, the agent computes content-addressed chunks: split the file into ~4 MB blocks and compute a hash of each block (the chunk_id). The agent compares the new chunk_id list against the cloud's known chunk_id list for that file (from metadata), and only chunks whose hashes are not already known are uploaded. Each chunk is stored once globally and referenced by hash, so cross-user dedup falls out for free: two users uploading the same installer share the same underlying chunk objects in object storage.
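A minimal sketch of the client-side chunk comparison described above. SHA-256 is an assumption (the text only says "a hash"), and `chunk_ids` and `chunks_to_upload` are hypothetical helper names, not part of any real sync agent.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # ~4 MB blocks, as in the text


def chunk_ids(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size blocks and return the SHA-256 hex digest of each."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def chunks_to_upload(local_ids: list[str], cloud_ids: list[str]) -> list[int]:
    """Return indexes of local chunks whose hash is not in the cloud's list for this file."""
    known = set(cloud_ids)
    return [i for i, chunk_id in enumerate(local_ids) if chunk_id not in known]


# Tiny 8-byte chunks keep the demo readable; a real agent would use ~4 MB.
old = b"A" * 8 + b"B" * 8 + b"C" * 8
new = b"A" * 8 + b"X" * 8 + b"C" * 8   # in-place edit of the middle block
changed = chunks_to_upload(chunk_ids(new, 8), chunk_ids(old, 8))
print(changed)  # [1]
```

Only the middle block is re-sent because the edit did not change the file's length; an edit that shifts later bytes behaves differently with fixed-size blocks.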

Upload flow: client lists the chunk hashes it has; metadata service responds with which hashes are unknown; client uploads only the unknown chunks (directly to object storage with a signed URL); client posts a new file-version row referencing the full chunk list. The metadata write triggers a sync-event notification to every other device of the same user (and to any users sharing the parent folder) over a persistent connection, prompting them to fetch the new version.
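The four upload steps can be sketched against a toy in-memory backend. `ToyCloud` and its methods are hypothetical stand-ins for the metadata service and object storage; signed URLs and the sync-event notification are left out, and SHA-256 is an assumed hash.

```python
import hashlib


def sha256_hex(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


class ToyCloud:
    """In-memory stand-in for the metadata service plus object storage (hypothetical API)."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}               # object storage: chunk_id -> bytes
        self.versions: dict[str, list[list[str]]] = {}   # metadata: path -> chunk lists, oldest first

    def unknown_hashes(self, hashes: list[str]) -> list[str]:
        """Step 2: report which of the client's hashes the cloud has never seen."""
        return [h for h in dict.fromkeys(hashes) if h not in self.chunks]

    def put_chunk(self, chunk_id: str, block: bytes) -> None:
        """Step 3: store one chunk (a real client would PUT to a signed URL)."""
        if sha256_hex(block) != chunk_id:
            raise ValueError("chunk content does not match its hash")
        self.chunks[chunk_id] = block

    def commit_version(self, path: str, hashes: list[str]) -> int:
        """Step 4: record a new file version; returns its 1-based version number."""
        if self.unknown_hashes(hashes):
            raise ValueError("version references chunks that were never uploaded")
        self.versions.setdefault(path, []).append(list(hashes))
        return len(self.versions[path])


def upload_file(cloud: ToyCloud, path: str, data: bytes, chunk_size: int) -> int:
    """Run the four-step flow and return how many chunks were actually sent."""
    blocks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    hashes = [sha256_hex(b) for b in blocks]             # step 1: list local hashes
    by_hash = dict(zip(hashes, blocks))
    to_send = cloud.unknown_hashes(hashes)
    for chunk_id in to_send:
        cloud.put_chunk(chunk_id, by_hash[chunk_id])
    cloud.commit_version(path, hashes)
    return len(to_send)


cloud = ToyCloud()
installer = b"aaaa" + b"bbbb" + b"cccc"
sent_by_alice = upload_file(cloud, "/alice/setup.bin", installer, chunk_size=4)
sent_by_bob = upload_file(cloud, "/bob/setup.bin", installer, chunk_size=4)
print(sent_by_alice, sent_by_bob)  # 3 0
```

The second upload sends nothing because every chunk hash is already known, which is the cross-user dedup effect in miniature.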

Download flow on a peer device: receive sync-event, fetch new file-version metadata, fetch only the chunks not already in local cache, reassemble the file, atomically rename into place. Sharing is a separate service over the same metadata — a shared folder is a permission edge from a folder to additional user_ids; sync events fan out across that edge.
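The peer-side steps (fetch missing chunks, reassemble, atomically rename) might look like the sketch below. `apply_version` and the `fetch_chunk` callback are hypothetical names. The rename uses Python's `os.replace`, which is atomic on POSIX when the temporary file sits on the same filesystem as the destination.

```python
import hashlib
import os
import tempfile
from typing import Callable


def apply_version(dest_path: str, chunk_list: list[str], cache: dict[str, bytes],
                  fetch_chunk: Callable[[str], bytes]) -> int:
    """Rebuild dest_path from chunk_list, fetching only chunks missing from cache."""
    fetched = 0
    for chunk_id in chunk_list:
        if chunk_id not in cache:
            cache[chunk_id] = fetch_chunk(chunk_id)
            fetched += 1
    # Write to a temp file in the destination directory, then swap it into place
    # so readers never observe a half-written file.
    directory = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        for chunk_id in chunk_list:
            tmp.write(cache[chunk_id])
    os.replace(tmp_path, dest_path)
    return fetched


# Demo: the device already caches the first chunk and must fetch only the second.
blocks = [b"hello ", b"world"]
chunk_list = [hashlib.sha256(b).hexdigest() for b in blocks]
remote = dict(zip(chunk_list, blocks))          # stand-in for object storage
cache = {chunk_list[0]: blocks[0]}
with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "report.txt")
    fetched = apply_version(path, chunk_list, cache, remote.__getitem__)
    with open(path, "rb") as f:
        content = f.read()
print(fetched, content)  # 1 b'hello world'
```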

Deep dive — the hard problem

Three deep-dive surfaces: chunking and dedup, sync conflict resolution, and the notification fanout, followed by a note on durability.

Chunking: fixed-size 4 MB chunks are simple but suffer the 'boundary shift' problem: inserting one byte at the front of a file shifts the contents of every subsequent chunk by one byte, so every chunk hash changes and dedup breaks. Production solutions use content-defined chunking (Rabin fingerprinting / FastCDC): walk the file with a rolling hash and place chunk boundaries at positions where the hash matches a target pattern. Because boundaries depend on nearby content rather than absolute offsets, an insertion or deletion changes only the chunks around the edit and the rest still dedup. Mention this tradeoff explicitly: fixed-size for simplicity, content-defined for byte efficiency.
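To make the boundary-shift effect concrete, here is a simplified content-defined chunker built on a gear-style rolling hash. It is a sketch of the idea, not Rabin fingerprinting or FastCDC itself: there are no minimum or maximum chunk sizes, and the 64-byte average is demo scale only. All names and constants are assumptions.

```python
import hashlib
import random

MASK_BITS = 6                                  # cut probability 1/64 per byte -> ~64-byte chunks
BOUNDARY_MASK = (1 << MASK_BITS) - 1
_rng = random.Random(0)                        # fixed seed so the demo is repeatable
GEAR = [_rng.getrandbits(64) for _ in range(256)]  # one pseudo-random value per byte value


def cdc_chunks(data: bytes) -> list[bytes]:
    """Cut after any byte where the low MASK_BITS bits of the rolling hash are zero.

    Each step shifts the hash left by one bit, so those low bits depend only on
    the last MASK_BITS bytes: boundaries follow content, not absolute offsets.
    """
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFFFFFFFFFF
        if h & BOUNDARY_MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_chunk_count(old_chunks: list[bytes], new_chunks: list[bytes]) -> int:
    """Distinct chunk hashes in the new version that the old version did not have."""
    def digests(chunks: list[bytes]) -> set[str]:
        return {hashlib.sha256(c).hexdigest() for c in chunks}
    return len(digests(new_chunks) - digests(old_chunks))


old = bytes(_rng.getrandbits(8) for _ in range(4096))
new = b"!" + old                               # insert one byte at the front
fixed_new = new_chunk_count(fixed_chunks(old), fixed_chunks(new))
cdc_new = new_chunk_count(cdc_chunks(old), cdc_chunks(new))
print(f"chunks to re-upload: fixed-size={fixed_new}, content-defined={cdc_new}")
```

With fixed-size blocks, essentially every chunk is new after the one-byte insertion. With this chunker, only chunks that begin within the first few bytes can differ, so the re-upload count stays at a handful regardless of file size.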

Conflict resolution: two devices offline both edit /report.docx; both reconnect. The metadata service detects two new versions diverging from a common parent. Three options: last-writer-wins (simple but lossy, since one device's edit is silently discarded), keep both by saving the losing edit as 'report (conflicted copy from device X) (date).docx' (Dropbox's actual approach for binary files), or operational-transform / CRDT merge (only viable for structured text; this is what Google Docs uses). For a generic file-sync product, the conflicted-copy approach is the safe default and is what Dropbox ships.
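One way to sketch the conflicted-copy approach: every commit names the version it was based on, and a commit whose parent is no longer the head is saved under a new name instead of overwriting. `FileStore`, its fields, and the exact naming pattern are illustrative assumptions, not Dropbox's implementation.

```python
import posixpath
from dataclasses import dataclass, field


@dataclass
class FileStore:
    """Toy metadata store tracking the current (head) version number per path."""
    heads: dict[str, int] = field(default_factory=dict)

    def commit(self, path: str, parent_version: int, device: str, date: str) -> str:
        """Accept the edit if it builds on the head; otherwise keep it as a conflicted copy.

        Returns the path the edit was stored under.
        """
        head = self.heads.get(path, 0)
        if parent_version == head:
            self.heads[path] = head + 1          # fast-forward: no divergence
            return path
        # Divergence: another device already advanced the head from the same parent.
        stem, ext = posixpath.splitext(path)
        copy_path = f"{stem} (conflicted copy from {device}) ({date}){ext}"
        self.heads[copy_path] = 1
        return copy_path


store = FileStore(heads={"/report.docx": 3})
# Both devices edited version 3 while offline; the laptop reconnects first.
first = store.commit("/report.docx", parent_version=3, device="laptop", date="2024-01-15")
second = store.commit("/report.docx", parent_version=3, device="phone", date="2024-01-15")
print(first)   # /report.docx
print(second)  # /report (conflicted copy from phone) (2024-01-15).docx
```

Neither edit is lost: the first becomes version 4 of the original path and the second is preserved alongside it for the user to reconcile.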

Notification fanout: each user typically has ~5-10 devices. Sync events fan out to all of their devices over a persistent connection (same pattern as Design WhatsApp). For shared folders, fanout extends to every device of every user with access. A heavily shared team folder (1000 collaborators) generates at least 1000 sync events per change, and more once each collaborator's multiple devices are counted. Bound the fanout by skipping devices that have been offline >24h and replaying state on their next reconnect.
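A minimal sketch of the bounded fanout described above, assuming the notification service keeps a last-seen timestamp per device. The function name and data shapes are hypothetical.

```python
from datetime import datetime, timedelta

OFFLINE_CUTOFF = timedelta(hours=24)   # threshold from the text


def fanout_targets(members: list[str], devices_by_user: dict[str, list[str]],
                   last_seen: dict[str, datetime], now: datetime,
                   origin_device: str) -> tuple[list[str], list[str]]:
    """Split devices into (notify now, defer until reconnect) for one change event."""
    notify, deferred = [], []
    for user in members:
        for device in devices_by_user.get(user, []):
            if device == origin_device:
                continue                       # the device that made the change already has it
            if now - last_seen[device] <= OFFLINE_CUTOFF:
                notify.append(device)
            else:
                deferred.append(device)        # will replay state on its next reconnect
    return notify, deferred


now = datetime(2024, 1, 15, 12, 0)
devices_by_user = {"alice": ["alice-laptop", "alice-phone"],
                   "bob": ["bob-laptop", "bob-phone"]}
last_seen = {
    "alice-laptop": now,
    "alice-phone": now - timedelta(hours=1),
    "bob-laptop": now - timedelta(days=3),
    "bob-phone": now - timedelta(minutes=10),
}
notify, deferred = fanout_targets(["alice", "bob"], devices_by_user, last_seen,
                                  now, origin_device="alice-laptop")
print(notify)    # ['alice-phone', 'bob-phone']
print(deferred)  # ['bob-laptop']
```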

Finally, durability. Files are chunked, each chunk written to object storage with erasure coding across multiple zones. Acknowledge only after a quorum of zones has confirmed; this is what gets you 11 nines of durability. Metadata rows are written with multi-zone synchronous replication.
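The storage trade-off behind erasure coding can be shown with simple arithmetic. The 6+3 layout and the majority-quorum rule below are illustrative assumptions; the text does not specify either.

```python
def erasure_profile(data_fragments: int, parity_fragments: int) -> tuple[float, int]:
    """Return (storage overhead factor, fragment losses tolerated).

    Assumes an ideal (MDS) code where any data_fragments of the total suffice
    to rebuild the chunk. Plain N-way replication is the special case of
    1 data fragment plus N-1 parity fragments.
    """
    total = data_fragments + parity_fragments
    return total / data_fragments, parity_fragments


def can_acknowledge(confirmed_zones: int, total_zones: int) -> bool:
    """Majority quorum (an assumption): ack once more than half the zones confirm."""
    return confirmed_zones > total_zones // 2


coded = erasure_profile(6, 3)        # 6 data + 3 parity fragments
replicated = erasure_profile(1, 2)   # 3 full copies
print(coded)       # (1.5, 3)
print(replicated)  # (3.0, 2)
print(can_acknowledge(2, 3), can_acknowledge(1, 3))  # True False
```

Under these assumptions the coded layout survives more lost fragments than three-way replication while storing half as many bytes, which is why erasure coding is the usual choice for cold bulk data.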

Common mistakes

  • Uploading whole files on every change: a 1-byte edit to a 100 MB file would re-send all 100 MB
  • Using fixed-size chunks without acknowledging the boundary-shift problem
  • Skipping conflict resolution — interviewer will ask 'what if I edit the same file on two laptops offline'
  • Conflating metadata storage with byte storage; they have totally different scaling profiles
  • Designing notifications as polling instead of push — kills latency and battery on mobile

Likely follow-up questions

  • How would you support multi-device editing of the same Google-Docs-style live document?
  • What changes if you add end-to-end encryption (server can't read file content)?
  • How would you handle a 50 GB single-file upload from a flaky connection?
  • How would you implement file version history that scales to 1000 versions per file?
  • How would you support a 'sync this folder to disk only on demand' (selective sync) feature?

Frequently asked questions

Is Design Dropbox harder than Design Twitter?
Different shape. Twitter is fanout-on-write; Dropbox is chunking + sync + conflict-resolution. Most interviewers consider them comparable difficulty; the failure modes are very different (Twitter: missed celebrity case; Dropbox: missed dedup or conflict).

Do I need to know about content-defined chunking specifically?
Naming the concept and explaining why it beats fixed-size for dedup is enough. Drawing the Rabin fingerprinting algorithm is bonus signal but not required at any level except for storage-team-specific interviews.

How long is the Design Dropbox interview at Dropbox itself?
60 minutes typical. Dropbox's loop explicitly covers chunking, sync, and at least one of (conflict resolution / sharing / version history). Source: Glassdoor Dropbox 2022–2024 reports.

Should I mention specific cloud storage providers?
No. Discuss 'multi-zone object storage with erasure coding' as the property. Naming specific providers leaks vendor opinions interviewers don't need.