Design Google Drive — System Design Interview Guide
Design Google Drive is a system-design interview question that asks you to build a cloud-storage product with collaboration: users upload files, sync across devices, share with permissions, and edit documents together in real time. The hard parts are the file/folder tree, chunked uploads, and the bridge to live-collaborative documents.
By Alex Chen, Founder, InterviewChamp.AI · Last verified
Reported in interviews at
- Microsoft
- Dropbox
- Box
- Apple
Sourced from Glassdoor, Levels.fyi, and Blind interview reports.
Functional requirements
- Upload a file (text, image, video, archive) from a client device to the cloud
- Organize files in nested folders; move and rename files and folders
- Share a file or folder with another user or via public link, with read/comment/edit permissions
- Sync files bidirectionally across devices; resolve conflicts when offline edits collide
- Collaborative live editing for Docs/Sheets/Slides-style document types (multiple users editing simultaneously)
Non-functional requirements
- Upload speed: support multi-gigabyte single files with resumable uploads
- Sync latency: <5s p99 from save on device A to visible on device B (warm connection)
- Durability: 11 nines — files never lost once acknowledged
- Scale: ~3B+ users (Google ecosystem), exabytes of stored data, peak ~1M+ file-change events/sec
Capacity estimation
Google Drive public scale (2023-2024 reports): part of Google Workspace which serves ~3B users across consumer + paid. Drive holds exabytes of files; precise totals aren't public but the platform processes ~1B+ file operations/day. Average user holds dozens to low thousands of files; enterprise users have 10K+. File size distribution is long-tail — most files are <1 MB but a small fraction reach hundreds of GB (large videos, datasets).
Storage: exabyte-scale total, growing by ~1+ PB/day net. File bytes dominate cost; metadata (file/folder rows, permissions, version history) is small in bytes but enormous in row count. At ~1 KB per file row, tens of billions of files come to tens of TB, and hundreds of billions come to hundreds of TB.
Write QPS: file-change events at ~1M/sec peak across the platform. Most events are small (rename, move, single-block edit); only ~10% are full uploads. The hot path: a metadata write per file change + an object-storage write for new content. Read QPS dwarfs writes: ~10M+ file/folder list operations per second at peak (every Drive UI page load is a metadata fetch).
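The estimates above can be checked with a few lines of arithmetic. This is a rough sketch using only the round numbers quoted in this section; the 10-billion file count is an illustrative assumption, not a published figure.

```python
# Back-of-envelope arithmetic using the round numbers quoted above.
KB, TB, PB = 10**3, 10**12, 10**15
SECONDS_PER_DAY = 86_400

def metadata_bytes(file_count: int, row_bytes: int = KB) -> int:
    """Total metadata size for file_count rows of row_bytes each."""
    return file_count * row_bytes

peak_events_per_sec = 1_000_000                          # file-change events at peak
full_uploads_per_sec = peak_events_per_sec * 10 // 100   # ~10% are full uploads
peak_reads_per_sec = 10_000_000                          # list operations at peak
read_write_ratio = peak_reads_per_sec // peak_events_per_sec

# ~1 PB/day of net growth expressed as sustained ingest bandwidth.
net_ingest_gb_per_sec = PB / SECONDS_PER_DAY / 10**9

# Assumed file count (10 billion) is for illustration only.
print(metadata_bytes(10 * 10**9) // TB, "TB of metadata at 10B files")
print(full_uploads_per_sec, "full uploads/sec at peak")
print(read_write_ratio, "reads per write at peak")
print(round(net_ingest_gb_per_sec, 1), "GB/s sustained net ingest")
```

The useful takeaway for the interview is the shape of the numbers: metadata is tiny next to file bytes, and reads outnumber writes by roughly an order of magnitude.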
High-level design
Four core domains: file metadata, file bytes, collaboration, and permissions. Each scales independently.
File metadata: a sharded relational store keyed on user_id (or organization_id for enterprise) holds the file/folder tree. Each row has (file_id, parent_folder_id, owner_id, name, type, size, version, mime_type, modified_at). The folder structure forms a tree; listing 'all files in folder X' is a single-shard range query on parent_folder_id. Most folders have <1000 items, so pagination handles the tail.
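A minimal in-memory sketch of the metadata row and a paginated folder listing follows. The field names come from the row described above; the `list_folder` helper, its cursor format, and the sort order are assumptions made for illustration, standing in for an indexed range query on parent_folder_id within one shard.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class FileRow:
    file_id: str
    parent_folder_id: Optional[str]   # None for a root folder
    owner_id: str
    name: str
    type: str                         # "file" or "folder"
    size: int
    version: int
    mime_type: str
    modified_at: int                  # epoch seconds

def list_folder(rows, parent_folder_id, page_size=100, cursor=None):
    """Return (page, next_cursor) using keyset pagination on (name, file_id).

    A real store would serve this from an index; the sort here only
    imitates that ordering for the in-memory example.
    """
    children = sorted(
        (r for r in rows if r.parent_folder_id == parent_folder_id),
        key=lambda r: (r.name, r.file_id),
    )
    if cursor is not None:
        children = [r for r in children if (r.name, r.file_id) > cursor]
    page = children[:page_size]
    has_more = len(children) > page_size
    next_cursor = (page[-1].name, page[-1].file_id) if has_more else None
    return page, next_cursor
```

Keyset pagination (resuming after the last seen key) is chosen over offsets here because it stays cheap on the rare folder with very many items.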
File bytes live in object storage as chunks. The client computes content-addressed chunks (~4 MB each) before upload, sends only chunks not already in the cloud (deduplication), and the metadata row references the chunk list. Cross-user dedup falls out of content addressing — a popular installer uploaded by 1M users is stored once. Resumable uploads: the client tracks which chunks have been acknowledged and resumes from the last unacknowledged chunk on reconnect.
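The hash-list negotiation described above can be sketched as follows. This is a simplified illustration: `ChunkStore`, `upload`, SHA-256 as the content hash, and fixed-size splitting are all assumptions made to keep the example short, and the in-memory dict stands in for object storage.

```python
import hashlib

class ChunkStore:
    """Hypothetical stand-in for object storage, keyed by content hash."""

    def __init__(self):
        self._chunks = {}

    def missing(self, hashes):
        """Return the hashes the store does not hold yet."""
        return [h for h in hashes if h not in self._chunks]

    def put(self, data: bytes) -> str:
        h = hashlib.sha256(data).hexdigest()
        self._chunks[h] = data
        return h

def upload(store: ChunkStore, data: bytes, chunk_size: int = 4 * 1024 * 1024):
    """Upload only chunks the store lacks; return (manifest, chunks_sent)."""
    # Fixed-size splitting keeps the sketch short; it is a simplification.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    by_hash = dict(zip(hashes, chunks))

    sent = 0
    # dict.fromkeys drops repeats when the same chunk occurs twice in a file.
    for h in dict.fromkeys(store.missing(hashes)):
        store.put(by_hash[h])
        sent += 1

    # The manifest (ordered chunk list) is committed only once every chunk
    # is present, mirroring the metadata write described above.
    assert not store.missing(hashes)
    return hashes, sent
```

Because `missing` is idempotent, calling `upload` again after an interrupted attempt sends only the chunks that never arrived, which is what makes the upload resumable.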
Collaboration is its own service for live-editable document types (Docs, Sheets, Slides). When a user opens a Doc, the client connects to a collaboration server that holds the document's current state in memory and applies incoming edit operations (insert character, delete, format change) using operational transformation. Edits are broadcast to all clients on the doc. Periodic snapshots flush back to the metadata store; the operation log lives separately and supports replay for revision history.
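The snapshot-plus-log arrangement can be sketched as below. The `Op` shape, the sequence numbers, and the `load` helper are assumptions for illustration; the point is only that a reader starts from the latest snapshot and replays the operations recorded after it.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    seq: int            # position in the operation log
    kind: str           # "insert" or "delete"
    pos: int
    text: str = ""      # inserted text (insert only)
    length: int = 0     # characters removed (delete only)

def apply(doc: str, op: Op) -> str:
    if op.kind == "insert":
        return doc[:op.pos] + op.text + doc[op.pos:]
    if op.kind == "delete":
        return doc[:op.pos] + doc[op.pos + op.length:]
    raise ValueError(f"unknown op kind: {op.kind}")

def load(snapshot_text: str, snapshot_seq: int, log) -> str:
    """Rebuild the document from a snapshot plus the ops logged after it."""
    doc = snapshot_text
    for op in sorted(log, key=lambda o: o.seq):
        if op.seq > snapshot_seq:   # older ops are already in the snapshot
            doc = apply(doc, op)
    return doc
```

Replaying the same log from an earlier snapshot, or from an empty document, yields any historical revision, which is how the log supports revision history.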
Permissions: a separate service tracks per-file ACLs (user_id → role) and per-folder inheritance. Resolving 'can user X read file Y' is a tree-walk up the folder structure until an explicit ACL is hit. Cached for a few seconds per (user, file) with invalidation events on permission changes. Sharing links (e.g. 'anyone with the link can view') are special tokens that bypass ACL lookup but are revocable.
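The tree-walk can be sketched in a few lines. The dict-based `parent_of` and `acls` structures, the role ranking, and the function names are assumptions for illustration; caching and sharing-link tokens are left out.

```python
ROLE_RANK = {"read": 1, "comment": 2, "edit": 3}

def resolve_role(user_id, file_id, parent_of, acls):
    """Walk up from file_id and return the first explicit role found.

    parent_of maps node id -> parent id (None at the root).
    acls maps node id -> {user_id: role}.
    """
    node = file_id
    while node is not None:
        role = acls.get(node, {}).get(user_id)
        if role is not None:
            return role          # nearest explicit ACL wins
        node = parent_of.get(node)
    return None                  # no ACL anywhere on the path: no access

def can(user_id, file_id, needed, parent_of, acls) -> bool:
    role = resolve_role(user_id, file_id, parent_of, acls)
    return role is not None and ROLE_RANK[role] >= ROLE_RANK[needed]
```

The cost is proportional to folder depth, which is why the result is worth caching per (user, file) as described above.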
Deep dive — the hard problem
Three deep dives: chunked uploads with dedup, the bridge to collaborative documents, and the folder-tree at scale.
Chunked uploads with dedup: a naïve full-file upload of a 5 GB video over a flaky mobile connection has a high probability of failing. Content-defined chunking (Rabin fingerprinting or FastCDC) splits files into ~4 MB chunks at boundaries chosen from the content itself, so a one-byte insertion at the start of a large file changes only the first chunk; later boundaries fall on the same content as before, whereas fixed-size chunking would shift every subsequent chunk. The client computes hashes of all chunks, sends the hash list to the metadata service, receives back the list of chunks the cloud doesn't have, and uploads only those. Each chunk upload is independent and resumable. The metadata write commits the full chunk list only after all chunks are durable in object storage.
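A toy content-defined chunker shows how boundaries come from the bytes rather than from offsets. This is a simplified gear-style rolling hash in the spirit of FastCDC, not a faithful implementation of it or of Rabin fingerprinting; the table construction, the tiny size limits, and the function name are assumptions chosen so the example runs on small inputs.

```python
import hashlib

# Deterministic 64-bit value per byte; deriving it from SHA-256 is an
# arbitrary choice for this sketch.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:8], "big")
        for b in range(256)]
MASK64 = (1 << 64) - 1

def cdc_chunks(data: bytes, avg_bits=6, min_size=16, max_size=256):
    """Split data where the rolling hash's low avg_bits bits are all zero.

    Real systems use sizes in the MB range; these are scaled down.
    """
    mask = (1 << avg_bits) - 1
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Older bytes shift out of the low bits, so the cut decision
        # depends only on the most recent few bytes of content.
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

# Usage: compare chunk sets before and after a one-byte insertion.
original = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest()
                    for i in range(128))
before = set(cdc_chunks(original))
after = set(cdc_chunks(b"\x00" + original))
print(len(before & after), "of", len(before), "chunks unchanged")
```

Once the two chunk streams agree on a single boundary, every later chunk is identical, so only the chunks near the edit need to be uploaded again.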
The deduplication payoff is real: across billions of users, popular files (installers, common datasets, viral memes) are stored once and referenced many times over. Publicly reported dedup rates for cloud-storage products are typically 30-60% of raw uploaded bytes.
Collaborative documents: this is the surface that distinguishes Drive from a pure Dropbox-style file-sync product. When a file is a 'live' document type, edits don't follow the chunk-upload path; they flow through the collaboration service as fine-grained operations. The document is persisted as an operation log, kept separately from the file-metadata rows, with snapshot rollups every N operations. A read of the document loads the most recent snapshot and replays subsequent operations. Operational transformation handles concurrent edits from multiple users by transforming each operation against operations that have happened since the sender's last sync, so the final state converges regardless of arrival order. Name OT (or CRDT as the modern alternative) explicitly in the interview.
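To make the transform step concrete, here is a minimal sketch for single-character inserts and deletes. It is a textbook-style illustration, not Google's implementation; the `Op` shape and the rule that the lower `site` id wins insert ties are assumptions, and it handles only two concurrent operations.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Op:
    kind: str        # "ins" or "del"
    pos: int
    ch: str = ""     # character to insert (unused for "del")
    site: int = 0    # client id, used only to break insert/insert ties

def apply(doc: str, op: Optional[Op]) -> str:
    if op is None:                       # transformed into a no-op
        return doc
    if op.kind == "ins":
        return doc[:op.pos] + op.ch + doc[op.pos:]
    return doc[:op.pos] + doc[op.pos + 1:]

def transform(a: Op, b: Op) -> Optional[Op]:
    """Rewrite a so it can be applied after the concurrent op b."""
    if a.kind == "ins" and b.kind == "ins":
        a_first = a.pos < b.pos or (a.pos == b.pos and a.site < b.site)
        return a if a_first else replace(a, pos=a.pos + 1)
    if a.kind == "ins" and b.kind == "del":
        return a if a.pos <= b.pos else replace(a, pos=a.pos - 1)
    if a.kind == "del" and b.kind == "ins":
        return a if a.pos < b.pos else replace(a, pos=a.pos + 1)
    # Both delete: the same character can only be removed once.
    if a.pos == b.pos:
        return None
    return a if a.pos < b.pos else replace(a, pos=a.pos - 1)

doc = "abc"
a = Op("ins", 1, "X", site=1)   # one client inserts X before "b"
b = Op("del", 1)                # another client deletes "b"
left = apply(apply(doc, a), transform(b, a))
right = apply(apply(doc, b), transform(a, b))
print(left, right)              # both orders give "aXc"
```

Both clients end with the same text even though they applied the two edits in opposite orders, which is the convergence property described above.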
Folder tree at scale: an enterprise org with 100K users and 100M files needs efficient tree operations. Naïve recursive folder reads (walking from root to a file) are slow at depth. The standard pattern: store a materialized path on every file row (e.g. '/Marketing/2025/Campaigns/Q3') so tree-walks are a single read. The cost moves to folder moves, where the materialized path of every descendant must change: the metadata service runs an asynchronous bulk update over rows whose path starts with the moved folder's old path, and a soft link forwards old paths to new ones during the migration window.
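The prefix rewrite and the forwarding soft link can be sketched as follows. The dict of file_id to path, the function names, and the forwarding table are assumptions for illustration; a real system would run the rewrite as a batched background job.

```python
def move_folder(paths: dict, old_prefix: str, new_prefix: str) -> dict:
    """Return a copy of paths with old_prefix rewritten to new_prefix.

    Matching on old_prefix + "/" avoids touching siblings that merely
    share a name prefix (e.g. /Marketing vs /MarketingOps).
    """
    old_with_sep = old_prefix + "/"
    moved = {}
    for file_id, path in paths.items():
        if path == old_prefix or path.startswith(old_with_sep):
            moved[file_id] = new_prefix + path[len(old_prefix):]
        else:
            moved[file_id] = path
    return moved

def resolve(path: str, forwards: dict) -> str:
    """Follow a soft link from an old path while the bulk update runs.

    forwards maps old folder prefix -> new folder prefix.
    """
    for old, new in forwards.items():
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path
```

The forwarding table lets readers holding a stale path land on the right row until the background rewrite finishes, after which the entry can be dropped.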
A fourth topic, if time allows: enterprise vs consumer. Consumer accounts are isolated by user_id; enterprise accounts must support cross-user access within an organization with org-level admin controls (org-wide search, retention policies, eDiscovery exports). Mention enterprise as a tier above consumer with extra services on top.
Common mistakes
- Uploading whole files on every save instead of chunking with dedup
- Storing file bytes in the metadata store — kills both stores and limits file size
- Skipping the collaboration-engine surface — Drive isn't Dropbox, the live-edit story is the differentiator
- Forgetting resumable uploads — a 5 GB file over mobile must survive 10 connection drops
- Treating folder moves as cheap — at deep folder hierarchies they're expensive bulk updates
Likely follow-up questions
- How would you support a 100 GB single-file upload over a flaky mobile connection?
- What changes if you add end-to-end encryption (server can't read file content but search still works)?
- How would you implement search across all files a user has access to, including content inside PDFs and Docs?
- How would you handle a shared folder with 10K active collaborators?
- How would you support offline access where a user marks a folder for offline use and edits while disconnected?
Frequently asked questions
- How long is the Design Google Drive system-design round?
- 60 minutes is typical at Google L5/E5+. The expectation is full coverage of metadata + bytes + collaboration + permissions. Source: Glassdoor Google 2022-2024 reports.
- Is Design Google Drive harder than Design Dropbox?
- Yes, because Drive includes live collaboration. Dropbox is pure file sync; Drive adds the OT/CRDT collaboration engine for Docs/Sheets/Slides. Most interviewers consider Drive ~20% harder for that reason.
- Do I need to know OT in detail?
- Naming the concept and explaining 'transform incoming operations against operations that happened since the sender's last sync' is enough. Drawing the actual transformation function (insert vs delete vs concurrent ranges) is overkill in a 45-60 minute round.
- Should I cover Docs/Sheets/Slides separately?
- Cover the collaboration engine once at a generic level — they all share the same OT-based architecture. Drilling into Sheets-specific formula evaluation is bonus signal but only if you've covered everything else first.