THN Interview Prep

Design Google Drive (Cloud Storage & Collaboration Shell)

1. Requirements

Functional

  • Users upload, download, organize files and folders; search across owned and shared content.
  • Share files/folders with users or groups; viewer/commenter/editor roles (simplified ACL model).
  • Integrate with office-style editors via conversion and export hooks (editors not fully specified here).
  • Virus scan / policy hooks on upload (async).
  • Activity feed and notifications on shares and edits (high level).

Non-Functional

  • Scale: billions of users; exabytes stored; metadata extremely large cardinality.
  • Latency: list directory and search p99 under ~300 ms warm; small file download starts quickly via redirect to CDN/signed URL.
  • Availability: 99.99% metadata path; object store durability contract from provider.
  • Consistency: ACL changes must converge safely; users expect read-your-writes for their own actions; shared edits may be eventual depending on editor integration.
  • Durability: same as enterprise cloud expectations; audit logs for sharing events.

Out of Scope

  • Full collaborative editing algorithm (OT/CRDT) for Docs/Sheets (referenced only).
  • Billing and enterprise license SKUs.
  • Client-side encryption full product (mention tradeoff only).

2. Back-of-Envelope Estimations

Assume 2B accounts, 400M DAU touch Drive indirectly, 50M heavy Drive users with 1000 files each average (heavy tail skew).

  • Metadata rows: 50B+ file entries globally across shards (many inactive); the active working set is a small percentage.

  • Operations: list, search, create dominate; ~5k–50k QPS metadata per large region after caching (highly dependent on product bundling).

  • Storage growth: combined consumer and workspace growth on the order of hundreds of PB/year; dedup savings are smaller than for backup products, while thumbnail and preview derivatives add cost.

  • Search index size: tens of PB aggregate inverted index + vectors if semantic search enabled.

  • Bandwidth: downloads mostly from object store + CDN; internal replication cross-region.

  • Cache: per-user recent file lists and folder children in Redis; ~80/20 recency for home screen.

Workspace vs consumer: Enterprise tenants multiply permission edges—budget 10–100× more ACL rows per logical file for large domains; group expansion must stay outside the critical read path via materialized edges.

Audit exports: Large enterprises may request full ACL graphs quarterly—size batch walkers to stream millions of rows without loading entire graphs into API worker RAM (cursor-based BFS per shared drive root).
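The cursor-based walker described above can be sketched as a generator that holds only one page of children in memory at a time; `list_children` here is a hypothetical paged metadata API, not a real Drive endpoint:

```python
from collections import deque

def walk_shared_drive(root_id, list_children, page_size=100):
    """Stream every item under root_id breadth-first, one page at a time.

    list_children(parent_id, cursor, limit) -> (items, next_cursor) is a
    hypothetical paged metadata call; only one page is ever held in RAM,
    so millions of rows can be streamed to the export sink.
    """
    queue = deque([root_id])
    while queue:
        parent = queue.popleft()
        cursor = None
        while True:
            items, cursor = list_children(parent, cursor, page_size)
            for item in items:
                yield item
                if item["isFolder"]:
                    queue.append(item["id"])
            if cursor is None:
                break
```

Because the function yields items as it goes, the export job can zip and stream rows without ever materializing the full graph in an API worker.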

3. API Design

POST /v1/files:upload?uploadType=multipart
-> 200 { fileId, sessionUrl }

POST /v1/files/{fileId}:complete
Body: { md5Hash, sizeBytes, parentId, name }
-> 201 { fileId, revisionId }

GET /v1/files/{fileId}/permissions
-> 200 { acl: [{ subjectId, role }] }

POST /v1/files/{fileId}/permissions
Body: { subjectEmail, role }
-> 200 { permissionId }

GET /v1/files?q=mimeType%3Dpdf
-> 200 { files: [...], nextPageToken }

Errors: 403 ACL denied, 404 trashed, 409 name collision, 429 quota/rate.

POST /v1/files/{fileId}:copy
Body: { parentId, name }
-> 201 { fileId }

GET /v1/activities?itemId=
-> 200 { events: [{ actor, action, timestamp }] }
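A client-side sketch of the two-step upload flow above (open session, send bytes, finalize with an integrity hash). The `transport` object with `post`/`put` methods is a hypothetical stand-in for a real HTTP client:

```python
import hashlib

def upload_file(transport, data, parent_id, name):
    """Resumable upload against the endpoints above.

    transport.post(path, body) -> dict and transport.put(url, data) are
    assumed; sessionUrl points at the blob tier, so file bytes never pass
    through the metadata service.
    """
    session = transport.post("/v1/files:upload?uploadType=multipart", {})
    transport.put(session["sessionUrl"], data)  # bytes go to object storage
    return transport.post(
        f"/v1/files/{session['fileId']}:complete",
        {
            "md5Hash": hashlib.md5(data).hexdigest(),  # server re-verifies
            "sizeBytes": len(data),
            "parentId": parent_id,
            "name": name,
        },
    )
```

Separating session creation from completion is what makes the upload resumable: the client can re-PUT remaining chunks to `sessionUrl` after a network failure without repeating the metadata call.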

4. Data Model

  • User: userId, domainId (workspace), quotas.
  • DriveItem: itemId, parentId, ownerId, name, mimeType, revisionId, trashed.
  • Permission: itemId, subjectId (user/group), role enum.
  • Revision: revisionId, itemId, blobId, timestamp.

Metadata lives in a horizontally scalable store: globally distributed SQL (Spanner), a wide-column store (DynamoDB / Cassandra-style), or sharded Postgres per cell. Blobs live in object storage. Search uses an inverted index (Elasticsearch) with per-user ACL filtering; a bloom filter alone is not sufficient, so combine it with query-time ACL expansion strategies.

Indexes: (parentId, name) unique per folder; search secondary index on ownerId, mimeType, modifiedTime.
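A minimal sketch of the DriveItem table in SQLite, just to make the per-folder uniqueness constraint and search index concrete; a real deployment uses one of the distributed stores above, and the column names here are illustrative:

```python
import sqlite3

# In-memory sketch of DriveItem; production shards this across cells.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE drive_item (
        item_id   TEXT PRIMARY KEY,
        parent_id TEXT,
        owner_id  TEXT,
        name      TEXT,
        mime_type TEXT,
        trashed   INTEGER DEFAULT 0,
        UNIQUE (parent_id, name)   -- no sibling name collisions
    )
""")
# Secondary index backing owner/type-scoped search queries.
db.execute("CREATE INDEX idx_search ON drive_item (owner_id, mime_type)")
```

A violation of the `UNIQUE (parent_id, name)` constraint is exactly what surfaces as the 409 name-collision error in the API section.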

5. High-Level Architecture


Upload orchestrator handles resumable sessions. Search is eventually consistent with metadata. Kafka fans out share events. CDN serves thumbnails and public web previews when allowed. See API gateway for request shaping and auth termination patterns.

6. Component Deep-Dives

  • ACL evaluation: Resolve inheritance down folder tree—cache effective permission per (userId, itemId) with invalidation on share changes; avoid deep tree walks per request.
  • Trash and restore: Tombstone with retention window; background purge blob when last referencing revision GC’d.
  • Search: Ingest metadata changes; enforce ACL at query using join on allowed item IDs or document-level security lists—trade memory vs latency.
  • Conflicts: Single-writer per file for binary; collaborative editors use separate real-time service.
  • Failure: Split-brain mitigated by single-primary per item shard or consensus for critical updates—align with replication fundamentals.
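One way to sketch the cached effective-permission resolution above: walk the parent chain once, memoize per (userId, itemId), and invalidate on share changes. The epoch counter here is a deliberately coarse stand-in for per-subtree invalidation:

```python
ROLE_RANK = {"viewer": 1, "commenter": 2, "editor": 3}

class AclResolver:
    """Cache effective role per (user_id, item_id); avoid deep tree
    walks per request. Share changes bump a global epoch, which lazily
    invalidates every cached entry (coarse but safe)."""

    def __init__(self, parents, acls):
        self.parents = parents   # item_id -> parent_id or None
        self.acls = acls         # (item_id, user_id) -> role
        self.epoch = 0
        self.cache = {}          # (user, item) -> (epoch, role)

    def grant(self, item_id, user_id, role):
        self.acls[(item_id, user_id)] = role
        self.epoch += 1          # invalidate cached resolutions

    def effective_role(self, user_id, item_id):
        key = (user_id, item_id)
        hit = self.cache.get(key)
        if hit and hit[0] == self.epoch:
            return hit[1]
        best, node = None, item_id
        while node is not None:  # inherit from ancestors
            role = self.acls.get((node, user_id))
            if role and (best is None or ROLE_RANK[role] > ROLE_RANK[best]):
                best = role
            node = self.parents.get(node)
        self.cache[key] = (self.epoch, best)
        return best
```

In production the invalidation would be scoped to the affected subtree rather than global, but the shape of the cache key and the one-pass ancestor walk are the essential pieces.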

7. Bottlenecks & Mitigations

  • Popular shared folders: ACL fan-out large—expand groups offline; cache membership.

  • Search hot queries: rate limit automation abuse; approximate counters for quota display.

  • Upload storms: token bucket per user; regional upload endpoints.

  • Cross-region: serve reads from the local metadata replica, surfacing a bounded-staleness label to the UI.

  • Large-folder listing: Folders with 100k+ children require pagination tokens and server-side sort keys—avoid OFFSET deep pages; seek from (name, itemId) cursor.
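The seek-based listing above can be sketched against SQLite (which supports row-value comparisons since 3.15); the `drive_item` columns are assumed from the data model section:

```python
def list_children_page(db, parent_id, cursor=None, limit=100):
    """Keyset pagination over a folder's children: seek from the
    (name, item_id) cursor instead of using OFFSET, so page N deep in a
    100k-child folder costs the same as page 1."""
    if cursor is None:
        rows = db.execute(
            "SELECT name, item_id FROM drive_item "
            "WHERE parent_id = ? ORDER BY name, item_id LIMIT ?",
            (parent_id, limit)).fetchall()
    else:
        rows = db.execute(
            "SELECT name, item_id FROM drive_item "
            "WHERE parent_id = ? AND (name, item_id) > (?, ?) "
            "ORDER BY name, item_id LIMIT ?",
            (parent_id, cursor[0], cursor[1], limit)).fetchall()
    # A short page means we reached the end; otherwise the last row is
    # the opaque cursor handed back to the client as nextPageToken.
    next_cursor = rows[-1] if len(rows) == limit else None
    return rows, next_cursor
```

The `(name, item_id)` composite cursor matches the sort key, and including `item_id` breaks ties between siblings whose names would otherwise collide across pages.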

8. Tradeoffs

Decision                 | Alternative     | Why we picked it
Central metadata service | Per-device only | Sharing and search require server truth
Async search index       | Sync index      | Write latency and coupling
Signed URL downloads     | Proxy all bytes | Cost and throughput
Role-based ACL           | Object ACL only | Folder inheritance requirements

9. Follow-ups (interviewer drill-downs)

  • 100× folder listing? Partition by parentId; materialized path or hierarchical IDs; edge cache.

  • Exactly-once side effects on share? Outbox + idempotent consumers.

  • Migration? Strangler for metadata fields; dual-read search.

  • Active-active multi-region? Conflict-free renames are hard—use a primary region per item, or last-write-wins with an audit trail.

  • Cost? Lifecycle to Nearline/Coldline; dedup thumbnails; query shaping.

  • Commenting on files? Threaded metadata in separate store with itemId FK; real-time optional via WebSocket channel per file to avoid poll storms on hot docs.

  • Export compliance? Takeout batch jobs walk each user's tree with BFS, zip the contents, and stream them to object storage; rate limit to protect the metadata DB.

  • Shared drives at scale? Membership changes fan out millions of effective ACL edges—precompute delta ACL snapshots nightly for search filtering instead of expanding groups per query (replication-friendly read models).

  • Malware scanning latency? Large archives block share links until scan completes—surface progress percent in UI so users do not repeatedly re-upload, which would amplify ingress cost.
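A toy sketch of the outbox + idempotent-consumer pattern from the follow-ups: the producer records events alongside the write, and the consumer dedupes on a deterministic event ID so at-least-once redelivery is a no-op. All names here are illustrative:

```python
class Outbox:
    """Exactly-once *effect* for share events: in production the event
    insert shares a transaction with the ACL write (the outbox table),
    and a relay publishes it to Kafka at-least-once."""

    def __init__(self):
        self.events = []   # stands in for the outbox table

    def share(self, item_id, subject, role):
        self.events.append({
            "event_id": f"{item_id}:{subject}:{role}",  # deterministic
            "item_id": item_id,
            "subject": subject,
        })

class Notifier:
    """Idempotent consumer: duplicate deliveries of the same event_id
    send at most one notification."""

    def __init__(self):
        self.seen = set()
        self.sent = []

    def handle(self, event):
        if event["event_id"] in self.seen:
            return             # duplicate delivery: no-op
        self.seen.add(event["event_id"])
        self.sent.append(event["subject"])
```

The deterministic `event_id` is the key design choice: it lets the consumer dedupe without any coordination with the producer, which is what makes the pattern safe under broker redelivery.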
