Design Dropbox (Cloud File Sync)
1. Requirements
Functional
- Users install a client that syncs a designated folder across devices; changes propagate reliably.
- Support large files, resumable uploads, and block-level deduplication to save bandwidth and storage.
- Version history and restore for files (retention policy driven).
- Sharing links with optional passwords/expiry; folder sharing with permissions (simplified).
- Web and mobile access with preview for common formats.
Non-Functional
- Scale: hundreds of millions of users; exabytes of logical data; billions of file records per metadata shard tier.
- Latency: metadata sync notifications sub-second in the typical case; chunk upload throughput bounded by the client's network; folder-listing p99 in the low hundreds of ms.
- Availability: 99.99% for metadata; blob durability of eleven nines, typical for object stores.
- Consistency: users expect read-your-writes on their own namespace; eventual consistency across devices within seconds is acceptable for many sync products—call out the tradeoff versus strong global consistency.
- Durability: no silent data loss; checksums end-to-end.
Out of Scope
- Full collaborative real-time editing (Google Docs style).
- Antivirus marketplace and legal e-discovery beyond hooks.
- On-prem enterprise appliance full spec.
2. Back-of-Envelope Estimations
Assume 500M registered users, 100M MAU, and an average of 50 GB of logical data stored per paying user (heavy-tailed skew).
- Active sync operations: 10M users syncing daily with ~100 file ops/day → 1B ops/day of metadata churn → ~12k QPS average; 10× bursts on OS bulk changes (see the arithmetic sketch after this list).
- Chunk storage: average 4 MB chunks with a global dedup ratio of 30–50% for popular binaries → raw stored volume grows slower than logical. New uploads on the order of ~100 PB/year at scale (depends on product mix).
- Bandwidth: client-side dedup reduces WAN traffic; cross-region replication adds internal traffic—budget Tbps of aggregate internal capacity.
- Metadata DB: ~1 KB per file row plus indexes → billions of files → single-digit TB of metadata per large shard before archives.
- Cache: hot file lists per user—Redis cluster with ~100 GB–1 TB aggregate working set for active sessions (80/20 of recently touched namespaces).
- Desktop vs mobile: Mobile clients may prefetch only metadata for recent folders—assume 10× fewer chunk uploads per DAU than desktop power users; adjust bandwidth models when sizing ingress.
- Photo libraries: Image-heavy folders dedupe poorly across users (high entropy) but dedupe well within one user after edits—expect higher dedupe ratios for professional accounts that reuse templates versus consumer vacation albums.
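A quick arithmetic sketch of the metadata-QPS and metadata-size numbers above; every input is an assumed figure from this section, not a measurement.

```python
# Back-of-envelope check for the estimates above (assumed inputs, not measurements).
SECONDS_PER_DAY = 86_400

daily_sync_users = 10_000_000      # users syncing on a given day
file_ops_per_user = 100            # metadata-changing ops per user per day
ops_per_day = daily_sync_users * file_ops_per_user  # 1e9 ops/day of metadata churn
avg_qps = ops_per_day / SECONDS_PER_DAY             # ~11.6k QPS average
burst_qps = avg_qps * 10                            # 10x burst on OS bulk changes

metadata_row_bytes = 1_024         # ~1 KB per file row before indexes
files = 1_000_000_000              # one billion file rows as a round number

print(f"metadata churn: {ops_per_day:,.0f} ops/day ≈ {avg_qps:,.0f} QPS avg, {burst_qps:,.0f} QPS burst")
print(f"{files:,} file rows ≈ {files * metadata_row_bytes / 1e12:.1f} TB raw metadata (pre-index)")
```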
3. API Design
POST /v1/chunks
Headers: Content-SHA256, Content-Length
Body: <binary>
-> 201 { chunkId, deduped: true|false }
POST /v1/files/commit
Body: { path, revision, chunkIds: [{ offset, chunkId }], clientRevision }
-> 200 { fileId, revision, conflict: false }
GET /v1/updates?cursor=
-> 200 { events: [...], nextCursor }
GET /v1/files/{fileId}/download
-> 302 Location: presigned-blob-url
Errors: 409 revision conflict, 412 precondition failed, 429 throttled (rate limited).
GET /v1/namespaces/{namespaceId}/changes?sinceRevision=
-> 200 { files: [...], deleted: [...], latestRevision }
POST /v1/shares
Body: { itemId, targetEmail, role }
-> 201 { shareId }
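A minimal client-side sketch of the chunk-upload and commit flow against the endpoints above, using the Python `requests` library. The host, auth header, and error handling are placeholders, and resume/retry logic is omitted; this illustrates content-hash-keyed chunk PUTs plus an optimistic-concurrency commit, not a production client.

```python
import hashlib
import requests  # assumed HTTP client; any equivalent works

BASE = "https://api.example.com/v1"            # placeholder host
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder auth

def upload_file(path: str, file_bytes: bytes, base_revision: int,
                chunk_size: int = 4 * 1024 * 1024):
    """Upload chunks (the server may dedupe them), then commit a manifest."""
    manifest = []
    for offset in range(0, len(file_bytes), chunk_size):
        chunk = file_bytes[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        # Idempotent by content hash: re-PUTting the same chunk is safe; the server
        # can answer deduped=true and skip storing the bytes again.
        r = requests.post(f"{BASE}/chunks", data=chunk,
                          headers={**HEADERS, "Content-SHA256": digest})
        r.raise_for_status()
        manifest.append({"offset": offset, "chunkId": r.json()["chunkId"]})

    commit = requests.post(f"{BASE}/files/commit", headers=HEADERS, json={
        "path": path,
        "revision": base_revision,        # optimistic concurrency: stale revisions are rejected
        "chunkIds": manifest,
        "clientRevision": base_revision + 1,
    })
    if commit.status_code == 409:
        # Another device committed first: pull the latest revision, re-diff, retry or branch.
        raise RuntimeError("revision conflict; pull changes and retry")
    commit.raise_for_status()
    return commit.json()
```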
4. Data Model
- User: userId, quota, plan.
- Namespace (per user or team): rootRevision, ACL snapshot id.
- FileNode: fileId, parentId, path (materialized for perf or reconstructed), revision, contentHash, size.
- Chunk: chunkId (hash), storageKey, refCount, size.
Metadata: Postgres or MySQL sharded by userId / teamId. Blobs live in S3-compatible object storage with content-addressed keys for chunks.
Indexes: (userId, parentId, name) unique; chunkId primary for chunk table.
Sample: chunk row (sha256-abc, s3://.../ab/ababc..., refCount 91234).
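A minimal sketch of the two core rows as Python dataclasses, with a hypothetical helper that derives a content-addressed storage key in the spirit of the sample row above. Field names mirror the list; the key scheme and bucket name are illustrative assumptions.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class FileNode:
    file_id: str
    parent_id: str
    path: str          # materialized for read perf, or reconstructed from parents
    revision: int      # monotonic per file; drives optimistic commits
    content_hash: str  # hash of the full content / manifest
    size: int

@dataclass
class Chunk:
    chunk_id: str      # content hash of the chunk; doubles as the dedup key
    storage_key: str   # content-addressed object-store key
    ref_count: int
    size: int

def storage_key_for(chunk_bytes: bytes, bucket: str = "chunks") -> str:
    """Hypothetical key scheme: prefix with the first hash bytes to spread keys."""
    digest = hashlib.sha256(chunk_bytes).hexdigest()
    return f"s3://{bucket}/{digest[:2]}/{digest}"
```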
5. High-Level Architecture
Chunk gateway streams to object storage; metadata service commits manifests transactionally. Outbox via Kafka drives fan-out to other devices. Optional LAN P2P for same-office sync (product-specific). Message queues decouple commit from notifications.
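A sketch of the commit-plus-outbox step mentioned above: the manifest write and the outbox event land in the same database transaction, and a separate relay drains the outbox into Kafka. The table names, topic name, and the `db`/`producer` handles are assumptions (a transactional DB handle and a kafka-python-style producer), not a prescribed stack.

```python
import json
import uuid

def commit_file(db, namespace_id: str, file_row: dict, event: dict):
    """Write the file manifest and an outbox row atomically; never publish inside the txn."""
    with db.begin() as txn:  # assumed transactional DB handle
        txn.execute(
            "INSERT INTO file_nodes (file_id, parent_id, path, revision, content_hash, size) "
            "VALUES (%s, %s, %s, %s, %s, %s)",
            (file_row["fileId"], file_row["parentId"], file_row["path"],
             file_row["revision"], file_row["contentHash"], file_row["size"]),
        )
        txn.execute(
            "INSERT INTO outbox (event_id, namespace_id, payload) VALUES (%s, %s, %s)",
            (str(uuid.uuid4()), namespace_id, json.dumps(event)),
        )

def drain_outbox(db, producer, topic: str = "namespace-changes"):
    """Relay loop: publish unsent outbox rows to Kafka, then mark them sent."""
    rows = db.execute("SELECT event_id, payload FROM outbox WHERE sent = false LIMIT 100")
    for event_id, payload in rows:
        producer.send(topic, key=event_id.encode(), value=payload.encode())
        db.execute("UPDATE outbox SET sent = true WHERE event_id = %s", (event_id,))
```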
6. Component Deep-Dives
- Sync algorithm: Per-file revisions are monotonic; the client maintains a journal; the server rejects stale commits. Conflict policies: last-writer-wins or rename to a side branch—the product choice must be explicit.
- Chunking: Fixed-size or content-defined chunking (Rabin) for dedup; rolling-hash boundaries survive inserts (see the toy chunker after this list).
- Deduplication: Hash index on chunkId; increment refCount; garbage-collect unreferenced chunks asynchronously with legal hold checks.
- Upload: Multipart for large files; presigned URLs for direct-to-S3 uploads from the client when possible.
- Failure: Partial upload → resume via chunk manifest; DB deadlock → retry with backoff; idempotent chunk PUT by hash.
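A toy content-defined chunker in the spirit of the Rabin approach above: a polynomial rolling hash (Rabin–Karp style) over a sliding window, with a chunk boundary wherever the low bits of the hash are zero, so boundaries depend only on local content and survive inserts earlier in the file. The hash, window, and chunk sizes here are toy values for illustration (KB-scale, not the ~4 MB chunks assumed elsewhere), not production Rabin fingerprinting.

```python
def cdc_chunks(data: bytes, window: int = 48, mask: int = 0x1FFF,
               min_size: int = 2 * 1024, max_size: int = 64 * 1024):
    """Yield content-defined chunks; mask=0x1FFF targets ~8 KB average chunks (toy sizes)."""
    B, MOD = 257, 1 << 32
    b_out = pow(B, window, MOD)   # factor for removing the byte that leaves the window
    rolling, start = 0, 0
    for i, byte in enumerate(data):
        rolling = (rolling * B + byte) % MOD
        if i >= window:
            rolling = (rolling - data[i - window] * b_out) % MOD
        size = i - start + 1
        # Boundary when the low bits of the rolling hash are zero; min/max sizes
        # bound chunk-size variance.
        if (size >= min_size and (rolling & mask) == 0) or size >= max_size:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]
```

An insert near the start of a file shifts only the chunks around the edit; downstream boundaries realign on the same content, which is what makes dedup hold up across revisions.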
7. Bottlenecks & Mitigations
- Hot chunks: Popular installers create a single hot spot in ref counts; storage is fine; cache the metadata.
- Thundering herd on folder listing after reconnect—delta sync via cursor, not a full tree walk (see the sync-loop sketch after this list).
- Metadata shard hot users—move shards or subshard; rate-limit destructive ops.
- GC pressure on chunk deletion—async jobs with rate caps.
- Antivirus / DLP hooks: Optional async scan on commit flags files as quarantined—clients poll the pending_scan state; a throughput limiter keeps a scan backlog from blocking unrelated namespaces (pub/sub fan-out to scanner workers).
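A sketch of the cursor-based delta sync loop referenced above, built on the GET /v1/updates endpoint from the API section. The host, headers, and the load_cursor/save_cursor/apply_event callbacks are placeholders; the point is that the client persists its last cursor, so a reconnect replays deltas instead of relisting the tree.

```python
import time
import requests  # assumed HTTP client

BASE = "https://api.example.com/v1"  # placeholder host

def sync_loop(load_cursor, save_cursor, apply_event, headers: dict, poll_seconds: int = 5):
    """Poll /updates with a persisted cursor; never re-list the full tree on reconnect."""
    cursor = load_cursor()  # e.g. read from the local journal; None means "from the beginning"
    while True:
        r = requests.get(f"{BASE}/updates", params={"cursor": cursor or ""}, headers=headers)
        if r.status_code == 429:
            time.sleep(int(r.headers.get("Retry-After", poll_seconds)))  # back off when throttled
            continue
        r.raise_for_status()
        body = r.json()
        for event in body["events"]:
            apply_event(event)        # create/rename/delete locally; fetch chunks lazily
        cursor = body["nextCursor"]
        save_cursor(cursor)           # persist before sleeping so a crash resumes from here
        if not body["events"]:
            time.sleep(poll_seconds)  # idle: wait before polling again (or switch to push)
```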
8. Tradeoffs
| Decision | Alternative | Why we picked it |
|---|---|---|
| Content-addressed chunks | File-level snapshots only | Storage and bandwidth savings |
| Eventual cross-device | Strong global ordering | Latency and availability over strict linearizability |
| SQL metadata | Pure KV | Complex directory constraints and transactions |
| Kafka notifications | Polling only | Push latency and server efficiency |
9. Follow-ups (interviewer drill-downs)
- 100× metadata QPS? More shards; read replicas with read-your-writes routing; cache folder listings.
- Exactly-once chunk upload? Use an Idempotency-Key plus chunk-hash uniqueness (see the sketch after this list).
- Migration? Dual-write metadata fields; compare counts; per-shard cutover.
- Multi-region? Primary metadata region per user; async blob replicas; conflict rules for travel.
- Cost? Aggressive dedup; cold storage tier; throttle preview transcoding.
- Selective sync? Per-folder subscription state on clients must reconcile with the server namespace—a version per folder or an ETag per file reduces conflict explosions on tree moves.
- Blob conflict UX? When two devices diverge without OT, branch copies or last-writer-wins—retain both blobs for support until GC policy allows pruning.
- Enterprise legal hold? Some tenants freeze deletes—GC must respect legal_hold flags on namespaces even when a chunk's refcount hits zero, shifting storage cost to compliance budgets (consistency between the policy DB and blob metadata is critical).
- LAN sync? Peer discovery adds mDNS chatter—provide a per-deployment kill switch for corporate networks that ban multicast; fall back to the cloud-only path automatically after N failures.
- Paper trail? Support tooling needs an immutable audit trail on admin restores—who touched which revision when, linked to ticket IDs for later disputes (consistency between the support DB and user-visible history).
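A server-side sketch of the exactly-once idea from the follow-ups: the chunk id is the content hash, so it acts as the natural idempotency key, and repeated PUTs from a retrying client converge on the same row. The table name, Postgres-style upsert, and the `db`/`blob_store` handles (with `db.execute` reporting whether a row was inserted) are assumptions for illustration.

```python
import hashlib

def put_chunk(db, blob_store, body: bytes, claimed_sha256: str) -> dict:
    """Idempotent chunk upload: the content hash is the primary key, so retries are safe."""
    digest = hashlib.sha256(body).hexdigest()
    if digest != claimed_sha256:
        raise ValueError("412: Content-SHA256 does not match body")  # end-to-end checksum

    # Upsert keyed by the hash: a concurrent or repeated upload of the same bytes
    # lands on the same row instead of creating a duplicate.
    inserted = db.execute(
        "INSERT INTO chunks (chunk_id, storage_key, ref_count, size) "
        "VALUES (%s, %s, 0, %s) ON CONFLICT (chunk_id) DO NOTHING",
        (digest, f"chunks/{digest[:2]}/{digest}", len(body)),
    )
    if inserted:  # first writer stores the bytes; later writers dedupe
        blob_store.put(f"chunks/{digest[:2]}/{digest}", body)
    return {"chunkId": digest, "deduped": not inserted}
```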