Design YouTube (Video Sharing Platform)
1. Requirements
Functional
- Upload videos (resumable), transcode to multiple renditions (resolution/bitrate).
- Stream with adaptive bitrate (ABR); thumbnails and previews; captions.
- Search and recommendations feed; channels, subscriptions, and comments (comments covered only at a shallow level).
- Studio analytics for creators (aggregate counts).
- Content policy and copyright hooks via fingerprinting pipeline (referenced, not fully designed).
Non-Functional
- Scale: billions of users; upload and view QPS among the largest on the internet; exabytes of stored video.
- Latency: playback start p99 in the low seconds even on poor networks, so optimize delivery of the first segment; API p99 ~100–300 ms for cached reads.
- Availability: 99.99% for watch path; upload may retry gracefully.
- Consistency: eventual for view counts; strong for ownership and monetization metadata where money flows.
- Durability: no silent loss of uploaded masters; multi-AZ replication.
Out of Scope
- Full ad auction and Google Ads integration depth.
- Legal content moderation policy specifics.
- Full design of ultra-low-latency live streaming (mentioned only as an extension).
2. Back-of-Envelope Estimations
Assume 2B MAU and 1B hours watched/day (an industry-scale ballpark), with ~5 MB/min of effective delivered bytes. The delivered rate varies with the ABR rendition, so treat every figure here as an order of magnitude.
- Views: ~10B views/day → ~120k views/s average; peaks of 1M+/s during popular events. Metadata traffic is hotter at origin than video bytes, since most bytes are served from the CDN.
- Uploads: 500 hours/min (an often-quoted figure) is 30,000 seconds of video arriving per wall-clock second. At an average length between 30 s and 5 min, that is on the order of 10²–10³ uploads/s globally, with huge variance.
- Storage: masters plus the transcoded ladder reach EB cumulatively; thumbnails and metadata are far smaller but non-trivial.
- Transcoding compute: CPU/GPU fleet sized to ingest spikes; a queue backlog of minutes to hours is acceptable for the long tail.
- Cache: CDN edge dominates, with an origin shield behind it; popular videos approach a 100% edge hit ratio. Origin egress is on the order of Tbps internally.
Use back-of-envelope sanity checks on CDN hit ratio.
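The estimates above can be reproduced with a few lines of arithmetic. The sketch below is a rough sanity check, not a capacity plan: the 10B views/day, 1B hours/day, and 5 MB/min inputs come from the assumptions above, while the 99% and 95% edge hit ratios are illustrative values chosen here.

```python
SECONDS_PER_DAY = 86_400


def avg_per_second(events_per_day: float) -> float:
    """Average rate implied by a daily total."""
    return events_per_day / SECONDS_PER_DAY


def delivered_tbps(hours_watched_per_day: float, mb_per_minute: float) -> float:
    """Average delivered bandwidth in terabits per second (1 MB = 10^6 bytes)."""
    bytes_per_day = hours_watched_per_day * 60 * mb_per_minute * 1e6
    return bytes_per_day * 8 / SECONDS_PER_DAY / 1e12


def origin_tbps(edge_tbps: float, hit_ratio: float) -> float:
    """Traffic that misses the edge and falls through to the shield/origin."""
    return edge_tbps * (1.0 - hit_ratio)


if __name__ == "__main__":
    edge = delivered_tbps(1e9, 5)
    print(f"average views/s: {avg_per_second(10e9):,.0f}")
    print(f"average delivered: {edge:.1f} Tbps")
    for hit in (0.99, 0.95):  # assumed hit ratios, for illustration only
        print(f"hit ratio {hit:.0%}: origin ~{origin_tbps(edge, hit):.2f} Tbps")
```

With these inputs the average delivered bandwidth is roughly 28 Tbps. A 99% hit ratio leaves about 0.28 Tbps for origin, and dropping to 95% multiplies that by five, which is why the hit ratio deserves its own sanity check.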
Creator uploads: ~500 hours/min of video uploaded (an order-of-magnitude industry stat) implies a continuously running transcode fleet sized by the codec × resolution matrix, not by concurrent viewers alone.
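To make the "codec × resolution matrix" sizing concrete, here is a minimal sketch. Only the 500 hours/min input comes from the text; the per-rendition encode costs and the headroom factor are hypothetical placeholders, so the resulting core count is illustrative rather than a real estimate.

```python
from __future__ import annotations


def video_seconds_per_second(hours_uploaded_per_minute: float) -> float:
    """Seconds of source video arriving per wall-clock second."""
    return hours_uploaded_per_minute * 3600 / 60


def cores_needed(ingest_rate: float,
                 core_seconds_per_video_second: dict[str, float],
                 headroom: float = 1.0) -> float:
    """Steady-state cores = ingest rate x total encode cost across the ladder."""
    total_cost = sum(core_seconds_per_video_second.values())
    return ingest_rate * total_cost * headroom


# Hypothetical encode costs: core-seconds needed per second of source video,
# one entry per codec x resolution cell. Real values depend on encoder settings.
LADDER_COST = {
    "h264/1080p": 2.0,
    "h264/720p": 1.0,
    "h264/360p": 0.5,
    "av1/1080p": 10.0,
}

if __name__ == "__main__":
    rate = video_seconds_per_second(500)
    print(f"ingest: {rate:,.0f} video-seconds per second")
    print(f"cores (1.5x headroom): {cores_needed(rate, LADDER_COST, 1.5):,.0f}")
```

The point of the shape is that adding one expensive cell to the matrix (for example an AV1 rendition) moves fleet size far more than a change in viewer count does.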
Comment moderation: Comment writes are orders of magnitude fewer than views but spiky during controversy—plan moderation queues separately from view-event ingestion so safety tooling cannot starve playback telemetry (pub/sub isolation).
3. API Design
POST /v1/uploads:init
Body: { title, mimeType, sizeBytes }
-> 200 { uploadId, chunkSizeBytes }
PUT /v1/uploads/{uploadId}
Headers: Content-Range
Body: <binary>
-> 308 Resume Incomplete (more chunks expected)
POST /v1/videos/{videoId}:finalize
Body: { visibility, categoryIds }
-> 201 { videoId, processingJobId }
GET /v1/watch/{videoId}/manifest.mpd
-> 200 application/dash+xml
GET /v1/feed/home?cursor=
-> 200 { items: [...], nextCursor }
Errors: 400 bad range, 403 region blocked, 429 upload quota.
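The chunked PUT on the upload endpoint depends on the client computing correct `Content-Range` values and restarting from the last acknowledged byte. The helper below is a sketch of that bookkeeping only; it makes no network calls. How the client learns the acknowledged offset is not specified above, so `resume_from` is a hypothetical parameter.

```python
from typing import Iterator, Tuple


def content_range(start: int, length: int, total: int) -> str:
    """Content-Range value for a chunk covering bytes [start, start + length)."""
    if length <= 0 or start < 0 or start + length > total:
        raise ValueError("bad range")  # the API above would answer 400 here
    return f"bytes {start}-{start + length - 1}/{total}"


def plan_chunks(total: int, chunk_size: int,
                resume_from: int = 0) -> Iterator[Tuple[int, int, str]]:
    """Yield (offset, length, Content-Range) for every chunk still to send."""
    offset = resume_from
    while offset < total:
        length = min(chunk_size, total - offset)
        yield offset, length, content_range(offset, length, total)
        offset += length


if __name__ == "__main__":
    # Fresh upload of a 10-byte file in 4-byte chunks.
    for chunk in plan_chunks(total=10, chunk_size=4):
        print(chunk)
    # After a dropped connection, resume from byte 8.
    print(list(plan_chunks(total=10, chunk_size=4, resume_from=8)))
```

A retrying client would call `plan_chunks` again with the new offset after each interruption rather than restarting from zero.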
GET /v1/channels/{channelId}/videos?cursor=
-> 200 { items: [...], nextCursor }
POST /v1/videos/{videoId}/captions
Body: { language, format, uploadUrl }
-> 202 { captionJobId }
4. Data Model
- Video: videoId, channelId, status (processing|live|failed), durationMs, visibility.
- Asset: videoId, rendition (1080p, 720p, ...), blobId, codec, bitrate.
- Channel: channelId, ownerUserId, subscriberCount (materialized/cached).
- ViewEvent: append-only for analytics; aggregated counters separate.
Metadata: sharded SQL or Spanner-like for core entities; Cassandra for high-write counters with compaction; object storage for blobs. View aggregates often batch + approximate (consistency tradeoff).
Indexes: Video(channelId, publishedAt DESC); search inverted index on title/tags.
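As an illustration of what the `Video(channelId, publishedAt DESC)` index buys, the sketch below models the Video entity and pages through one channel's videos newest-first with a cursor. It runs against an in-memory list, standing in for an index range scan. The `published_at` field is an assumption implied by the index (it is not in the field list above), and the cursor shape is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Cursor = Tuple[int, str]  # (published_at, video_id) of the last item returned


@dataclass(frozen=True)
class Video:
    video_id: str
    channel_id: str
    status: str        # "processing" | "live" | "failed"
    duration_ms: int
    visibility: str
    published_at: int  # epoch seconds; assumed field, implied by the index


def channel_videos(videos: List[Video], channel_id: str, limit: int,
                   cursor: Optional[Cursor] = None
                   ) -> Tuple[List[Video], Optional[Cursor]]:
    """Return one page of a channel's videos, newest first, plus the next cursor."""
    rows = sorted(
        (v for v in videos if v.channel_id == channel_id),
        key=lambda v: (v.published_at, v.video_id),
        reverse=True,
    )
    if cursor is not None:
        # Keep only rows strictly older than the last item already served.
        rows = [v for v in rows if (v.published_at, v.video_id) < cursor]
    page = rows[:limit]
    has_more = len(rows) > limit
    next_cursor = (page[-1].published_at, page[-1].video_id) if has_more else None
    return page, next_cursor
```

Including `video_id` in the cursor breaks ties between videos published in the same second, so no item is skipped or repeated across pages.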
5. High-Level Architecture
An upload lands in a raw bucket; a queue schedules transcode jobs; outputs go to a packaged bucket (CMAF/DASH/HLS). Playback is served almost entirely from the CDN. Recommendations separate training from serving, with a feature store. See CDN and message queues.
6. Component Deep-Dives
- Transcoding pipeline: Normalize codecs; generate ladder; keyframe alignment for ABR switching; parallel segment encode; priority queues for popular creators (policy).
- Manifest generation: DASH/HLS manifests reference segment URLs with CDN tokens; per-region edge URL signing.
- View counting: Ingest high-volume events to Kafka; aggregate to batch stores; sharded counters with debounce; spam detection upstream.
- Search/indexing: Asynchronous indexer on video metadata; Content-ID sidecar for rights.
- Failure: Transcode failure → retry with backoff; poison jobs → dead-letter with alerts; partial ladder degrade gracefully.
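The failure policy in the last bullet (retry with backoff, then dead-letter) can be sketched as follows. This is a single-process illustration: the `transcode` callable and the list used as a dead-letter queue are hypothetical stand-ins for a real worker and queue, and full jitter is one common backoff variant chosen here, not something the design above prescribes.

```python
import random
import time
from typing import Callable, List, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng: Optional[random.Random] = None) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2^attempt)]."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * (2 ** attempt)))


def run_with_retries(job: str,
                     transcode: Callable[[str], None],
                     dead_letter: List[str],
                     max_attempts: int = 3,
                     sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try a job up to max_attempts times; park it in dead_letter if all fail."""
    for attempt in range(max_attempts):
        try:
            transcode(job)
            return True
        except Exception:
            # Wait before the next attempt, but not after the final one.
            if attempt < max_attempts - 1:
                sleep(backoff_delay(attempt))
    dead_letter.append(job)  # poison job: stop retrying and alert an operator
    return False
```

Passing `sleep` as a parameter keeps the function testable without real delays. A production worker would also distinguish retryable errors from permanent ones instead of catching every exception.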
7. Bottlenecks & Mitigations
- Viral video: origin hot spot. Mitigate with an origin shield, request collapsing, and multiple CDNs.
- Upload spikes: chunked resumable uploads; per-user concurrency caps; regional ingest buckets.
- Counter inflation: anomaly detection; cap increment rates per IP/account.
- Recommendations cold start: content-based fallback; explore/exploit balance.
- Live extension: the LL-HLS path adds a packager producing sub-second segments and server-side ad cue injection. It is a separate scale story from VOD but shares the CDN and auth edge.
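For the counter-inflation mitigation, one way to cap increment rates per IP or account is a token bucket per key. The sketch below is an in-memory illustration with caller-supplied timestamps; the rate and burst values are hypothetical, and a real deployment would keep bucket state in a shared store and expire idle keys.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class TokenBucket:
    rate_per_s: float   # sustained increments allowed per second
    burst: float        # maximum tokens the bucket can hold
    tokens: float
    updated_at: float

    def allow(self, now: float) -> bool:
        """Refill for the elapsed time, then spend one token if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_per_s)
        self.updated_at = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class IncrementLimiter:
    """One token bucket per key (an IP address or account id)."""

    def __init__(self, rate_per_s: float, burst: float) -> None:
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, key: str, now: float) -> bool:
        bucket = self.buckets.get(key)
        if bucket is None:
            # New keys start with a full bucket.
            bucket = TokenBucket(self.rate_per_s, self.burst, self.burst, now)
            self.buckets[key] = bucket
        return bucket.allow(now)
```

Rejected increments can still be logged for the anomaly detector, so the cap limits damage without hiding the signal.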
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| ABR streaming | Single MP4 file | Mobile network variance |
| Async transcoding | Serve raw upload | Compatibility and bandwidth |
| Approximate view counts | Strongly consistent | Cost at planet scale |
| Third-party + multi-CDN | Single vendor | Resilience and peering |
9. Follow-ups (interviewer drill-downs)
- 100× view spikes? More edge capacity; isolate recommendation reads; shed non-critical APIs.
- Exactly-once view billing? Idempotent session tokens; reconcile aggregates (idempotency).
- Migration? Dual-write manifests v1/v2; player feature flags.
- Multi-region? Encode region-locally; replicate packaged assets; keep metadata global with caches.
- Cost? Tiered storage for cold content; codec efficiency (AV1) rollout; limit free transcoding.
- Copyright? Content-ID fingerprint ingest on upload; match policy may block or monetize; the dispute flow is legal and ops heavy, so the system only routes state transitions and retains evidence blobs.
- Shorts vertical? Vertical encode ladder and separate recommendation index; shares the graph with VOD but uses different engagement features in the ranking mixer.
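The "idempotent session tokens; reconcile aggregates" answer to the exactly-once billing question can be sketched as below. It is a minimal in-memory model: the class and function names are hypothetical, and the Python set stands in for a durable deduplication store with expiry, which a real system would need at this scale.

```python
from typing import Dict, Iterable, Set, Tuple

Event = Tuple[str, str]  # (video_id, session_token)


class ViewLedger:
    """Counts each (video_id, session_token) pair at most once."""

    def __init__(self) -> None:
        self._seen: Set[Event] = set()
        self.counts: Dict[str, int] = {}

    def record(self, video_id: str, session_token: str) -> bool:
        key = (video_id, session_token)
        if key in self._seen:
            return False  # duplicate delivery or client retry; do not count again
        self._seen.add(key)
        self.counts[video_id] = self.counts.get(video_id, 0) + 1
        return True


def reconcile(ledger: ViewLedger, raw_events: Iterable[Event]) -> Dict[str, Tuple[int, int]]:
    """Recount from the raw log; return {video_id: (live_count, expected)} for mismatches."""
    expected: Dict[str, int] = {}
    for video_id, _token in set(raw_events):  # the set removes duplicate events
        expected[video_id] = expected.get(video_id, 0) + 1
    return {
        video_id: (ledger.counts.get(video_id, 0), n)
        for video_id, n in expected.items()
        if ledger.counts.get(video_id, 0) != n
    }
```

The live path stays cheap and tolerant of redelivery, while the batch reconcile catches events the live path dropped before any money moves.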