Key Architectures (Popular Systems)
Analyzing real-world systems provides proven architectural blueprints and strategies for dealing with extreme scale, hot spots, and geographic distribution.
🐦 1. Twitter / X: Timeline Generation
Twitter's core challenge is handling the massive write load of incoming tweets combined with the read load of generating custom timelines for active users.
The Challenge: Fan-Out Bottlenecks
- Write Volume: ~5,000 tweets per second average (peaks at 12,000+).
- Read Volume: ~300,000 timeline lookups per second.
- Scale Inequity: A normal user has 100 followers; a celebrity has 100 million. A naive "write to all followers' feeds" (push) crashes on celebrity tweets. A naive "query all users I follow on read" (pull) is too slow for 300k QPS.
The Solution: Hybrid Fan-Out Architecture
Incoming Tweet ──► User Follower Check
│
├─► Normal User Follower ──► PUSH (Write to Redis Memory Feeds)
│
└─► Celebrity (>10k followers) ──► PUSH SKIPPED (Write to Celebrity Tweet DB)
▲
│ (Blended on read)
User Reads Home Timeline ──► Fetch cached feed from Redis ───────────┘- Push Model (Write Fan-out) - For Normal Users:
- When a normal user tweets, the system queries their follower list.
- The tweet ID is injected directly into the active memory cache (Redis) of each follower's home timeline.
- Result: Home timeline lookup is an $O(1)$ fast Redis read.
- Pull Model (Read Fan-out) - For Celebrities:
- When a celebrity (e.g., millions of followers) tweets, the write fan-out is skipped.
- The celebrity's tweet is written only to a relational/document store.
- Timeline Blending (Hybrid Read):
- When a user requests their home timeline, the system fetches their cached push timeline from Redis.
- Simultaneously, it fetches tweets from any celebrity the user follows and merges them chronologically into the final list.
- Benefit: Avoids write-amplification while maintaining low read-latencies.
🎬 2. Netflix: Video Streaming
Netflix operates a split architecture: the control plane (running on AWS) handles everything before you press "Play", and the data plane (Open Connect) delivers the video streams.
[ AWS Control Plane ] ──► Sign up, recommendations, metadata
│
▼
[ Open Connect CDN Appliance ] ──► Placed inside local ISPs ──► Direct stream to Client1. Control Plane (Microservices on AWS)
- Handles user registration, billing, searching, machine-learning-driven recommendations, and movie metadata.
- Utilizes Cassandra for high-availability user metadata storage, and DynamoDB for transactional metadata.
2. Data Plane (Open Connect CDN)
- Netflix bypasses standard CDNs (like Akamai or Cloudflare) for video streaming by using their custom CDN called Open Connect.
- Hardware Appliances: Netflix builds custom storage hardware servers (Open Connect Appliances) and places them directly inside local Internet Service Provider (ISP) facilities for free.
- Off-Peak Sync: During off-peak hours (middle of the night), these local appliances download popular movies directly from Netflix's AWS storage.
- Zero-Latency Delivery: When a user streams a movie, it is served directly from the local ISP datacenter node, eliminating long-distance backbone network bandwidth costs and buffering latencies.
3. Video Transcoding Pipeline
- A single movie can take several Terabytes of raw storage.
- Netflix breaks down raw files into small chunks, parallelizes processing across AWS nodes, and encodes each chunk into thousands of output formats (resolutions, devices, audio tracks, and codecs).
- Dynamic optimization feeds different bitrates based on the client's current network connection.
🚗 3. Uber: Real-time Dispatch
Uber's core challenge is matching riders to nearby drivers in real-time, requiring rapid processing of location coordinates.
1. Geospatial Indexing (H3 & S2)
- Traditional databases are not optimized for fast polygon searches (e.g., "find all drivers within 2 miles of latitude X, longitude Y").
- Uber uses H3 (a hexagonal spatial index system). H3 divides the entire Earth's surface into a grid of nested hexagons.
- Hexagonal Advantage: The distance between a hexagon center and all its six neighbors is identical, making radial search calculations extremely fast.
/\ /\
| |___| |
| | | |
/ \/ \/ \
| | ● | | <-- Rider at center hexagon (Search neighbors)
| |_____| |
\ / \ /
| |___| |
| | | |
\/ \/2. Dispatch Architecture (DISCO)
- DISCO (Dispatch Core): Written in Go, it manages the locations of all active drivers and handles rider booking requests.
- State Tracker: Drivers report their GPS locations every 4 seconds. The location update goes through an API Gateway to a memory-cached State Store (e.g., Ringpop/Redis).
- Geohash Lookup: When a rider requests a pickup, DISCO:
- Resolves the rider's GPS coordinate to a specific H3 hexagon.
- Fetches all active drivers located in that hexagon and the adjacent six hexagons.
- Filters out drivers who are offline or busy.
- Runs a matching algorithm to dispatch the ride.
3. Dynamic Surge Pricing
- Surge pricing calculations are done using real-time geospatial demand tracking.
- If booking requests (demand) within an H3 hexagon exceed active drivers (supply), a surge pricing factor is calculated and updated in-memory for all requests originating from that specific hexagon.
Mark this page when you finish learning it.
Last updated on
Spotted something unclear or wrong on this page?