Design Twitter (Microblogging)
1. Requirements
Functional
- Users register a profile and post short messages ("tweets") with optional media attachments stored externally.
- Follow other users; build personalized home timeline mixing tweets from followed accounts with ranking signals (interview may scope chronological MVP).
- Like, retweet, and reply; replies form threads that expand into conversation trees.
- Search tweets (often delegated to separate Lucene/ES cluster scope).
- Notifications for mentions and followers (tie to notification system design).
Non-Functional
- Scale: a 500M+ registered-user class of problem; thousands of tweets per second globally; hundreds of thousands of timeline reads per second in aggregate.
- Latency: a post is visible to its author immediately; fan-out to followers within a few seconds for normal users; strict read-your-writes for the author.
- Availability: 99.95% for reads; tolerable brief inconsistency for rare edge cases in distributed systems.
- Consistency: hybrid timeline model — eventual fan-out with synchronization hooks for critical UX paths.
Out of Scope
- Full ads marketplace and billing.
- Spaces/audio rooms engineering depth.
- Full-text indexing internals beyond inverted index mention.
- Federation / ActivityPub.
2. Back-of-Envelope Estimations
Assume 250M DAU; 150M tweets/day (a historical ballpark, useful for teaching).
- Write: 150M / 86,400 ≈ 1,736/s average; peaks during major sports or news events can reach ~20k/s.
- Read: if each user refreshes the home timeline 30x/day → 7.5B timeline requests/day ≈ 86,800/s worldwide; many are served from cache, so the origin tier sees only a fraction.
- Storage: tweet metadata ~200 B + text up to 280 UTF-8 chars ≈ 1 KB average per tweet → ~150 GB/day raw; multi-petabyte over several years, media excluded.
- Fan-out writes (push model): the median follower count is low (hundreds at most for the mass of users); celebrity outliers dominate, so the arithmetic mean is misleading; plan a hybrid model (overlaps with the news feed design).
- Media: assume object storage such as S3; not counted in the tweet row size beyond a URL.
3. API Design
POST /v1/tweets
Authorization: Bearer <token>
Body: { "text": "hello world", "replyToTweetId": null, "mediaIds": ["m1"] }
-> 201 { "tweetId": "t_abc", "createdAt": "..." }

GET /v1/timeline/home?cursor=...
-> 200 { "tweets": [ { "tweetId": "...", "author": {...} } ], "nextCursor": "..." }

POST /v1/users/{id}/follow
-> 204

GET /v1/users/{userId}/tweets?cursor=...
-> 200 { "tweets": [...] }

POST /v1/tweets/{tweetId}/retweet
-> 201 { "retweetId": "t_rt" }

Internal gRPC: FanOutTweet, MergeTimelines, FetchTweetEntities.
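The `cursor` parameter on the read endpoints can be an opaque token that encodes the last-seen tweet id; because Snowflake ids are time-sortable, the next page is simply "ids smaller than the cursor's id". A minimal sketch (the `last_id` payload shape is an assumption, not a documented format):

```python
import base64
import json

def encode_cursor(last_tweet_id: str) -> str:
    # Opaque to clients: they echo it back; only the server interprets it.
    payload = json.dumps({"last_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> str:
    # Resume the timeline scan strictly before this id.
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]
```

Keeping the cursor opaque lets the server change its pagination strategy (id-based, score-based, shard-aware) without breaking clients.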
4. Data Model
Tweet
- `tweet_id` (Snowflake/ULID), `author_id`, `text`, `created_at`, `reply_to`, `retweet_of`, `deleted_at` (nullable).
User
- Profile fields; `followers_count` denormalized and updated asynchronously.
Timeline (materialized)
- Keyed by `viewer_id` → ordered list of `tweet_id`; Redis sorted sets or a Cassandra wide partition.
Graph
- `follows` edges (`follower`, `followee`, `ts`) in sharded PostgreSQL, or JanusGraph at extreme scale; interviews often stop at sharded SQL.
Why Cassandra for timelines
- Write-heavy append per partition with TTL-based trimming; DynamoDB offers a similar trade (parallels the news feed section).
Indexes
- `(author_id, created_at DESC)` for the profile timeline.
- Search offloaded to Elasticsearch with `tweet_id` as the document id.
Sample tweet row
| tweet_id | author_id | text | created_at |
|---|---|---|---|
| 189272… | u42 | hello | 2026-04-29T12:00:00Z |
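The materialized timeline behaves like a capped, id-ordered list per viewer. This in-memory stand-in (class and method names are illustrative) mirrors what a Redis sorted set or a Cassandra wide row would do, including the trim that caps stored ids:

```python
import bisect

class MaterializedTimeline:
    """Per-viewer timeline stand-in: at most `cap` newest tweet ids,
    ordered by Snowflake id (which sorts by creation time)."""

    def __init__(self, cap: int = 800):
        self.cap = cap
        self.ids: list[int] = []  # kept in ascending order

    def push(self, tweet_id: int) -> None:
        bisect.insort(self.ids, tweet_id)
        if len(self.ids) > self.cap:
            self.ids.pop(0)  # trim oldest, like ZREMRANGEBYRANK / TTL trim

    def page(self, limit=20, before=None):
        # Newest-first page; `before` resumes strictly below a cursor id.
        upper = len(self.ids) if before is None else bisect.bisect_left(self.ids, before)
        return list(reversed(self.ids[max(0, upper - limit):upper]))
```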
5. High-Level Architecture
- Client → API gateway → tweet service (writes) and timeline service (reads).
- Writes publish `NewTweet` events to Kafka; fan-out workers materialize follower timelines in Cassandra, with hot timelines cached in Redis.
- Reads merge the materialized timeline with a pull list for celebrity authors; an async pipeline indexes tweets into Elasticsearch.
6. Component Deep-Dives
ID generation
- Snowflake-style 64-bit IDs from dedicated cluster vs UUID — Snowflake time-sortable and shorter in URLs; needs worker coordination (ID generation).
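A minimal Snowflake-style generator, using the standard 41-bit millisecond timestamp / 10-bit worker / 12-bit sequence layout (the epoch constant is Twitter's published custom epoch; the class itself is a sketch, not production code):

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch (Nov 2010)

class SnowflakeGenerator:
    """64-bit time-sortable id: 41 bits ms timestamp | 10 bits worker | 12 bits sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:
                    # 4096 ids issued in this millisecond: spin to the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, sorting ids numerically sorts tweets by creation time, which is exactly what the timeline merge relies on.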
Fan-out worker
- Consumes `NewTweet` events; if `follower_count < threshold`, fetch follower ids in batches from the graph shard and write timeline rows asynchronously.
- Above the threshold: mark the tweet as celebrity-authored; skip the push, or partially push to active followers only.
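The push-vs-pull decision above can be sketched as follows; `graph`, `timelines`, and `celebrity_authors` are hypothetical service clients, and the threshold and batch size are assumptions to be tuned from the follower distribution:

```python
CELEBRITY_THRESHOLD = 10_000  # assumed cutoff between push and pull
BATCH_SIZE = 1_000            # follower ids fetched per graph-shard call

def handle_new_tweet(event, graph, timelines, celebrity_authors):
    """Consume a NewTweet event and either push to follower timelines
    or register the author for pull-at-read."""
    author_id, tweet_id = event["author_id"], event["tweet_id"]
    if graph.follower_count(author_id) >= CELEBRITY_THRESHOLD:
        # Readers will pull this author's recent tweets at merge time.
        celebrity_authors.add(author_id)
        return "pull"
    # Normal user: append the tweet id to every follower's materialized timeline.
    for batch in graph.follower_ids(author_id, batch_size=BATCH_SIZE):
        for follower_id in batch:
            timelines.push(follower_id, tweet_id)
    return "push"
```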
Timeline read
- Merge home timeline ids from Cassandra with a separately queried celebrity pull list; union, then sort by the timestamp portion of the `tweet_id` if using Snowflake ids.
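Since Snowflake ids sort by creation time, the merge reduces to a k-way merge of newest-first id lists plus deduplication. A minimal sketch (function name assumed):

```python
import heapq

def merge_home_timeline(pushed_ids, celebrity_lists, limit=20):
    """Union the materialized (pushed) timeline with per-celebrity pull lists.

    All inputs are lists of Snowflake ids sorted newest-first; because the
    timestamp occupies the id's high bits, an id merge is a time merge."""
    merged = heapq.merge(pushed_ids, *celebrity_lists, reverse=True)
    out, seen = [], set()
    for tid in merged:
        if tid not in seen:  # collapse duplicates (e.g. retweet overlaps)
            seen.add(tid)
            out.append(tid)
        if len(out) == limit:
            break
    return out
```

`heapq.merge` is lazy, so only the first `limit` ids are materialized even when the celebrity lists are long.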
Retweets
- Either store as a new tweet with `retweet_of` set, or as a separate edge table; timeline insertion is analogous to a new tweet, often with smaller fan-out.
Caching
- Hot user timelines in Redis cluster with consistent hashing; stale-while-revalidate.
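Stale-while-revalidate can be sketched with a simple wrapper; the TTLs are assumptions, and the refresh is done inline here for brevity where a real system would enqueue a background job:

```python
import time

class SWRCache:
    """Serve stale timelines immediately while refreshing them.

    fresh_ttl: age below which the entry is served as-is.
    stale_ttl: age below which the stale entry is served while refreshing.
    """

    def __init__(self, loader, fresh_ttl=30.0, stale_ttl=300.0):
        self.loader = loader
        self.fresh_ttl, self.stale_ttl = fresh_ttl, stale_ttl
        self.store = {}  # key -> (value, fetched_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        hit = self.store.get(key)
        if hit:
            value, fetched = hit
            age = now - fetched
            if age < self.fresh_ttl:
                return value, "fresh"
            if age < self.stale_ttl:
                self._refresh(key, now)  # production: enqueue, don't block
                return value, "stale"
        return self._refresh(key, now), "miss"

    def _refresh(self, key, now):
        value = self.loader(key)
        self.store[key] = (value, now)
        return value
```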
Search
- Async indexing pipeline Kafka → Elasticsearch; not on critical post path.
Why Kafka over RabbitMQ for fan-out
- Throughput and replay; operational complexity acknowledged (message queues).
7. Bottlenecks & Mitigations
| Bottleneck | Effect | Mitigation |
|---|---|---|
| Celebrity fan-out | Millions of writes | Hybrid pull, fan-out to active devices via separate channel |
| Cassandra partition hot spot | One viewer timeline huge | Cap stored ids; trim older; archive cold |
| Graph DB fan-out query slow | Post latency | Precomputed follower buckets; cache follower list shards |
| Search index lag | Tweet missing in search | Accept eventual; show on profile regardless |
| Rate abuse | Spam tweets | Distributed rate limiter per user/IP |
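The per-user rate limiter in the last row is commonly a token bucket; in production the bucket state would live in Redis behind an atomic Lua script, but the algorithm itself fits in a few lines (rates here are illustrative):

```python
import time

class TokenBucket:
    """Token bucket: `burst` capacity, refilled at `rate_per_s` tokens/second."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate, self.burst = rate_per_s, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # max() guards against clocks moving backwards (or injected test clocks).
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per user (and optionally per IP) bounds spam bursts while letting legitimate users post at the sustained rate.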
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Cassandra timelines | Pure Postgres | Horizontal write scale at follower fan-out |
| Kafka fan-out pipeline | Synchronous SQL triggers | Decouple peak spikes |
| Snowflake IDs | ULID | Millisecond precision + embedded worker id common pattern |
| Elasticsearch | Postgres FTS | Scale and relevance tuning |
| Celebrity hybrid | Push only | Bounded write amplification |
| Redis overlay | Memcached | Eviction + sorted structures for merge |
9. Follow-ups (interviewer drill-downs)
- Delete tweet / moderation: Tombstone in tweet store; propagate delete events to timelines via compact topic (expensive) vs lazy filter at read (cheaper CPU per read).
- Reply threading: Store an adjacency list `tweet_id -> parent`; depth queries are bounded.
- Quote tweets: New tweet + pointer, similar to the retweet model.
- Multi-region: Active-active timeline writes rarely conflict if each user is pinned to a home region; use CRDTs only if necessary.
- Trending topics: Separate counting cluster (Storm/Flink) — compare typeahead popularity pipeline.
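The bounded-depth reply traversal mentioned above can be sketched as a DFS over the adjacency list (the `children` map inverts the stored `tweet_id -> parent` edges; names are assumed):

```python
def thread_replies(root_id, children, max_depth=3):
    """Collect (tweet_id, depth) pairs for replies under `root_id`,
    stopping at `max_depth` to keep conversation-tree queries bounded."""
    out = []

    def walk(tid, depth):
        if depth > max_depth:
            return
        for child in children.get(tid, []):
            out.append((child, depth))
            walk(child, depth + 1)

    walk(root_id, 1)
    return out
```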