Design Twitter (Microblogging)
1. Requirements
Functional
- Users register a profile and post short messages ("tweets") with optional media attachments stored externally.
- Follow other users; build personalized home timeline mixing tweets from followed accounts with ranking signals (interview may scope chronological MVP).
- Like, retweet, and reply; replies form threads that expand into conversation trees.
- Search tweets (often delegated to separate Lucene/ES cluster scope).
- Notifications for mentions and followers (tie to notification system design).
Non-Functional
- Scale: a 500M+ registered-user class of problem; thousands of tweets per second globally; hundreds of thousands of timeline reads per second in aggregate.
- Latency: a post is visible to its author immediately; fan-out to followers within a few seconds for normal users; strict read-your-writes for the author.
- Availability: 99.95% for reads; tolerable brief inconsistency for rare edge cases in distributed systems.
- Consistency: hybrid timeline model — eventual fan-out with synchronization hooks for critical UX paths.
Out of Scope
- Full ads marketplace and billing.
- Spaces/audio rooms engineering depth.
- Full-text indexing internals beyond inverted index mention.
- Federation / ActivityPub.
2. Back-of-Envelope Estimations
Assume 250M DAU; 150M tweets/day (a historical ballpark, useful for teaching).
- Write: 150M / 86,400 ≈ 1,736/s average; peaks during major sports or news events can reach ~20k/s.
- Read: if each user refreshes the home timeline 30x/day → 7.5B timeline requests/day ≈ 86,800/s worldwide; many are served from cache, so the origin tier sees only a fraction.
- Storage: tweet metadata ~200 B + text up to 280 UTF-8 chars ≈ 1 KB average per tweet → ~150 GB/day raw; multi-petabyte over several years, media excluded.
- Fan-out writes (push model): the median follower count is low (hundreds at most for the mass of users); celebrity outliers dominate, so the arithmetic mean is misleading; plan a hybrid model (overlaps with the news feed design).
- Media: assume object storage such as S3; not counted in the tweet row size beyond a URL.
3. API Design
POST /v1/tweets
Authorization: Bearer <token>
Body: { "text": "hello world", "replyToTweetId": null, "mediaIds": ["m1"] }
-> 201 { "tweetId": "t_abc", "createdAt": "..." }

GET /v1/timeline/home?cursor=...
-> 200 { "tweets": [ { "tweetId": "...", "author": {...} } ], "nextCursor": "..." }

POST /v1/users/{id}/follow
-> 204

GET /v1/users/{userId}/tweets?cursor=...
-> 200 { "tweets": [...] }

POST /v1/tweets/{tweetId}/retweet
-> 201 { "retweetId": "t_rt" }

Internal gRPC: FanOutTweet, MergeTimelines, FetchTweetEntities.
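The `cursor` parameter on the read endpoints can be an opaque token that encodes the last-seen tweet id; because Snowflake ids are time-sortable, the next page is simply "ids smaller than the cursor's id". A minimal sketch (the `last_id` payload shape is an assumption, not a documented format):

```python
import base64
import json

def encode_cursor(last_tweet_id: str) -> str:
    # Opaque to clients: they echo it back; only the server interprets it.
    payload = json.dumps({"last_id": last_tweet_id}).encode()
    return base64.urlsafe_b64encode(payload).decode()

def decode_cursor(cursor: str) -> str:
    # Resume the timeline scan strictly before this id.
    return json.loads(base64.urlsafe_b64decode(cursor))["last_id"]
```

Keeping the cursor opaque lets the server change its pagination strategy (id-based, score-based, shard-aware) without breaking clients.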
4. Data Model
Tweet
- `tweet_id` (Snowflake/ULID), `author_id`, `text`, `created_at`, `reply_to`, `retweet_of`, `deleted_at` (nullable).
User
- Profile fields; `followers_count` denormalized and updated asynchronously.
Timeline (materialized)
- Keyed by `viewer_id` → ordered list of `tweet_id`; Redis sorted sets or a Cassandra wide partition.
Graph
- `follows` edges (`follower`, `followee`, `ts`) in sharded PostgreSQL, or JanusGraph at extreme scale; interviews often stop at sharded SQL.
Why Cassandra for timelines
- Write-heavy append per partition with TTL-based trimming; DynamoDB offers a similar trade (parallels the news feed section).
Indexes
- `(author_id, created_at DESC)` for the profile timeline.
- Search offloaded to Elasticsearch with `tweet_id` as the document id.
Sample tweet row
| tweet_id | author_id | text | created_at |
|---|---|---|---|
| 189272… | u42 | hello | 2026-04-29T12:00:00Z |
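The materialized timeline behaves like a capped, id-ordered list per viewer. This in-memory stand-in (class and method names are illustrative) mirrors what a Redis sorted set or a Cassandra wide row would do, including the trim that caps stored ids:

```python
import bisect

class MaterializedTimeline:
    """Per-viewer timeline stand-in: at most `cap` newest tweet ids,
    ordered by Snowflake id (which sorts by creation time)."""

    def __init__(self, cap: int = 800):
        self.cap = cap
        self.ids: list[int] = []  # kept in ascending order

    def push(self, tweet_id: int) -> None:
        bisect.insort(self.ids, tweet_id)
        if len(self.ids) > self.cap:
            self.ids.pop(0)  # trim oldest, like ZREMRANGEBYRANK / TTL trim

    def page(self, limit=20, before=None):
        # Newest-first page; `before` resumes strictly below a cursor id.
        upper = len(self.ids) if before is None else bisect.bisect_left(self.ids, before)
        return list(reversed(self.ids[max(0, upper - limit):upper]))
```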
5. High-Level Architecture
- Client → API gateway → tweet service (writes) and timeline service (reads).
- Writes publish `NewTweet` events to Kafka; fan-out workers materialize follower timelines in Cassandra, with hot timelines cached in Redis.
- Reads merge the materialized timeline with a pull list for celebrity authors; an async pipeline indexes tweets into Elasticsearch.
6. Component Deep-Dives
ID generation
- Snowflake-style 64-bit IDs from dedicated cluster vs UUID — Snowflake time-sortable and shorter in URLs; needs worker coordination (ID generation).
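A minimal Snowflake-style generator, using the standard 41-bit millisecond timestamp / 10-bit worker / 12-bit sequence layout (the epoch constant is Twitter's published custom epoch; the class itself is a sketch, not production code):

```python
import threading
import time

EPOCH_MS = 1288834974657  # Twitter's custom epoch (Nov 2010)

class SnowflakeGenerator:
    """64-bit time-sortable id: 41 bits ms timestamp | 10 bits worker | 12 bits sequence."""

    def __init__(self, worker_id: int):
        assert 0 <= worker_id < 1024
        self.worker_id = worker_id
        self.last_ms = -1
        self.seq = 0
        self.lock = threading.Lock()

    def next_id(self) -> int:
        with self.lock:
            now = int(time.time() * 1000)
            if now == self.last_ms:
                self.seq = (self.seq + 1) & 0xFFF
                if self.seq == 0:
                    # 4096 ids issued in this millisecond: spin to the next one.
                    while now <= self.last_ms:
                        now = int(time.time() * 1000)
            else:
                self.seq = 0
            self.last_ms = now
            return ((now - EPOCH_MS) << 22) | (self.worker_id << 12) | self.seq
```

Because the timestamp occupies the high bits, sorting ids numerically sorts tweets by creation time, which is exactly what the timeline merge relies on.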
Fan-out worker
- Consumes `NewTweet` events; if `follower_count < threshold`, fetch follower ids in batches from the graph shard and write timeline rows asynchronously.
- Above the threshold: mark the tweet as celebrity-authored; skip the push, or partially push to active followers only.
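The push-vs-pull decision above can be sketched as follows; `graph`, `timelines`, and `celebrity_authors` are hypothetical service clients, and the threshold and batch size are assumptions to be tuned from the follower distribution:

```python
CELEBRITY_THRESHOLD = 10_000  # assumed cutoff between push and pull
BATCH_SIZE = 1_000            # follower ids fetched per graph-shard call

def handle_new_tweet(event, graph, timelines, celebrity_authors):
    """Consume a NewTweet event and either push to follower timelines
    or register the author for pull-at-read."""
    author_id, tweet_id = event["author_id"], event["tweet_id"]
    if graph.follower_count(author_id) >= CELEBRITY_THRESHOLD:
        # Readers will pull this author's recent tweets at merge time.
        celebrity_authors.add(author_id)
        return "pull"
    # Normal user: append the tweet id to every follower's materialized timeline.
    for batch in graph.follower_ids(author_id, batch_size=BATCH_SIZE):
        for follower_id in batch:
            timelines.push(follower_id, tweet_id)
    return "push"
```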
Timeline read
- Merge home timeline ids from Cassandra with a separately queried celebrity pull list; union, then sort by the timestamp portion of the `tweet_id` if using Snowflake ids.
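Since Snowflake ids sort by creation time, the merge reduces to a k-way merge of newest-first id lists plus deduplication. A minimal sketch (function name assumed):

```python
import heapq

def merge_home_timeline(pushed_ids, celebrity_lists, limit=20):
    """Union the materialized (pushed) timeline with per-celebrity pull lists.

    All inputs are lists of Snowflake ids sorted newest-first; because the
    timestamp occupies the id's high bits, an id merge is a time merge."""
    merged = heapq.merge(pushed_ids, *celebrity_lists, reverse=True)
    out, seen = [], set()
    for tid in merged:
        if tid not in seen:  # collapse duplicates (e.g. retweet overlaps)
            seen.add(tid)
            out.append(tid)
        if len(out) == limit:
            break
    return out
```

`heapq.merge` is lazy, so only the first `limit` ids are materialized even when the celebrity lists are long.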
Retweets
- Either store as a new tweet with `retweet_of` set, or as a separate edge table; timeline insertion is analogous to a new tweet, often with smaller fan-out.
Caching
- Hot user timelines in Redis cluster with consistent hashing; stale-while-revalidate.
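Stale-while-revalidate can be sketched with a simple wrapper; the TTLs are assumptions, and the refresh is done inline here for brevity where a real system would enqueue a background job:

```python
import time

class SWRCache:
    """Serve stale timelines immediately while refreshing them.

    fresh_ttl: age below which the entry is served as-is.
    stale_ttl: age below which the stale entry is served while refreshing.
    """

    def __init__(self, loader, fresh_ttl=30.0, stale_ttl=300.0):
        self.loader = loader
        self.fresh_ttl, self.stale_ttl = fresh_ttl, stale_ttl
        self.store = {}  # key -> (value, fetched_at)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        hit = self.store.get(key)
        if hit:
            value, fetched = hit
            age = now - fetched
            if age < self.fresh_ttl:
                return value, "fresh"
            if age < self.stale_ttl:
                self._refresh(key, now)  # production: enqueue, don't block
                return value, "stale"
        return self._refresh(key, now), "miss"

    def _refresh(self, key, now):
        value = self.loader(key)
        self.store[key] = (value, now)
        return value
```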
Search
- Async indexing pipeline Kafka → Elasticsearch; not on critical post path.
Why Kafka over RabbitMQ for fan-out
- Throughput and replay; operational complexity acknowledged (message queues).
7. Bottlenecks & Mitigations
| Bottleneck | Effect | Mitigation |
|---|---|---|
| Celebrity fan-out | Millions of writes | Hybrid pull, fan-out to active devices via separate channel |
| Cassandra partition hot spot | One viewer timeline huge | Cap stored ids; trim older; archive cold |
| Graph DB fan-out query slow | Post latency | Precomputed follower buckets; cache follower list shards |
| Search index lag | Tweet missing in search | Accept eventual; show on profile regardless |
| Rate abuse | Spam tweets | Distributed rate limiter per user/IP |
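The per-user rate limiter in the last row is commonly a token bucket; in production the bucket state would live in Redis behind an atomic Lua script, but the algorithm itself fits in a few lines (rates here are illustrative):

```python
import time

class TokenBucket:
    """Token bucket: `burst` capacity, refilled at `rate_per_s` tokens/second."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate, self.burst = rate_per_s, burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # max() guards against clocks moving backwards (or injected test clocks).
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

One bucket per user (and optionally per IP) bounds spam bursts while letting legitimate users post at the sustained rate.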
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Cassandra timelines | Pure Postgres | Horizontal write scale at follower fan-out |
| Kafka fan-out pipeline | Synchronous SQL triggers | Decouple peak spikes |
| Snowflake IDs | ULID | Millisecond precision + embedded worker id common pattern |
| Elasticsearch | Postgres FTS | Scale and relevance tuning |
| Celebrity hybrid | Push only | Bounded write amplification |
| Redis overlay | Memcached | Eviction + sorted structures for merge |
9. Follow-ups (interviewer drill-downs)
- Delete tweet / moderation: Tombstone in tweet store; propagate delete events to timelines via compact topic (expensive) vs lazy filter at read (cheaper CPU per read).
- Reply threading: Store an adjacency list `tweet_id -> parent`; depth queries are bounded.
- Quote tweets: New tweet + pointer, similar to the retweet model.
- Multi-region: Active-active timeline writes rarely conflict if each user is pinned to a home region; use CRDTs only if necessary.
- Trending topics: Separate counting cluster (Storm/Flink) — compare typeahead popularity pipeline.
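The bounded-depth reply traversal mentioned above can be sketched as a DFS over the adjacency list (the `children` map inverts the stored `tweet_id -> parent` edges; names are assumed):

```python
def thread_replies(root_id, children, max_depth=3):
    """Collect (tweet_id, depth) pairs for replies under `root_id`,
    stopping at `max_depth` to keep conversation-tree queries bounded."""
    out = []

    def walk(tid, depth):
        if depth > max_depth:
            return
        for child in children.get(tid, []):
            out.append((child, depth))
            walk(child, depth + 1)

    walk(root_id, 1)
    return out
```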