Design News Feed
1. Requirements
Functional
- Users follow other users or topics; home feed shows a ranked chronology (or ML-ranked mix) of posts from followed sources.
- Posts may include text, links, optional media references stored elsewhere.
- Pagination: cursor-based infinite scroll; stable ordering under concurrent inserts.
- Post creation visible to followers subject to privacy rules (public, followers-only).
- Optional reactions/comments: touched only as a fan-out-volume driver in the deep-dives.
Non-Functional
- Scale: 200M DAU; average 200 follows per user (heavy tail); 100M new posts/day.
- Read: 200M DAU * 250 feed loads/day = 50B feed loads/day, roughly 580k RPS average at the edge; most requests are satisfied from cache, so design the origin read path at CDN/API for roughly 50k–200k RPS aggregate after aggregation.
- Write: 100M posts / 86,400 ~ 1,160/s; peaks 10k/s during events.
- Latency: first screen p99 under 300 ms warm; cold start under 600 ms.
- Consistency: eventual fan-out acceptable for followers (seconds delay OK for many products); celebrities may use pull model hybrid.
- Availability: 99.95% read path.
Out of Scope
- Full ads ranking and auction (assume organic ranking only or stub).
- Live video streaming timeline integration.
- Graph search (“friends of friends”) beyond indexed follower lists.
- Legal content moderation pipeline depth.
2. Back-of-Envelope Estimations
Posts: 100M/day; ~300 bytes of metadata plus a pointer to media per row (media itself stored elsewhere). Hot storage: 100M * 500 B ≈ 50 GB/day raw, before replication, in a Cassandra-like wide-column store.
Fan-out: a flat average of 300 followers per poster is not realistic uniformly; use a two-tier model:
- Normal users: 300 fan-out writes per post * 100M posts is impractical if taken literally; the actual average follower count is much lower for the bulk of users (Pareto: roughly 1% of users hold the huge follower counts).
Engineering estimate: 100M posts * 50 average recipient writes = 5B fan-out units/day in a push model, ~57,870/s sustained; this requires batching and async delivery (message queues).
Storage for timelines: store only post ids in the feed shards. An 8 B id * 500-post visible buffer * 200M users ≈ 800 GB upper bound; not all timelines are materialized, and the push model caps each per-follower queue (e.g., the last 1,000 ids).
Read path: each feed fetch returns ~20 posts; merge k sources at read time or serve a precomputed timeline.
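The estimates above are worth sanity-checking; a quick arithmetic sketch (constants taken from this section):

```python
# Back-of-envelope checks for the numbers above (pure arithmetic).
DAY_S = 86_400

posts_per_day = 100_000_000
post_write_rps = posts_per_day / DAY_S           # ~1,157/s sustained
assert 1_100 < post_write_rps < 1_200

fanout_units = posts_per_day * 50                # 50 avg recipients per post
fanout_rps = fanout_units / DAY_S                # ~57,870/s in a pure push model
assert 57_000 < fanout_rps < 58_000

# Timeline id storage if fully materialized: 8 B id * 500 ids * 200M users
timeline_bytes = 8 * 500 * 200_000_000           # 800 GB upper bound
assert timeline_bytes == 800 * 10**9
```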
3. API Design
```
GET /v1/feed/home?cursor=eyJpZCI6IjEyMyJ9&limit=20
Authorization: Bearer <token>
-> 200 {
  "items": [
    { "postId": "p1", "authorId": "a9", "snippet": "...", "rankScore": 0.92, "createdAt": "2026-04-29T12:00:00Z" }
  ],
  "nextCursor": "..."
}

POST /v1/posts
Body: { "text": "hello", "visibility": "followers" }
-> 201 { "postId": "p_new", "fanOutStatus": "accepted" }

POST /v1/users/{id}/follow
-> 204

GET /v1/users/{id}/feed   // public profile timeline
-> 200 { "items": [...] }
```

gRPC internal: `GetRankedFeed(userId, cursor)` and `PublishPost(post)` for fan-out workers.
4. Data Model
Post
`post_id` (ULID), `author_id`, `body_ref` or inline text, `created_at`, `visibility`.
Follow graph
- Edge table: `follower_id`, `followee_id`, `created_at`. Postgres or a social-graph store (for billions of edges consider sharded SQL or a FlockDB/TigerGraph-class store; interviews often stay with Postgres plus a scaling story).
Home timeline (push model)
`user_timeline`: partition key = `viewer_id`, sort key = `created_at` desc, value = `post_id`. DynamoDB or Cassandra are excellent for wide-partition appends with TTL-based trimming.
Why not pure Postgres for timelines at Twitter scale
- Hot partitions on celebrity inboxes necessitate a hybrid pull/push model; Postgres is fine for an MVP with an aggressive partitioning strategy.
Indexes
- GSI on `author_id`, `created_at` for profile feeds.
- Secondary: topics / hashtags if the product requires them.
Sample Cassandra row
| viewer_id | timeline_ts | post_id |
|---|---|---|
| u900 | 2026-04-29T12:00:01Z | p555 |
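A minimal in-memory sketch of the `user_timeline` table's behavior (per-viewer, newest-first, capped); the class and cap constant are illustrative stand-ins for the real wide-column store:

```python
from collections import deque

TIMELINE_CAP = 1000  # last-N ids kept per follower, per the cap above


class TimelineStore:
    """In-memory stand-in for user_timeline: per-viewer, newest-first,
    capped list of post ids. Old ids are trimmed automatically."""

    def __init__(self):
        self._rows = {}

    def append(self, viewer_id, post_id):
        q = self._rows.setdefault(viewer_id, deque(maxlen=TIMELINE_CAP))
        q.appendleft(post_id)  # newest first; oldest silently dropped at cap

    def last_n(self, viewer_id, n=20):
        return list(self._rows.get(viewer_id, []))[:n]
```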
5. High-Level Architecture
See caching for Redis timeline cache; consistent hashing for shard placement.
6. Component Deep-Dives
Fan-out on write (push)
- On post, query follower ids (or iterate precomputed follower shards); enqueue batches to Kafka keyed by `follower_id` segment so consumers update that user's timeline partition without cross-talk.
- Why vs pull-only: the read path becomes O(1) per user at scale (no follower-list merge at read time); write cost shifts to post time, which is acceptable for typical users but bad for celebrities.
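The segment-keyed batching can be sketched as follows; `NUM_SEGMENTS` and the function names are assumptions, and the actual Kafka produce call is omitted:

```python
import hashlib
from collections import defaultdict

NUM_SEGMENTS = 256  # assumed number of partitions on the timeline-update topic


def segment_for(follower_id: str) -> int:
    """Stable hash so every update for a given follower lands on the
    same partition (and thus the same consumer)."""
    digest = hashlib.md5(follower_id.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SEGMENTS


def batch_fanout(post_id: str, follower_ids: list) -> dict:
    """Group (follower_id, post_id) pairs by segment; each batch would be
    one produce to the Kafka partition owning that segment."""
    batches = defaultdict(list)
    for fid in follower_ids:
        batches[segment_for(fid)].append((fid, post_id))
    return dict(batches)
```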
Hybrid: fan-out on write for normal, fan-out on read for celebrities
- Threshold `F > 10k` followers: skip push; merge celebrity posts at read time from a dedicated `recent_posts_by_celeb` cache (caching).
- This Twitter-style optimization is widely cited in interviews.
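The read-time merge for the hybrid model is a k-way merge of sorted id streams; a sketch assuming both inputs are `(created_at, post_id)` tuples sorted newest-first:

```python
import heapq
from itertools import islice


def read_home_timeline(pushed, celeb, limit=20):
    """Merge the precomputed (pushed) timeline with celebrity posts pulled
    at read time. Inputs: iterables of (created_at, post_id), newest-first.
    Returns up to `limit` post ids, newest-first."""
    merged = heapq.merge(pushed, celeb, reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]
```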
Ranked feed
- Chronological: merge-sort streams by timestamp — simple.
- ML rank: online features from Redis + offline batch features from Snowflake fed to TensorFlow Serving; rerank top-K candidates gathered from timeline ids.
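The rerank step reduces to scoring a candidate set and keeping the top K; a sketch where `score_fn` stands in for the served model:

```python
import heapq


def rerank(candidate_ids, score_fn, k=20):
    """Keep the k highest-scoring candidates; score_fn is a stand-in for a
    call into the model-serving layer (e.g., TensorFlow Serving)."""
    return heapq.nlargest(k, candidate_ids, key=score_fn)
```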
Read path
- Redis cache key `feed:{user}:{cursorBucket}`.
- Miss: fetch timeline post ids from Cassandra (last N).
- Hydrate posts from the post store by id batch (`IN` query).
- Apply the ranking service; return.
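The read path above can be sketched as a read-through, using dict-like stand-ins for Redis, Cassandra, and the post store (all names illustrative):

```python
def load_feed(user_id, cache, timeline_store, post_store, n=20):
    """Read-through: try the cache, fall back to the timeline store,
    then hydrate post ids from the post store."""
    key = f"feed:{user_id}:0"              # cursorBucket 0 = first page
    ids = cache.get(key)
    if ids is None:                        # cache miss
        ids = timeline_store.get(user_id, [])[:n]
        cache[key] = ids                   # populate on the way back
    # Hydrate; skip ids whose posts are gone (deleted / tombstoned).
    return [post_store[pid] for pid in ids if pid in post_store]
```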
Pagination
- Cursor encodes the last `(created_at, post_id)` tuple for a stable tie-break; avoid offsets.
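One common encoding for such a cursor is URL-safe base64 over a small JSON object (the field names here are assumptions):

```python
import base64
import json


def encode_cursor(created_at: str, post_id: str) -> str:
    """Pack the (created_at, post_id) tie-break tuple into an opaque token."""
    raw = json.dumps({"t": created_at, "id": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str):
    """Inverse of encode_cursor; returns (created_at, post_id)."""
    obj = json.loads(base64.urlsafe_b64decode(cursor))
    return obj["t"], obj["id"]
```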
Graph service
- Postgres with adjacency list OK up to millions of edges per shard; for larger, specialized stores or read replicas with denormalized counts.
7. Bottlenecks & Mitigations
| Bottleneck | Scenario | Mitigation |
|---|---|---|
| Celebrity post | Millions of fan-out writes | Hybrid pull + merge at read; dedicated queue per celeb for partial materialization |
| Hot timeline partition | Single user read heavy | Redis edge cache; CDN not applicable for personalized feeds |
| Kafka lag during spike | Many posts | Scale consumers; prioritizing tier-1 users is an option but ethically questionable, so prefer back-pressuring producers |
| Stale rank features | Model drift | Feature TTL + fallback chronological |
| Graph join storms | “Who follows me” | Materialized counts; avoid COUNT(*) live |
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Cassandra timelines | DynamoDB | Wide-partition write throughput story; similar trade space |
| Kafka fan-out pipeline | SQS per shard | Kafka replay for failed fan-out segments |
| Hybrid push/pull | Push only | Celebrity explosion containment |
| Redis read-through | Memcached | Timeline eviction policies + sorted sets for merge |
| ULID post ids | Snowflake | Sortable roughly by time in distributed ids (ID generation) |
| ML rerank | Chronological only | Product engagement vs complexity |
9. Follow-ups (interviewer drill-downs)
- Global consistency when user posts then immediately loads feed? Read-your-writes via route same-region writer + include recent self writes from session buffer client-side.
- Delete post? Tombstone post id; fan-out delete messages or lazy filter at read.
- Block users? Filter in rank service with blocklist cache per viewer.
- 100x traffic? Shard Kafka; autoscale stateless API; Cassandra cluster resize planning.
- Compare with Twitter breakdown for narrative alignment.
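The read-your-writes follow-up above (client-side session buffer) can be sketched as a simple prepend-and-dedup on the client (shapes assumed to match the feed API items):

```python
def merge_own_recent(feed_items, session_buffer):
    """Client-side read-your-writes: prepend the user's own recent posts
    (kept in a session buffer) unless the server feed already has them."""
    seen = {item["postId"] for item in feed_items}
    fresh = [p for p in session_buffer if p["postId"] not in seen]
    return fresh + feed_items
```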