THN Interview Prep

Design News Feed

1. Requirements

Functional

  • Users follow other users or topics; the home feed shows a reverse-chronological (or ML-ranked) mix of posts from followed sources.
  • Posts may include text, links, and optional references to media stored elsewhere.
  • Pagination: cursor-based infinite scroll; stable ordering under concurrent inserts.
  • New posts become visible to followers subject to privacy rules (public, followers-only).
  • Reactions/comments are optional; treat them only as a fan-out volume driver in the deep-dives.

Non-Functional

  • Scale: 200M DAU; average 200 follows per user (heavy tail); 100M new posts/day.
  • Read: 200M DAU × 250 feed loads/day ≈ 50B fetches/day, i.e. ~580k RPS average at the edge; design the origin for 50k–200k RPS aggregate at CDN/API after aggregation (many requests satisfied from cache).
  • Write: 100M posts / 86,400 s ≈ 1,160/s sustained; peaks of ~10k/s during events.
  • Latency: first screen p99 under 300 ms warm; cold start under 600 ms.
  • Consistency: eventual fan-out acceptable for followers (a delay of seconds is OK for many products); celebrities are handled with a hybrid pull model.
  • Availability: 99.95% read path.

Out of Scope

  • Full ads ranking and auction (assume organic ranking only or stub).
  • Live video streaming timeline integration.
  • Graph search (“friends of friends”) beyond indexed follower lists.
  • Legal content moderation pipeline depth.

2. Back-of-Envelope Estimations

Posts: 100M/day; ~300 B of metadata plus a pointer to media stored elsewhere, so each row is small; hot storage 100M × 500 B ≈ 50 GB/day raw in a Cassandra-like wide-column store, before replication.

Fan-out: an average of 300 followers per poster is not realistic as a uniform figure; use a two-tier model:

  • Normal users: 300 fan-out writes per post × 100M posts is impractical if taken literally; the actual follower count is much lower for the bulk of users (Pareto: ~1% of users hold most of the followers).

A workable engineering estimate: 100M posts × 50 average recipient writes = 5B fan-out units/day under a push model ≈ 57,870/s sustained, which requires batching and async delivery (message queues).

Storage for timelines: store only post ids in feed shards. At 8 B per id × a 500-post visible buffer × 200M users ≈ 800 GB if fully materialized; in practice not all timelines are materialized, and push caps each follower's queue (e.g., last 1,000 ids).

Read path: each feed fetch returns ~20 posts; merge k sources at read time or serve a precomputed list.
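The arithmetic above as a quick script (inputs mirror the requirements; the 50-recipient figure is the engineering estimate, not the raw follower mean):

DAU = 200_000_000
FEED_LOADS_PER_USER = 250
POSTS_PER_DAY = 100_000_000
AVG_FANOUT = 50                 # engineering estimate from above
SECONDS_PER_DAY = 86_400

read_rps = DAU * FEED_LOADS_PER_USER / SECONDS_PER_DAY      # ~578,700/s before cache
write_rps = POSTS_PER_DAY / SECONDS_PER_DAY                 # ~1,157/s sustained
fanout_rps = POSTS_PER_DAY * AVG_FANOUT / SECONDS_PER_DAY   # ~57,870/s sustained
timeline_gb = 8 * 500 * DAU / 1e9                           # 8 B id * 500-post buffer

print(f"reads {read_rps:,.0f}/s  writes {write_rps:,.0f}/s  "
      f"fan-out {fanout_rps:,.0f}/s  timelines ~{timeline_gb:,.0f} GB")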


3. API Design

GET /v1/feed/home?cursor=eyJpZCI6IjEyMyJ9&limit=20
Authorization: Bearer <token>
-> 200 {
  "items": [
    { "postId": "p1", "authorId": "a9", "snippet": "...", "rankScore": 0.92, "createdAt": "2026-04-29T12:00:00Z" }
  ],
  "nextCursor": "..."
}

POST /v1/posts
Body: { "text": "hello", "visibility": "followers" }
-> 201 { "postId": "p_new", "fanOutStatus": "accepted" }

POST /v1/users/{id}/follow
-> 204

GET /v1/users/{id}/feed  // public profile timeline
-> 200 { "items": [...] }

gRPC internal: GetRankedFeed(userId, cursor), PublishPost(post) for fan-out workers.


4. Data Model

Post

  • post_id (ULID), author_id, body_ref or inline text, created_at, visibility.

Follow graph

  • Edge table: follower_id, followee_id, created_at. Postgres or a social graph store (for billions of edges consider sharded SQL or a FlockDB/TigerGraph-class store; interviews often stay with Postgres plus a scaling story).

Home timeline (push model)

  • user_timeline: partition key = viewer_id, sort key = created_at desc, value = post_id. DynamoDB or Cassandra are excellent for wide-partition appends with TTL-based trimming.

Why not pure Postgres for timelines at Twitter scale

  • Celebrity fan-out creates hot partitions in follower inboxes, necessitating the hybrid pull/push model; Postgres is okay for an MVP with an aggressive partitioning strategy.

Indexes

  • GSI on (author_id, created_at) for profile feeds.
  • Secondary: topics / hashtags if product requires.

Sample Cassandra row

viewer_id | timeline_ts          | post_id
u900      | 2026-04-29T12:00:01Z | p555
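A sketch of this table with the DataStax Python driver (keyspace, names, and the 30-day TTL are illustrative choices, not a prescribed schema):

from datetime import datetime, timezone
from cassandra.cluster import Cluster

session = Cluster(["cassandra-seed-1"]).connect("feeds")   # assumed keyspace
session.execute("""
    CREATE TABLE IF NOT EXISTS user_timeline (
        viewer_id   text,
        timeline_ts timestamp,
        post_id     text,
        PRIMARY KEY ((viewer_id), timeline_ts, post_id)
    ) WITH CLUSTERING ORDER BY (timeline_ts DESC, post_id DESC)
""")
# Fan-out workers append; the TTL trims old rows so partitions stay bounded.
session.execute(
    "INSERT INTO user_timeline (viewer_id, timeline_ts, post_id) "
    "VALUES (%s, %s, %s) USING TTL 2592000",
    ("u900", datetime(2026, 4, 29, 12, 0, 1, tzinfo=timezone.utc), "p555"),
)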

5. High-Level Architecture


See caching for the Redis timeline cache and consistent hashing for shard placement.
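At a high level (a sketch of the flow, assembled from the deep-dives below): clients hit the API/feed service; the read path checks the Redis timeline cache, falls back to the Cassandra timeline store, hydrates posts from the post store, and reranks; the write path persists the post, then publishes to Kafka for async fan-out workers that append post ids to follower timelines, with celebrities bypassing push and merged at read time.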


6. Component Deep-Dives

Fan-out on write (push)

  • On post, query follower ids (or iterate precomputed follower shards); enqueue batches to Kafka keyed by follower_id segment so consumers update that user's timeline partition without cross-talk.
  • Why push vs pull-only: the read path becomes a single timeline lookup per user instead of merging hundreds of followee streams at scale; write cost shifts to post time, acceptable for typical users but bad for celebrities.
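A sketch of the push-side producer (Python with kafka-python; the topic name, shard count, and crc32 routing are assumptions):

from kafka import KafkaProducer
import json
import zlib

NUM_SHARDS = 256   # assumed timeline shard count
producer = KafkaProducer(
    bootstrap_servers=["kafka-1:9092"],
    key_serializer=lambda k: k.encode(),
    value_serializer=lambda v: json.dumps(v).encode(),
)

def fan_out(post_id: str, follower_ids: list[str]) -> None:
    # Group followers by timeline shard so each consumer appends only to its
    # own partitions (no cross-talk), then enqueue one batch per shard.
    by_shard: dict[int, list[str]] = {}
    for fid in follower_ids:
        by_shard.setdefault(zlib.crc32(fid.encode()) % NUM_SHARDS, []).append(fid)
    for shard, batch in by_shard.items():
        producer.send("timeline-fanout", key=str(shard),
                      value={"post_id": post_id, "follower_ids": batch})
    producer.flush()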

Hybrid: fan-out on write for normal, fan-out on read for celebrities

  • Threshold F > 10k followers: skip push; merge celebrity posts at read time from dedicated recent_posts_by_celeb cache (caching).
  • Twitter-style optimization widely cited in interviews.
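The publish-time decision in a few lines (the graph and celebrity-cache clients are hypothetical stand-ins passed in as parameters; fan_out is the sketch above):

CELEB_THRESHOLD = 10_000   # the F > 10k cutoff from above

def publish(post_id: str, author_id: str, graph, celeb_cache) -> None:
    # graph: assumed client exposing follower_count() / follower_ids();
    # celeb_cache: assumed wrapper over recent_posts_by_celeb.
    if graph.follower_count(author_id) > CELEB_THRESHOLD:
        celeb_cache.push_recent(author_id, post_id)        # pull path, merged at read
    else:
        fan_out(post_id, graph.follower_ids(author_id))    # push path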

Ranked feed

  • Chronological: merge-sort streams by timestamp — simple.
  • ML rank: online features from Redis + offline batch features from Snowflake fed to TensorFlow Serving; rerank top-K candidates gathered from timeline ids.
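The chronological case is just a k-way merge; a minimal version with the standard library (assumes each source yields (created_at, post_id) tuples newest-first, as the timeline shards and the celeb cache do):

import heapq
from itertools import islice

def merge_chronological(sources, limit=20):
    # heapq.merge with reverse=True expects each input pre-sorted descending;
    # the (created_at, post_id) tuple gives a stable tie-break.
    merged = heapq.merge(*sources, reverse=True)
    return list(islice(merged, limit))

page = merge_chronological([
    [("2026-04-29T12:00:01Z", "p555"), ("2026-04-29T11:59:00Z", "p540")],
    [("2026-04-29T12:00:00Z", "p900")],
])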

Read path

  1. Redis cache key feed:{user}:{cursorBucket}.
  2. Miss: fetch timeline post ids from Cassandra (last N).
  3. Hydrate posts from post store by id batch (IN query).
  4. Apply ranking service; return.
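The same steps as code (redis-py; the store and ranker clients are assumed interfaces, and the key layout follows step 1):

import json
import redis

r = redis.Redis(host="redis-feed-cache", port=6379)        # assumed host

def load_home_feed(user_id, cursor_bucket, timeline_store, post_store, ranker, limit=20):
    # timeline_store / post_store / ranker are hypothetical service clients.
    key = f"feed:{user_id}:{cursor_bucket}"
    cached = r.get(key)                                     # 1. Redis cache
    if cached:
        return json.loads(cached)
    ids = timeline_store.latest_ids(user_id, n=500)         # 2. Cassandra, last N
    posts = post_store.multi_get(ids)                       # 3. hydrate by id batch
    page = ranker.rerank(user_id, posts)[:limit]            # 4. rank, trim to a page
    r.setex(key, 60, json.dumps(page))                      # short TTL; feeds go stale fast
    return page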

Pagination

  • Cursor encodes last (created_at, post_id) tuple for stable tie-break; avoid offset.
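One way to build the opaque cursor from the API section: base64url-encoded JSON of the last tuple (field names here are illustrative):

import base64
import json

def encode_cursor(created_at: str, post_id: str) -> str:
    raw = json.dumps({"createdAt": created_at, "postId": post_id}).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")

def decode_cursor(cursor: str) -> tuple[str, str]:
    padded = cursor + "=" * (-len(cursor) % 4)              # restore stripped padding
    obj = json.loads(base64.urlsafe_b64decode(padded))
    return obj["createdAt"], obj["postId"]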

Graph service

  • Postgres with an adjacency list is fine up to millions of edges per shard; beyond that, use specialized stores or read replicas with denormalized counts.

7. Bottlenecks & Mitigations

Bottleneck             | Scenario                   | Mitigation
Celebrity post         | Millions of fan-out writes | Hybrid pull + merge at read; dedicated queue per celeb for partial materialization
Hot timeline partition | Single user read-heavy     | Redis edge cache; CDN not applicable for personalized feeds
Kafka lag during spike | Burst of posts             | Scale out consumers; back-pressure producers rather than prioritizing tier-1 users
Stale rank features    | Model drift                | Feature TTL + fallback to chronological
Graph join storms      | "Who follows me"           | Materialized counts; avoid live COUNT(*)

8. Tradeoffs

Decision               | Alternative        | Why we picked
Cassandra timelines    | DynamoDB           | Wide-partition write throughput story; similar trade space
Kafka fan-out pipeline | SQS per shard      | Kafka replay for failed fan-out segments
Hybrid push/pull       | Push only          | Celebrity explosion containment
Redis read-through     | Memcached          | Timeline eviction policies + sorted sets for merge
ULID post ids          | Snowflake          | Roughly time-sortable in distributed id generation
ML rerank              | Chronological only | Product engagement vs complexity

9. Follow-ups (interviewer drill-downs)

  • Global consistency when a user posts then immediately loads the feed? Read-your-writes: route to the same-region writer and merge the user's recent self-writes from a client-side session buffer (sketch after this list).
  • Delete post? Tombstone post id; fan-out delete messages or lazy filter at read.
  • Block users? Filter in rank service with blocklist cache per viewer.
  • 100x traffic? Shard Kafka; autoscale stateless API; Cassandra cluster resize planning.
  • Compare with Twitter breakdown for narrative alignment.
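A sketch of that client-side read-your-writes merge (buffer shape and field names are illustrative):

def first_page_with_own_posts(server_page, session_buffer, limit=20):
    # Both lists hold dicts with "postId" and "createdAt", newest-first.
    # Dedup in case fan-out already delivered the post to the server page.
    seen = {p["postId"] for p in server_page}
    fresh = [p for p in session_buffer if p["postId"] not in seen]
    merged = sorted(fresh + server_page, key=lambda p: p["createdAt"], reverse=True)
    return merged[:limit]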
