Design WhatsApp (Messaging)
1. Requirements
Functional
- 1:1 and group chat with delivery receipts (single/double check semantics are product-dependent); read receipts optional.
- Multimedia messages: images, voice notes, and documents, uploaded as encrypted blobs referenced from the message envelope.
- Online/presence and typing indicators with low staleness.
- Push notifications via FCM/APNs when the app is backgrounded; E2EE payload limitations are handled with a wake-and-fetch pattern.
- End-to-end encryption (E2EE): the server must never read plaintext message bodies; the Signal Protocol (Double Ratchet) is the standard interview reference.
Non-Functional
- Scale: 2B+ users globally; peak message rate in the millions per second aggregated across regions (order-of-magnitude teaching numbers, see Section 2).
- Latency: sub-second delivery expectation on the same continent; typing indicator budget <200 ms server RTT.
- Availability: 99.99% message acceptance; partition tolerance across regions with queue-based eventual delivery.
- Consistency: per-chat ordering matters more than global ordering; under CAP, favor availability and partition tolerance, with clients repairing ordering when rare anomalies occur.
Out of Scope
- Full Double Ratchet cryptographic specification step-by-step (state high level).
- Payments and commerce rail depth.
- Server-side content moderation of E2EE bodies (only metadata-based strategies apply).
- Large group video conferencing architecture (the focus is asynchronous messaging).
2. Back-of-Envelope Estimations
Assume 2B monthly actives, 100B messages/day global ballpark for teaching scale.
- Messages/s: 100B / 86,400 ≈ 1.16M/s average; a 3–5x peak factor gives roughly 3.5–6M/s, which requires a massively sharded messaging plane.
- Average message envelope: ~200 B of ciphertext plus routing metadata, so the metadata plane runs at roughly 230 MB/s average excluding media; the separate media object path dominates bandwidth.
- Storage: text retention policies vary (30-day server-side vs delete-on-delivery); a WhatsApp-like minimal-retention model shifts storage to clients, and the server keeps only a brief undelivered-message queue.
- Presence updates: a heartbeat every 30 s from 2B users → 2B / 30 ≈ 67M updates/s worst case, which must be aggregated/coalesced; realistic push-based presence fires only on app foreground transitions, which is far lower.
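As a sanity check, the arithmetic above in a few lines (every constant is one of the teaching assumptions stated in this section):

```python
# Back-of-envelope sanity check; all constants are the assumptions above.
MESSAGES_PER_DAY = 100e9     # 100B messages/day, global ballpark
SECONDS_PER_DAY = 86_400
ENVELOPE_BYTES = 200         # ciphertext + routing metadata per message
USERS = 2e9
HEARTBEAT_INTERVAL_S = 30

avg_msgs_per_s = MESSAGES_PER_DAY / SECONDS_PER_DAY        # ~1.16M/s
peak_msgs_per_s = avg_msgs_per_s * 5                       # upper end of the 3-5x peak factor
metadata_mb_per_s = avg_msgs_per_s * ENVELOPE_BYTES / 1e6  # ~230 MB/s, media excluded
heartbeats_per_s = USERS / HEARTBEAT_INTERVAL_S            # ~67M/s worst case

print(f"avg msgs/s:    {avg_msgs_per_s:,.0f}")
print(f"peak msgs/s:   {peak_msgs_per_s:,.0f}")
print(f"metadata MB/s: {metadata_mb_per_s:,.0f}")
print(f"heartbeats/s:  {heartbeats_per_s:,.0f}")
```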
3. API Design
WhatsApp runs a custom binary protocol in practice; conceptual REST endpoints are shown here for clarity.
POST /v1/sessions
Body: { "devicePublicKey": "...", "identityProof": "..." }
-> 201 { "sessionToken": "...", "wsEndpoint": "wss://edge.example/v1/ws" }

POST /v1/messages
Authorization: Bearer <token>
Body: {
  "chatId": "c_abc",
  "clientMsgId": "uuid",
  "ciphertext": "<base64>",
  "type": "text"
}
-> 202 { "serverMsgId": "sm_123", "timestamp": "..." }

GET /v1/chats/{chatId}/history?since=cursor
-> 200 { "messages": [ ... ] } // ciphertext blobs if the server stores them temporarily

WebSocket is the primary channel for delivery:
Server -> Client: { "event": "message", "chatId": "...", "ciphertext": "..." }

gRPC streaming is an alternative for mobile efficiency.
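A minimal client sketch of this flow, assuming the conceptual endpoints above; the host, token handling, and the `requests`/`websockets` libraries are illustrative, not WhatsApp's real stack:

```python
# Client sketch: send a message over the REST endpoint, receive over WebSocket.
import asyncio
import json
import uuid

import requests        # pip install requests
import websockets      # pip install websockets

API = "https://edge.example/v1"
TOKEN = "session-token-from-POST-/v1/sessions"  # hypothetical placeholder

def send_message(chat_id: str, ciphertext_b64: str) -> str:
    resp = requests.post(
        f"{API}/messages",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "chatId": chat_id,
            "clientMsgId": str(uuid.uuid4()),  # idempotency key for retries
            "ciphertext": ciphertext_b64,
            "type": "text",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["serverMsgId"]

async def delivery_loop() -> None:
    # The server pushes { "event": "message", ... } frames over the socket.
    async with websockets.connect("wss://edge.example/v1/ws") as ws:
        async for frame in ws:
            event = json.loads(frame)
            if event.get("event") == "message":
                handle_ciphertext(event["chatId"], event["ciphertext"])

def handle_ciphertext(chat_id: str, ciphertext: str) -> None:
    ...  # decrypt locally with the chat's ratchet state (out of scope here)
```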
4. Data Model
Chat session
`chat_id`, `type` (direct|group), participants tracked in a many-to-many table.
Message record (server-visible fields only)
`server_msg_id`, `chat_id`, `sender_device_id`, `ciphertext_blob_ref` (or an inline blob in Cassandra for short retention), `timestamp`, `delivery_state`.
Device
`user_id`, `device_id`, `registration_id` for the Signal protocol, `push_token` for FCM/APNs.
Why Cassandra / Scylla for message routing
- Horizontal partitioning by `chat_id` clusters each chat's messages together and is time-series friendly; DynamoDB offers a similar model.
Why not Postgres for hot path
- Single-row writes per message at millions/s exceed vertical scaling limits; sharded SQL is possible with expertise (consistent hashing on `chat_id`).
Object storage
- Media ciphertext is uploaded via presigned S3 URLs; the message row holds a pointer plus MAC metadata.
Sample row
| chat_id | server_msg_id | sender | ciphertext_handle | ts |
|---|---|---|---|---|
| c42 | 991823... | d7 | s3://blob | ... |
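A sketch of the server-visible record as a Python dataclass, using the field names above; the delivery states are assumed for illustration:

```python
# Server-visible message record: the server never sees plaintext, only a
# ciphertext handle (inline blob for short retention, or an S3 pointer for media).
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageRecord:
    chat_id: str            # partition key: clusters a chat's messages together
    server_msg_id: str      # clustering key: time-ordered within the chat
    sender_device_id: str
    ciphertext_handle: str  # inline ciphertext or s3:// pointer + MAC metadata
    ts: int                 # server receive timestamp (ms)
    delivery_state: str     # e.g. "queued" | "delivered" | "acked" (assumed states)
```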
5. High-Level Architecture
Cross-region: message queues for eventual replica sync; presence via caching layer.
6. Component Deep-Dives
Connection layer
- WebSocket with sticky sessions on an L4/L7 load balancer; reconnect storms are handled client-side with jittered backoff (sketch after this list).
- MQTT is an alternative with IoT-friendly semantics; WhatsApp-class systems have historically used custom frames over TLS.
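A minimal client-side reconnect sketch with full-jitter exponential backoff; the `connect` callable and the delay constants are assumptions:

```python
# Full-jitter backoff spreads a reconnect storm across time instead of
# letting disconnected clients retry in lockstep.
import random
import time

BASE_DELAY_S = 1.0
MAX_DELAY_S = 120.0

def reconnect_with_backoff(connect) -> None:
    attempt = 0
    while True:
        try:
            connect()   # stand-in for the real WebSocket dial
            return      # connected; the caller restarts this loop on disconnect
        except ConnectionError:
            # Sleep a uniform random amount up to the capped exponential,
            # so clients desynchronize after a mass disconnect.
            cap = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
            time.sleep(random.uniform(0, cap))
            attempt += 1
```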
Message routing
- Hash `chat_id` to a shard: a Kafka topic partition, or directly a Cassandra writer pod pool; a single writer per chat avoids ordering races within the chat (sketch below).
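A routing sketch, assuming a fixed partition count; the point is a stable hash, so every gateway maps a given `chat_id` to the same ordered stream:

```python
# Stable chat_id -> partition mapping so all of a chat's messages land on
# one ordered stream behind a single writer.
import hashlib

NUM_PARTITIONS = 1024  # assumed shard count; a real deployment sizes this from load

def partition_for(chat_id: str) -> int:
    # hashlib is stable across processes (unlike Python's built-in hash()),
    # so every gateway routes the same chat to the same writer. MD5 here is
    # for sharding, not security.
    digest = hashlib.md5(chat_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```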
E2EE responsibilities
- The server routes opaque ciphertext; key exchange happens via X3DH at session setup, with device keys stored in a low-read KV store behind a privacy-preserving directory (opaque ID generation for device ids).
Delivery receipts
- Small ack messages piggyback on the same socket; acks for offline recipients queue until reconnect.
Push notifications
- Strictly no plaintext in the FCM data payload: send only an encrypted envelope id; the client wakes and pulls the full message over the WebSocket (sketch below).
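A sketch of the wake-and-fetch pattern; the payload shape and helper names are illustrative, not the actual FCM API surface:

```python
# Wake-and-fetch: the push payload carries only an opaque envelope id,
# never ciphertext or plaintext.
def build_push_payload(envelope_id: str) -> dict:
    return {
        "data": {
            "envelopeId": envelope_id,  # opaque reference, reveals nothing
        },
        # No notification text: the client renders the message only after it
        # reconnects, fetches the ciphertext, and decrypts locally.
    }

def on_push_received(payload: dict, fetch_and_decrypt) -> None:
    # Device wakes, reconnects the WebSocket, pulls the referenced envelope.
    fetch_and_decrypt(payload["data"]["envelopeId"])
```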
Group chats
- Sender keys are broadcast via pairwise channels; this adds operational complexity, and group size limits cap the fan-out (sketch below).
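A sketch of sender-key distribution, with `pairwise_encrypt` standing in for the existing per-device E2EE session; it shows why per-message cost drops from N pairwise encryptions to one:

```python
# Sender-key fan-out: the group sender key is distributed once per member
# over existing pairwise E2EE sessions; each subsequent message is then
# encrypted once with the sender key instead of N times pairwise.
import os

def distribute_sender_key(member_device_ids: list[str], pairwise_encrypt) -> bytes:
    sender_key = os.urandom(32)  # fresh symmetric key for this sender/group
    for device_id in member_device_ids:
        # O(N) distribution cost paid once (and again on membership change),
        # versus O(N) encryptions on every single message.
        send_to_device(device_id, pairwise_encrypt(device_id, sender_key))
    return sender_key

def send_to_device(device_id: str, blob: bytes) -> None:
    ...  # routed through the normal message plane as opaque ciphertext
```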
Media
- The client uploads ciphertext to S3, and integrity is verified separately; this parallels Instagram-style media infra but without any server-side decode (sketch below).
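A sketch of the presigned upload path; bucket and key names are hypothetical, while `generate_presigned_url` is boto3's real call:

```python
# Media path: the server hands out a presigned S3 PUT URL, and the client
# uploads ciphertext directly, so media bytes never transit the message plane.
import boto3
import requests

s3 = boto3.client("s3")

def presign_media_upload(blob_key: str, expires_s: int = 900) -> str:
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "media-ciphertext", "Key": blob_key},
        ExpiresIn=expires_s,
    )

def upload_media(presigned_url: str, ciphertext: bytes) -> None:
    # The message row then stores the pointer plus MAC metadata; the server
    # never holds a decryption key.
    requests.put(presigned_url, data=ciphertext, timeout=60).raise_for_status()
```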
Why Kafka in messaging
- Cross-datacenter replication and buffering during partial outages; a RabbitMQ shovel is a less common alternative at this throughput.
7. Bottlenecks & Mitigations
| Bottleneck | Symptom | Mitigation |
|---|---|---|
| Hot chat (world cup group) | One shard/partition runs hot | Split rarely; fan in with careful sequencing through the single writer |
| Presence stampede | Redis overload | Sample and coalesce presence; aggregate friend visibility into buckets (sketch after this table) |
| Reconnect storm | Gateway CPU | Rate limit handshakes; regional DNS shuffle |
| Push token invalid | Wasted provider quota | Prune on bounce webhook |
| Cross-region latency | Higher delivery delay | User affinity to nearest DC where legally possible |
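For the presence stampede row, a coalescing sketch: flaps inside a debounce window are dropped (a form of sampling), trading a little staleness for load; the window length is an assumed tuning knob:

```python
# Presence coalescing: only state changes that survive the debounce window
# are published, so rapid online/offline flaps never reach Redis or friends.
import time

DEBOUNCE_S = 5.0
_last_published: dict[str, tuple[str, float]] = {}  # user_id -> (state, ts)

def maybe_publish_presence(user_id: str, state: str, publish) -> None:
    now = time.monotonic()
    prev = _last_published.get(user_id)
    # Skip if the state is unchanged, or if it flipped again inside the
    # window; dropped flaps are the sampling loss we accept for load.
    if prev and (prev[0] == state or now - prev[1] < DEBOUNCE_S):
        return
    _last_published[user_id] = (state, now)
    publish(user_id, state)
```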
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Cassandra message store | Sharded MySQL | Clear write-throughput scaling path |
| Long-lived WebSocket | HTTP polling | Polling's battery and latency costs are unacceptable |
| Minimal server retention | Full history cloud | E2EE trust + cost model |
| Kafka cross-region | DB replication only | Decouple failures + replay |
| E2EE default | Server readable | Product privacy stance |
| Binary custom protocol | JSON REST | Bandwidth and parse CPU on mobile |
9. Follow-ups (interviewer drill-downs)
- Offline-first: client-side SQLite outbox (sketch after this list); merge conflicts are rare in messaging compared with CRDT-style documents.
- Multi-device: each device has separate ratchet state; messages fan out to all of a user's devices.
- Spam without plaintext? Rate limits by metadata (distributed rate limiter), behavioral signals, user reports.
- Lawful intercept tension: Product/legal outside engineering scope but acknowledge metadata availability.
- LLD: Wire format versioning and backwards compatibility during rolling upgrades.
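For the offline-first follow-up, a minimal outbox sketch on stdlib `sqlite3`; the schema and helper names are illustrative:

```python
# Offline-first outbox: sends are queued locally and drained on reconnect.
# client_msg_id doubles as the idempotency key, so a retry after a crash
# never duplicates a message on the server.
import sqlite3
import uuid

db = sqlite3.connect("outbox.db")
db.execute("""CREATE TABLE IF NOT EXISTS outbox (
    client_msg_id TEXT PRIMARY KEY,
    chat_id       TEXT NOT NULL,
    ciphertext    BLOB NOT NULL,
    created_at    INTEGER NOT NULL DEFAULT (strftime('%s','now'))
)""")

def enqueue(chat_id: str, ciphertext: bytes) -> str:
    msg_id = str(uuid.uuid4())
    with db:  # transaction: the message is durable before we touch the network
        db.execute(
            "INSERT INTO outbox (client_msg_id, chat_id, ciphertext) VALUES (?, ?, ?)",
            (msg_id, chat_id, ciphertext),
        )
    return msg_id

def drain(send) -> None:
    # Called on reconnect; delete a row only after the server acks the send.
    rows = db.execute(
        "SELECT client_msg_id, chat_id, ciphertext FROM outbox ORDER BY created_at"
    ).fetchall()
    for msg_id, chat_id, ciphertext in rows:
        send(chat_id, msg_id, ciphertext)
        with db:
            db.execute("DELETE FROM outbox WHERE client_msg_id = ?", (msg_id,))
```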