Design WhatsApp (Messaging)
1. Requirements
Functional
- 1:1 and group chat with delivery receipts (single/double check semantics are product-dependent); read receipts optional.
- Multimedia messages: images, voice notes, and documents, uploaded as encrypted blobs referenced from the message envelope.
- Online/presence and typing indicators with low staleness.
- Push notifications via FCM/APNs when the app is backgrounded; E2EE payload limitations are handled with a wake-and-fetch pattern.
- End-to-end encryption (E2EE): the server must never read plaintext message bodies; the Signal Protocol (Double Ratchet) is the standard interview reference.
Non-Functional
- Scale: 2B+ users globally; peak message rate in the millions per second aggregated across regions (order-of-magnitude teaching numbers, see Section 2).
- Latency: sub-second delivery expectation on the same continent; typing indicator budget <200 ms server RTT.
- Availability: 99.99% message acceptance; partition tolerance across regions with queue-based eventual delivery.
- Consistency: per-chat ordering matters more than global ordering; under CAP, favor availability and partition tolerance, with clients repairing ordering when rare anomalies occur.
Out of Scope
- Full Double Ratchet cryptographic specification step-by-step (state high level).
- Payments and commerce rail depth.
- Server-side content moderation of E2EE bodies (only metadata-based strategies apply).
- Large group video conferencing architecture (the focus is asynchronous messaging).
2. Back-of-Envelope Estimations
Assume 2B monthly actives, 100B messages/day global ballpark for teaching scale.
- Messages/s: 100B / 86,400 ≈ 1.16M/s average; a 3–5x peak factor gives roughly 3.5–6M/s, which requires a massively sharded messaging plane.
- Average message envelope: ~200 B of ciphertext plus routing metadata, so the metadata plane runs at roughly 230 MB/s average excluding media; the separate media object path dominates bandwidth.
- Storage: text retention policies vary (30-day server-side vs delete-on-delivery); a WhatsApp-like minimal-retention model shifts storage to clients, and the server keeps only a brief undelivered-message queue.
- Presence updates: a heartbeat every 30 s from 2B users → 2B / 30 ≈ 67M updates/s worst case, which must be aggregated/coalesced; realistic push-based presence fires only on app foreground transitions, which is far lower.
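As a sanity check, the arithmetic above in a few lines (every constant is one of the teaching assumptions stated in this section):

```python
# Back-of-envelope sanity check; all constants are the assumptions above.
MESSAGES_PER_DAY = 100e9     # 100B messages/day, global ballpark
SECONDS_PER_DAY = 86_400
ENVELOPE_BYTES = 200         # ciphertext + routing metadata per message
USERS = 2e9
HEARTBEAT_INTERVAL_S = 30

avg_msgs_per_s = MESSAGES_PER_DAY / SECONDS_PER_DAY        # ~1.16M/s
peak_msgs_per_s = avg_msgs_per_s * 5                       # upper end of the 3-5x peak factor
metadata_mb_per_s = avg_msgs_per_s * ENVELOPE_BYTES / 1e6  # ~230 MB/s, media excluded
heartbeats_per_s = USERS / HEARTBEAT_INTERVAL_S            # ~67M/s worst case

print(f"avg msgs/s:    {avg_msgs_per_s:,.0f}")
print(f"peak msgs/s:   {peak_msgs_per_s:,.0f}")
print(f"metadata MB/s: {metadata_mb_per_s:,.0f}")
print(f"heartbeats/s:  {heartbeats_per_s:,.0f}")
```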
3. API Design
WhatsApp runs a custom binary protocol in practice; conceptual REST endpoints are shown here for clarity.
POST /v1/sessions
Body: { "devicePublicKey": "...", "identityProof": "..." }
-> 201 { "sessionToken": "...", "wsEndpoint": "wss://edge.example/v1/ws" }

POST /v1/messages
Authorization: Bearer <token>
Body: {
  "chatId": "c_abc",
  "clientMsgId": "uuid",
  "ciphertext": "<base64>",
  "type": "text"
}
-> 202 { "serverMsgId": "sm_123", "timestamp": "..." }

GET /v1/chats/{chatId}/history?since=cursor
-> 200 { "messages": [ ... ] } // ciphertext blobs if the server stores them temporarily

WebSocket is the primary channel for delivery:
Server -> Client: { "event": "message", "chatId": "...", "ciphertext": "..." }

gRPC streaming is an alternative for mobile efficiency.
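A minimal client sketch of this flow, assuming the conceptual endpoints above; the host, token handling, and the `requests`/`websockets` libraries are illustrative, not WhatsApp's real stack:

```python
# Client sketch: send a message over the REST endpoint, receive over WebSocket.
import asyncio
import json
import uuid

import requests        # pip install requests
import websockets      # pip install websockets

API = "https://edge.example/v1"
TOKEN = "session-token-from-POST-/v1/sessions"  # hypothetical placeholder

def send_message(chat_id: str, ciphertext_b64: str) -> str:
    resp = requests.post(
        f"{API}/messages",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "chatId": chat_id,
            "clientMsgId": str(uuid.uuid4()),  # idempotency key for retries
            "ciphertext": ciphertext_b64,
            "type": "text",
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["serverMsgId"]

async def delivery_loop() -> None:
    # The server pushes { "event": "message", ... } frames over the socket.
    async with websockets.connect("wss://edge.example/v1/ws") as ws:
        async for frame in ws:
            event = json.loads(frame)
            if event.get("event") == "message":
                handle_ciphertext(event["chatId"], event["ciphertext"])

def handle_ciphertext(chat_id: str, ciphertext: str) -> None:
    ...  # decrypt locally with the chat's ratchet state (out of scope here)
```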
4. Data Model
Chat session
`chat_id`, `type` (direct|group), participants tracked in a many-to-many table.
Message record (server-visible fields only)
`server_msg_id`, `chat_id`, `sender_device_id`, `ciphertext_blob_ref` (or an inline blob in Cassandra for short retention), `timestamp`, `delivery_state`.
Device
`user_id`, `device_id`, `registration_id` for the Signal protocol, `push_token` for FCM/APNs.
Why Cassandra / Scylla for message routing
- Horizontal partitioning by `chat_id` clusters each chat's messages together and is time-series friendly; DynamoDB offers a similar model.
Why not Postgres for hot path
- Single-row writes per message at millions/s exceed vertical scaling limits; sharded SQL is possible with expertise (consistent hashing on `chat_id`).
Object storage
- Media ciphertext is uploaded via presigned S3 URLs; the message row holds a pointer plus MAC metadata.
Sample row
| chat_id | server_msg_id | sender | ciphertext_handle | ts |
|---|---|---|---|---|
| c42 | 991823... | d7 | s3://blob | ... |
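A sketch of the server-visible record as a Python dataclass, using the field names above; the delivery states are assumed for illustration:

```python
# Server-visible message record: the server never sees plaintext, only a
# ciphertext handle (inline blob for short retention, or an S3 pointer for media).
from dataclasses import dataclass

@dataclass(frozen=True)
class MessageRecord:
    chat_id: str            # partition key: clusters a chat's messages together
    server_msg_id: str      # clustering key: time-ordered within the chat
    sender_device_id: str
    ciphertext_handle: str  # inline ciphertext or s3:// pointer + MAC metadata
    ts: int                 # server receive timestamp (ms)
    delivery_state: str     # e.g. "queued" | "delivered" | "acked" (assumed states)
```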
5. High-Level Architecture
Cross-region: message queues for eventual replica sync; presence via caching layer.
6. Component Deep-Dives
Connection layer
- WebSocket with sticky sessions on an L4/L7 load balancer; reconnect storms are handled client-side with jittered backoff (sketch after this list).
- MQTT is an alternative with IoT-friendly semantics; WhatsApp-class systems have historically used custom frames over TLS.
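A minimal client-side reconnect sketch with full-jitter exponential backoff; the `connect` callable and the delay constants are assumptions:

```python
# Full-jitter backoff spreads a reconnect storm across time instead of
# letting disconnected clients retry in lockstep.
import random
import time

BASE_DELAY_S = 1.0
MAX_DELAY_S = 120.0

def reconnect_with_backoff(connect) -> None:
    attempt = 0
    while True:
        try:
            connect()   # stand-in for the real WebSocket dial
            return      # connected; the caller restarts this loop on disconnect
        except ConnectionError:
            # Sleep a uniform random amount up to the capped exponential,
            # so clients desynchronize after a mass disconnect.
            cap = min(MAX_DELAY_S, BASE_DELAY_S * (2 ** attempt))
            time.sleep(random.uniform(0, cap))
            attempt += 1
```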
Message routing
- Hash `chat_id` to a shard: a Kafka topic partition, or directly a Cassandra writer pod pool; a single writer per chat avoids ordering races within the chat (sketch below).
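A routing sketch, assuming a fixed partition count; the point is a stable hash, so every gateway maps a given `chat_id` to the same ordered stream:

```python
# Stable chat_id -> partition mapping so all of a chat's messages land on
# one ordered stream behind a single writer.
import hashlib

NUM_PARTITIONS = 1024  # assumed shard count; a real deployment sizes this from load

def partition_for(chat_id: str) -> int:
    # hashlib is stable across processes (unlike Python's built-in hash()),
    # so every gateway routes the same chat to the same writer. MD5 here is
    # for sharding, not security.
    digest = hashlib.md5(chat_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS
```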
E2EE responsibilities
- The server routes opaque ciphertext; key exchange happens via X3DH at session setup, with device keys stored in a low-read KV store behind a privacy-preserving directory (opaque ID generation for device ids).
Delivery receipts
- Small ack messages piggyback on the same socket; acks for offline recipients queue until reconnect.
Push notifications
- Strictly no plaintext in the FCM data payload: send only an encrypted envelope id; the client wakes and pulls the full message over the WebSocket (sketch below).
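A sketch of the wake-and-fetch pattern; the payload shape and helper names are illustrative, not the actual FCM API surface:

```python
# Wake-and-fetch: the push payload carries only an opaque envelope id,
# never ciphertext or plaintext.
def build_push_payload(envelope_id: str) -> dict:
    return {
        "data": {
            "envelopeId": envelope_id,  # opaque reference, reveals nothing
        },
        # No notification text: the client renders the message only after it
        # reconnects, fetches the ciphertext, and decrypts locally.
    }

def on_push_received(payload: dict, fetch_and_decrypt) -> None:
    # Device wakes, reconnects the WebSocket, pulls the referenced envelope.
    fetch_and_decrypt(payload["data"]["envelopeId"])
```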
Group chats
- Sender keys are broadcast via pairwise channels; this adds operational complexity, and group size limits cap the fan-out (sketch below).
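A sketch of sender-key distribution, with `pairwise_encrypt` standing in for the existing per-device E2EE session; it shows why per-message cost drops from N pairwise encryptions to one:

```python
# Sender-key fan-out: the group sender key is distributed once per member
# over existing pairwise E2EE sessions; each subsequent message is then
# encrypted once with the sender key instead of N times pairwise.
import os

def distribute_sender_key(member_device_ids: list[str], pairwise_encrypt) -> bytes:
    sender_key = os.urandom(32)  # fresh symmetric key for this sender/group
    for device_id in member_device_ids:
        # O(N) distribution cost paid once (and again on membership change),
        # versus O(N) encryptions on every single message.
        send_to_device(device_id, pairwise_encrypt(device_id, sender_key))
    return sender_key

def send_to_device(device_id: str, blob: bytes) -> None:
    ...  # routed through the normal message plane as opaque ciphertext
```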
Media
- The client uploads ciphertext to S3, and integrity is verified separately; this parallels Instagram-style media infra but without any server-side decode (sketch below).
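A sketch of the presigned upload path; bucket and key names are hypothetical, while `generate_presigned_url` is boto3's real call:

```python
# Media path: the server hands out a presigned S3 PUT URL, and the client
# uploads ciphertext directly, so media bytes never transit the message plane.
import boto3
import requests

s3 = boto3.client("s3")

def presign_media_upload(blob_key: str, expires_s: int = 900) -> str:
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "media-ciphertext", "Key": blob_key},
        ExpiresIn=expires_s,
    )

def upload_media(presigned_url: str, ciphertext: bytes) -> None:
    # The message row then stores the pointer plus MAC metadata; the server
    # never holds a decryption key.
    requests.put(presigned_url, data=ciphertext, timeout=60).raise_for_status()
```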
Why Kafka in messaging
- Cross-datacenter replication and buffering during partial outages; a RabbitMQ shovel is a less common alternative at this throughput.
7. Bottlenecks & Mitigations
| Bottleneck | Symptom | Mitigation |
|---|---|---|
| Hot chat (world cup group) | One shard/partition runs hot | Split rarely; fan in with careful sequencing through the single writer |
| Presence stampede | Redis overload | Sample and coalesce presence; aggregate friend visibility into buckets (sketch after this table) |
| Reconnect storm | Gateway CPU | Rate limit handshakes; regional DNS shuffle |
| Push token invalid | Wasted provider quota | Prune on bounce webhook |
| Cross-region latency | Higher delivery delay | User affinity to nearest DC where legally possible |
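For the presence stampede row, a coalescing sketch: flaps inside a debounce window are dropped (a form of sampling), trading a little staleness for load; the window length is an assumed tuning knob:

```python
# Presence coalescing: only state changes that survive the debounce window
# are published, so rapid online/offline flaps never reach Redis or friends.
import time

DEBOUNCE_S = 5.0
_last_published: dict[str, tuple[str, float]] = {}  # user_id -> (state, ts)

def maybe_publish_presence(user_id: str, state: str, publish) -> None:
    now = time.monotonic()
    prev = _last_published.get(user_id)
    # Skip if the state is unchanged, or if it flipped again inside the
    # window; dropped flaps are the sampling loss we accept for load.
    if prev and (prev[0] == state or now - prev[1] < DEBOUNCE_S):
        return
    _last_published[user_id] = (state, now)
    publish(user_id, state)
```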
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Cassandra message store | Sharded MySQL | Clear write-throughput scaling path |
| Long-lived WebSocket | HTTP polling | Polling's battery and latency costs are unacceptable |
| Minimal server retention | Full history cloud | E2EE trust + cost model |
| Kafka cross-region | DB replication only | Decouple failures + replay |
| E2EE default | Server readable | Product privacy stance |
| Binary custom protocol | JSON REST | Bandwidth and parse CPU on mobile |
9. Follow-ups (interviewer drill-downs)
- Offline-first: client-side SQLite outbox (sketch after this list); merge conflicts are rare in messaging compared with CRDT-style documents.
- Multi-device: each device has separate ratchet state; messages fan out to all of a user's devices.
- Spam without plaintext? Rate limits by metadata (distributed rate limiter), behavioral signals, user reports.
- Lawful intercept tension: Product/legal outside engineering scope but acknowledge metadata availability.
- LLD: Wire format versioning and backwards compatibility during rolling upgrades.
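For the offline-first follow-up, a minimal outbox sketch on stdlib `sqlite3`; the schema and helper names are illustrative:

```python
# Offline-first outbox: sends are queued locally and drained on reconnect.
# client_msg_id doubles as the idempotency key, so a retry after a crash
# never duplicates a message on the server.
import sqlite3
import uuid

db = sqlite3.connect("outbox.db")
db.execute("""CREATE TABLE IF NOT EXISTS outbox (
    client_msg_id TEXT PRIMARY KEY,
    chat_id       TEXT NOT NULL,
    ciphertext    BLOB NOT NULL,
    created_at    INTEGER NOT NULL DEFAULT (strftime('%s','now'))
)""")

def enqueue(chat_id: str, ciphertext: bytes) -> str:
    msg_id = str(uuid.uuid4())
    with db:  # transaction: the message is durable before we touch the network
        db.execute(
            "INSERT INTO outbox (client_msg_id, chat_id, ciphertext) VALUES (?, ?, ?)",
            (msg_id, chat_id, ciphertext),
        )
    return msg_id

def drain(send) -> None:
    # Called on reconnect; delete a row only after the server acks the send.
    rows = db.execute(
        "SELECT client_msg_id, chat_id, ciphertext FROM outbox ORDER BY created_at"
    ).fetchall()
    for msg_id, chat_id, ciphertext in rows:
        send(chat_id, msg_id, ciphertext)
        with db:
            db.execute("DELETE FROM outbox WHERE client_msg_id = ?", (msg_id,))
```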