HLD Template

Every file in system-design/hld/ MUST follow this 9-section shape.

Visibility: section 5 (High-Level Architecture) must include at least one fenced `mermaid` diagram (flowchart or similar) so the design is scannable without reading the prose first.

# Design <System Name>

## 1. Requirements

### Functional
- Bullet list. What can users do? Be explicit about scope.

### Non-Functional
- Scale (DAU, QPS, storage).
- Latency targets (p50 / p99 read / write).
- Availability (e.g., 99.99%).
- Consistency (strong / eventual / read-your-writes).
- Durability.

### Out of Scope
- Things you intentionally won't design (auth, billing, ...).

## 2. Back-of-Envelope Estimations
- DAU -> QPS (read & write).
- Storage / day, year, 5yr.
- Bandwidth in & out.
- Cache size (working set rule of thumb: 80/20).
- Show the math, not just the number.
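
A minimal sketch of what "show the math" can look like, written in Python. Every input here (10M DAU, 20 reads and 2 writes per user per day, 1 KB objects, a 2x peak factor, a 20% hot fraction) is a made-up placeholder, and `estimate` is a hypothetical helper, not something this template requires.

```python
SECONDS_PER_DAY = 86_400


def estimate(dau, reads_per_user, writes_per_user, object_bytes,
             peak_factor=2, hot_fraction=0.2):
    """Turn per-user daily activity into rough capacity numbers."""
    reads_per_day = dau * reads_per_user
    writes_per_day = dau * writes_per_user
    storage_per_day = writes_per_day * object_bytes
    return {
        # DAU -> QPS: daily requests spread evenly over one day.
        "avg_read_qps": reads_per_day / SECONDS_PER_DAY,
        "avg_write_qps": writes_per_day / SECONDS_PER_DAY,
        # Peak is assumed to be a flat multiple of the average.
        "peak_read_qps": peak_factor * reads_per_day / SECONDS_PER_DAY,
        "peak_write_qps": peak_factor * writes_per_day / SECONDS_PER_DAY,
        # Storage: only writes add data; no replication or overhead included.
        "storage_per_day_bytes": storage_per_day,
        "storage_per_year_bytes": storage_per_day * 365,
        "storage_5yr_bytes": storage_per_day * 365 * 5,
        # Bandwidth: bytes written in, bytes read out, averaged per second.
        "ingress_bytes_per_sec": writes_per_day * object_bytes / SECONDS_PER_DAY,
        "egress_bytes_per_sec": reads_per_day * object_bytes / SECONDS_PER_DAY,
        # 80/20 rule of thumb: cache the hot 20% of one day's read volume.
        "cache_bytes": hot_fraction * reads_per_day * object_bytes,
    }


numbers = estimate(dau=10_000_000, reads_per_user=20,
                   writes_per_user=2, object_bytes=1_000)
for name, value in numbers.items():
    print(f"{name}: {value:,.0f}")
```

With these placeholder inputs the averages come out to roughly 2.3k read QPS and 230 write QPS, 20 GB of new data per day, and 36.5 TB over 5 years. Writing the formula next to the number lets a reviewer challenge the assumption rather than the result.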

## 3. API Design
REST / gRPC. List endpoints with method, path, request, response, error codes.

```http
POST /v1/<resource>
Body: { ... }
-> 201 { id, ... }
```

## 4. Data Model

- Entities + fields + types.
- SQL vs NoSQL choice, with the reasoning.
- Indexes, partition key, sort key.
- Sample row.

## 5. High-Level Architecture

Mermaid diagram. Show: clients -> CDN/LB -> API gateway -> services -> caches -> DBs / queues / blob store.

## 6. Component Deep-Dives

For each non-trivial component:

- Responsibilities.
- Data structures / algorithms inside.
- Why this tech (Kafka vs SQS, Redis vs Memcached, Postgres vs DynamoDB).
- Failure modes + handling.

Common ones to deep-dive: ID generation, write path, read path, fan-out (push/pull/hybrid), sharding strategy, cache strategy, hot-key handling.
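
As an illustration of the level of detail a deep-dive can reach, here is a hedged Python sketch of one common ID-generation approach: a Snowflake-style 64-bit ID (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence). The class name, the custom epoch, and the injectable `clock` parameter are assumptions made for this example; a real design would also need a policy for assigning worker IDs and handling clock skew.

```python
import threading
import time


class SnowflakeLikeIdGenerator:
    """Sketch of a time-ordered 64-bit ID generator (not production code)."""

    EPOCH_MS = 1_600_000_000_000  # hypothetical custom epoch, in milliseconds
    WORKER_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id does not fit in 10 bits")
        self._worker_id = worker_id
        self._clock = clock  # injectable so tests can freeze time
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # Failure mode: refuse to issue IDs rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & self.MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs used in this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - self.EPOCH_MS
            return (
                (timestamp << (self.WORKER_BITS + self.SEQUENCE_BITS))
                | (self._worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )


generator = SnowflakeLikeIdGenerator(worker_id=1)
ids = [generator.next_id() for _ in range(3)]
print(ids)  # three increasing integers; exact values depend on the clock
```

The sketch covers the four bullets above in miniature: responsibility (unique, roughly time-sorted IDs), internal data structure (bit-packed integer), why this approach (no coordination on the write path), and failure modes (clock rollback, sequence exhaustion).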

## 7. Bottlenecks & Mitigations

- Where will it break first as scale grows 10x?
- Hot keys, celebrity problem, thundering herd, cache stampede, head-of-line blocking.
- Mitigations: consistent hashing, replication, sharding, request coalescing, backpressure.
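
One of the mitigations listed above, consistent hashing, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: `HashRing`, the `cache-a` style node names, the choice of MD5 as the ring hash, and 100 virtual nodes per server are all picked for the example, not prescribed by this template.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value):
        # First 8 bytes of MD5 as a 64-bit ring position (not for security).
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def add(self, node):
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            if position in self._owners:
                continue  # extremely unlikely collision: keep existing owner
            self._owners[position] = node
            bisect.insort(self._positions, position)

    def remove(self, node):
        for i in range(self._vnodes):
            position = self._hash(f"{node}#{i}")
            if self._owners.get(position) == node:
                del self._owners[position]
                self._positions.remove(position)

    def get(self, key):
        if not self._positions:
            raise LookupError("ring is empty")
        # Walk clockwise to the first position past the key, wrapping around.
        index = bisect.bisect_right(self._positions, self._hash(key))
        return self._owners[self._positions[index % len(self._positions)]]


ring = HashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.get(key) for key in keys}
ring.add("cache-d")
moved = [key for key in keys if ring.get(key) != before[key]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

The property worth calling out in a design doc is that adding a node only moves keys onto the new node, so most of the cache stays warm. With plain `hash(key) % N` sharding, changing `N` would remap most keys.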

## 8. Tradeoffs

Table: choice -> alternative -> why we picked this.

| Decision | Alternative | Why we picked |
|---|---|---|
| Push fan-out | Pull | Latency for active users |
| DynamoDB | Postgres | Horizontal scale, predictable p99 |

## 9. Follow-ups (interviewer drill-downs)

- "What if traffic grows 100x?"
- "How do you ensure exactly-once?"
- "How do you migrate the data model?"
- "Multi-region active-active?"
- "Cost optimization?"