ID Generation
Distributed systems need unique, often sortable identifiers without a single global mutex. Interviewers probe whether you understand collision probability, clock skew, embeddability in URLs, and ordering guarantees.
Common strategies
UUID (v4 random)
- Pros: trivial, no coordination, huge space.
- Cons: not time-sortable; long as string primary keys in some DBs; index locality worse than monotonic ids.
Snowflake-style 64-bit integer
- Structure (conceptual): timestamp | datacenter/worker | sequence within millisecond.
- Pros: roughly time-ordered, compact, fits in
BIGINT, good for indexes and logs. - Cons: needs worker id assignment; clock skew handling (NTP); sequence exhaustion if QPS per worker > sequence bits allow.
Database sequence / auto-increment
- Pros: strong ordering, transactional simplicity.
- Cons: single-writer bottleneck at extreme scale; not ideal for multi-region active-active without ranges.
Ticket / segment servers
- Pre-allocate ranges of ids to services so generation stays local until range exhaustion — hybrid of DB safety and distributed throughput.
Interview talking points
- Ordering: Do we need strict total order or “good enough” time order for UX?
- Collision: Birthday paradox irrelevant for 122-bit UUID; matters if you shorten IDs (base62).
- Hot spots: Monotonic ids can create write hot shards in some DBs — mitigate with hash prefix sharding or ULID/tradeoffs.
- Clock skew: Snowflake nodes must reject generating ids when clock moves backward beyond tolerance.
Related
- Sharding — how ids partition data.
- Load balancer — consistent hashing often pairs with id-based routing.
- Back-of-envelope — how many ids/sec per worker given bit widths.
See also: System design curriculum overview
Last updated on
Spotted something unclear or wrong on this page?