Design Google Maps (Consumer Mapping & Routing)
1. Requirements
Functional
- Display interactive maps: pan, zoom, rotate; show roads, labels, POIs, traffic overlays.
- Search for places and addresses; autocomplete as user types.
- Compute turn-by-turn routes for driving, walking, transit; reroute on deviation.
- Live traffic-aware ETA; incident and closure awareness (sourced from feeds + probes).
- Offline map packs for selected regions (simplified).
Non-Functional
- Scale: billions of MAU; global read-heavy tile and static map API QPS in the millions at peak; routing hundreds of thousands QPS aggregate.
- Latency: vector tile fetch p99 < 100 ms from edge; route compute p99 < 200–500 ms depending on distance and mode.
- Availability: 99.99% for read path; routing may degrade to cached or simpler graphs during incidents.
- Consistency: eventual for tile updates; strong for user-saved places when stored in account backend.
- Durability: map data versioned; user data durable; probe streams retained per policy.
Out of Scope
- Full autonomous vehicle HD maps and LiDAR layers.
- Indoor mapping at millimeter accuracy.
- Legal licensing of third-party map data beyond high-level assumptions.
2. Back-of-Envelope Estimations
Assume 1B MAU, 10 map sessions/user/day average (varies).
- Tile requests: vector tiles, ~10–30 per session during pan/zoom → 100B–300B tile requests/day upper bound; most are served from the CDN. Effective origin QPS after the CDN: ~50k–200k aggregate (CDN hit ratio > 90%).
- Routing: 5% of sessions request a route → 500M routes/day → ~6k QPS average, 50k+ at peak, skewed to commute hours.
- Storage: global base map at petabyte scale in object storage; incremental updates in the TB/day range. User saved lists are far smaller (sharded SQL).
- Bandwidth: vector protobuf tiles compress well; PB/month of egress, mostly served from the CDN.
- Cache: popular viewport tiles and static route-overview polylines—80/20 metro coverage suggests regional edge caches of tens of TB aggregate.
These estimates tie directly into the scaling choices that follow; a worked version appears below.
Index build churn: If ~1% of graph edges change daily from construction feeds, that is ~500M edge updates/day under web-scale mapper assumptions (a graph on the order of 50B edges). Batch graph deltas nightly: streaming updates patch subgraphs while global precomputations refresh offline.
Directions compute budget: A naive search over a continental graph touches millions of edges and cannot meet the latency SLO; precomputed shortcuts shrink the explored set to 10³–10⁴ vertices even for long drives. Always sanity-check the average explored fringe in telemetry before buying more CPU.
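A quick sanity check of these numbers as a worked calculation. This is a sketch whose inputs are the assumptions stated above; the exact CDN hit ratio and peak multiplier are assumed values, not figures from the design.

```python
SECONDS_PER_DAY = 86_400

# Inputs: the assumptions stated above.
mau = 1_000_000_000
sessions_per_user = 10           # per day
tiles_per_session = 30           # upper bound of the 10-30 range
route_fraction = 0.05
cdn_hit_ratio = 0.95             # assumed; the text only says "> 90%"
peak_multiplier = 8              # assumed commute-hour skew

sessions_per_day = mau * sessions_per_user                    # 10B
tile_requests_per_day = sessions_per_day * tiles_per_session  # 300B upper bound
origin_qps = tile_requests_per_day * (1 - cdn_hit_ratio) / SECONDS_PER_DAY

routes_per_day = sessions_per_day * route_fraction            # 500M
route_qps_avg = routes_per_day / SECONDS_PER_DAY              # ~5.8k
route_qps_peak = route_qps_avg * peak_multiplier              # ~46k

print(f"origin tile QPS (upper bound): {origin_qps:,.0f}")    # ~174k
print(f"route QPS avg/peak: {route_qps_avg:,.0f} / {route_qps_peak:,.0f}")
```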
3. API Design
GET /v1/tiles/{z}/{x}/{y}?style=roads&v={dataVersion}
-> 200 application/x-protobuf-tile
GET /v1/places/autocomplete?input=...&sessionToken=...
-> 200 { predictions: [{ placeId, mainText, secondaryText }] }
POST /v1/directions
Body: { origin, destination, mode: "driving"|"walking"|"transit", departAt?, avoidTolls? }
-> 200 { legs: [...], polyline, durationSeconds, distanceMeters }
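A minimal client sketch of the directions call. The host, timeout, and error handling are placeholders; the request and response shapes are the ones defined above.

```python
import requests

def get_route(origin: str, destination: str, mode: str = "driving") -> dict:
    """Request a route from the directions endpoint (hypothetical host)."""
    resp = requests.post(
        "https://maps.example.com/v1/directions",  # placeholder host
        json={"origin": origin, "destination": destination, "mode": mode},
        timeout=2.0,  # route compute SLO is p99 < 200-500 ms
    )
    resp.raise_for_status()
    return resp.json()  # { legs, polyline, durationSeconds, distanceMeters }

route = get_route("47.6062,-122.3321", "45.5152,-122.6784")
print(route["durationSeconds"], route["distanceMeters"])
```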
GET /v1/traffic/segments?bbox=...
-> 200 { segments: [{ edgeId, speedKph, confidence }] }
Errors: 400 bad bbox, 404 unknown tile version (client refresh), 503 routing overload.
GET /v1/directions/matrix
Body: { origins: [...], destinations: [...], mode }
-> 200 { durations: [[seconds]] }
4. Data Model
- Road graph: directed graph stored in partitioned tiles; edge attributes (speed, lanes, turn restrictions). Often columnar + custom in-memory representation for routing engines.
- Tile manifest: `z/x/y` → object storage key + `dataVersion` for cache invalidation.
- Place: `placeId`, names, geometry, categories, hours (served from search index + KV).
- UserSavedPlace (account scope): in Postgres with `userId`, `placeId`, `label`.
SQL for user data; inverted-index search (Elasticsearch/OpenSearch) for autocomplete; object store (S3/GCS) for tiles. The graph may live in a custom binary format, replicated per region.
Indexes: search n-grams on place names; spatial index for “near me”.
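A sketch of the `z/x/y` → object-key mapping behind the tile manifest, using the standard Web Mercator (slippy-map) tile math; the key layout and version string are assumptions.

```python
import math

def latlng_to_tile(lat: float, lng: float, z: int) -> tuple[int, int]:
    """Standard Web Mercator (slippy map) tile coordinates at zoom z."""
    n = 2 ** z
    x = int((lng + 180.0) / 360.0 * n) % n  # modulo handles anti-meridian wrap
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y

def tile_object_key(z: int, x: int, y: int, data_version: str) -> str:
    """Immutable versioned keys enable long CDN TTLs (layout is an assumption)."""
    return f"tiles/{data_version}/{z}/{x}/{y}.pbf"

x, y = latlng_to_tile(35.6762, 139.6503, 12)  # Tokyo at z12
print(tile_object_key(12, x, y, "2024-06-01"))
```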
5. High-Level Architecture
Clients load tiles from a CDN backed by object storage. The route service loads graph shards into memory with a cache; the traffic service merges probe-stream aggregates into Redis or in-memory edge speeds. An API gateway shapes traffic; edge TLS termination and a WAF sit in front of the load-balancing pools.
6. Component Deep-Dives
- Tile pipeline: Cartographic builds produce versioned tiles; immutable versions enable long CDN TTLs; clients pass `v=` to bust caches on data rollouts.
- Globe ↔ plane tiles: Web Mercator introduces polar distortion—routing stays on the graph, but tile pyramids must handle anti-meridian wrapping (±180°) so clients stitch cleanly across the dateline; tests often miss this until Pacific-island QA surfaces the bugs.
- Routing: Multi-level graph (highway hierarchy) with A*, Contraction Hierarchies, or SHARC-class algorithms for speed; precompute shortcuts per region; partition at administrative boundaries to parallelize.
- Traffic: Floating-car data + government feeds → per-edge speeds; smooth temporally; confidence scores down-weight sparsely observed edges (see the smoothing sketch after this list).
- Search/autocomplete: Session tokens reduce billing/charge units; prefix trie + popularity ranking; personal-history overlay.
- Failure handling: If live traffic goes stale, fall back to historical speed profiles; if a route shard is unavailable, return a longer alternate or queue the request.
- Personalization (optional): Learning home/work improves the default map viewport; an on-device cache of recent O-D pairs reduces redundant route calls—the server must stay stateless beyond signed session cookies.
- Accessibility & i18n: Voice search adds an ASR hop with its own p99 budget; RTL layouts affect tile label placement but not the vector protobuf contract.
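A minimal sketch of the traffic merge described in the list above: exponentially smoothed per-edge speeds, with a confidence weight that lets sparsely observed edges lean on historical profiles. The decay constants and the confidence increment are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EdgeSpeed:
    speed_kph: float   # smoothed live estimate
    confidence: float  # 0..1, grows with observation count

def update_edge(state: EdgeSpeed, probe_speed_kph: float,
                alpha: float = 0.3) -> EdgeSpeed:
    """Temporal smoothing of probe speeds; confidence rises per observation."""
    smoothed = (1 - alpha) * state.speed_kph + alpha * probe_speed_kph
    confidence = min(1.0, state.confidence + 0.1)  # assumed increment
    return EdgeSpeed(smoothed, confidence)

def effective_speed(state: EdgeSpeed, historical_kph: float) -> float:
    """Blend live and historical speeds; sparse edges fall back to history."""
    return state.confidence * state.speed_kph + (1 - state.confidence) * historical_kph
```

The same blend doubles as the failure fallback: when live data goes fully stale, confidence decays toward zero and the effective speed converges on the historical profile.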
7. Bottlenecks & Mitigations
- Hot tiles: World cities—over-replicate at the CDN; rate-limit abusive tile scrapers.
- Routing hotspots: Major metros—dedicated graph replicas; request coalescing for identical OD pairs during incidents (see the sketch after this list).
- Search fan-out: Autocomplete QPS—edge-cache top prefixes; Bloom-filter negative caches skip lookups for prefixes with no results.
- Probe ingest storm: Kafka backpressure; sample low-value probes; aggregate in stream windows.
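A sketch of the OD request coalescing mentioned above ("single-flight"): concurrent identical origin-destination requests share one computation. This asyncio version is a simplification; a production router would also bound the map and handle task failure.

```python
import asyncio

_inflight: dict[tuple, asyncio.Task] = {}

async def compute_route(od: tuple) -> dict:
    await asyncio.sleep(0.2)  # stand-in for the real route computation
    return {"od": od, "polyline": "..."}

async def coalesced_route(od: tuple) -> dict:
    """Identical in-flight OD pairs share a single computation."""
    task = _inflight.get(od)
    if task is None:
        task = asyncio.ensure_future(compute_route(od))
        _inflight[od] = task
        task.add_done_callback(lambda _: _inflight.pop(od, None))
    return await task

async def main() -> None:
    # Two identical OD requests in flight -> one computation, one result object.
    a, b = await asyncio.gather(
        coalesced_route(("SEA", "PDX")),
        coalesced_route(("SEA", "PDX")),
    )
    assert a is b

asyncio.run(main())
```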
8. Tradeoffs
| Decision | Alternative | Why we picked it |
|---|---|---|
| Vector tiles | Raster only | Bandwidth and zoom smoothness |
| Precomputed graph augmentations | Pure Dijkstra | Latency at continental scale |
| Partner + crowd traffic | Only static speeds | Real ETA quality |
| Regional graph partitions | Single global graph in RAM | Feasible memory and update isolation |
9. Follow-ups (interviewer drill-downs)
- 100× traffic? More regional route pools; approximations for long distances; batch similar requests.
- Exactly-once for probes? Use idempotent device batch IDs (see the dedupe sketch after this list).
- Model migration? Versioned tile format; dual-read polylines; force a client minimum-version cut.
- Multi-region active-active? Tile reads stay local; routing is authoritative per trip, session-sticky to a regional graph replica with cross-border handoff rules.
- Cost? Aggressive CDN caching; compress protobuf; deprioritize rarely viewed tile corners.
- Accessibility of routing? Stairs and wheelchair-access flags on graph edges require per-mode cost functions in the router (not one Dijkstra run); precompute overlays per mode to keep p99 stable.
- Data corrections? User-submitted edits go through a review queue; roll the tile version forward only after validation to avoid map flapping in CDN caches worldwide.
- Battery-aware routing? Cycling mode should penalize grade and intersections, not just duration—expose per-profile cost functions in the router service so product can iterate without rewriting graph tiles.
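A sketch of the idempotent probe ingestion from the exactly-once follow-up: the device attaches a batch ID, and the ingester drops batches it has already seen. Atomic set-if-absent with a TTL (Redis `SET NX EX` here) is one common implementation; the key layout and downstream step are assumptions.

```python
import redis  # assumes a reachable Redis; any KV with atomic set-if-absent works

r = redis.Redis()

def enqueue_for_aggregation(probes: list[dict]) -> None:
    """Stub for the downstream stream-aggregation step (hypothetical)."""

def ingest_probe_batch(device_id: str, batch_id: str, probes: list[dict]) -> bool:
    """Process each (device, batch) at most once; duplicate deliveries are dropped."""
    key = f"probe-batch:{device_id}:{batch_id}"
    # SET NX EX: succeeds only for a new key; the TTL bounds the dedupe window.
    if not r.set(key, 1, nx=True, ex=24 * 3600):
        return False  # duplicate delivery, already processed
    enqueue_for_aggregation(probes)
    return True
```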