Design Typeahead Autocomplete
1. Requirements
Functional
- As user types a query prefix in search box, show top-K suggestions (completed phrases or entities) within tens of milliseconds.
- Support popular completions, personalized history (optional), and spell-tolerance light (not full spellcheck engine).
- Respect locale and safe-search filtering for suggestions.
- Debouncing occurs client-side; server receives prefixes of length >= 2 (configurable).
- Ranking by blended score: prefix match quality, global popularity, personalization signals.
Non-Functional
- Scale: 500M searches/day on property → typeahead requests often 3–5x that as users revise prefixes → 1.5B suggestions/day ~ 17,400/s average; peaks 150k/s.
- Latency: p99 under 40 ms server-side in same region; edge closer to 20 ms with cache.
- Availability: 99.99% (degraded mode returns cached popular-only).
- Consistency: eventually consistent with index updates; trending terms lag seconds acceptable.
- Durability: suggestion corpus derived from offline logs—online index ephemeral with periodic rebuild.
Out of Scope
- Full document search results page (only suggestions).
- Voice/type multimodal input.
- Deep learning seq2seq generation of arbitrary novel phrases (focus classical retrieval + ranking).
- Multi-tenant enterprise ACL per suggestion row.
2. Back-of-Envelope Estimations
Requests: 1.5B/day ~ 17.4k/s; peak factor 10x ~ 174k/s globally.
If 40% are duplicates after normalization (“weather ” same popular prefix), effective origin load lower; CDN and merge cache at gateway reduce further.
Payload: ~200 B JSON per suggestion * 8 suggestions ~ 1.6 KB per response; 174k * 1.6 KB ~ 278 MB/s peak egress from suggestion tier—feasible with regional clusters + edge.
Index size: 10M distinct suggestion strings, avg 30 chars UTF-8 + scores + pointers ~ 64 B overhead → ~1 GB base trie-like structures + posting lists; 10–20 GB with replicas and n-grams.
Logging for offline rebuild: sample 1% of queries → still huge; use streaming aggregation to Hive/BigQuery daily job.
3. API Design
GET /v1/suggest?q=weath&sessionId=sess_abc&limit=8
Headers: Accept-Language: en-US
-> 200 {
"query": "weath",
"suggestions": [
{ "text": "weather today", "source": "popular", "score": 0.98 },
{ "text": "weather tomorrow", "source": "popular", "score": 0.95 }
]
}GET /v1/suggest/health
-> 200 { "indexVersion": 202604291200, "backend": "ok" }WebSocket optional for ultra-low latency apps:
WS /v1/suggest/stream
Client sends: { "prefix": "wea" }
Server pushes: { "suggestions": [...] }Internal gRPC: Suggest(SuggestRequest) returns (SuggestResponse) with protobuf for smaller payloads.
4. Data Model
Trie / prefix tree — in-memory per shard storing compressed branches to descendants.
Inverted auxiliary: map prefix_hash -> list(sorted suggestion_id) maintained incrementally.
Elasticsearch/OpenSearch index with edge n-gram tokenizer on suggestion_text field:
{
"settings": {
"analysis": {
"analyzer": {
"prefix_analyzer": {
"tokenizer": "standard",
"filter": ["lowercase", "edge_ngram_filter"]
}
},
"filter": {
"edge_ngram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 20 }
}
}
}
}Why Elasticsearch vs custom trie service
- ES: faster time-to-product, horizontal scaling, BM25 scoring hooks — CPU heavier per request.
- Custom trie in C++/Rust: lowest latency, highest engineering cost — pick when p99 <10 ms hard requirement.
Personalization store
- Redis sorted sets
user:{id}:recent_querieswith capped length 50 — intersect with global candidates at rank time (caching).
Analytics warehouse
- BigQuery table
raw_queriesfor nightly popularity aggregation feeding batch index builder.
Sample popularity row (offline)
| normalized_query | impression_count_7d | last_updated |
|---|---|---|
| weather today | 12000000 | 2026-04-29 |
5. High-Level Architecture
6. Component Deep-Dives
Serving tier: trie microservice
- Sharded by first two letters of normalized query (36^2 buckets alnum) mapped via consistent hashing to pods — avoids single giant trie.
- Memory: succinct trie (DAWG / Double-array trie) vs naive pointer trie — 5–10x space savings important at 100M nodes.
- Update: blue/green load new immutable snapshot from S3 at version boundary — double buffering in process.
Elasticsearch path
- Good for teams already operating ES; completion suggester API built-in but less flexible than custom ranking — combine with function_score for popularity.
Ranking fusion
- Score =
w1 * prefixScore + w2 * log(popularity) + w3 * personalBoost. - Learning to rank optional second stage with lightweight model on top 50 candidates only.
Client debouncing
- 50–100 ms debounce reduces wasteful requests 50%+ — product trade vs perceived responsiveness.
CDN caution
- Personalized responses often uncacheable; cache only anonymous popular prefixes at edge with Vary on none — separate URL path
/v1/suggest/public?q=.
Cold start
- Warm connections HTTP/2 + gRPC keep-alive from gateway to trie shards.
Why not pure DB LIKE 'prefix%'
- Relational index range scans on huge tables lag tens of ms under concurrent load — unacceptable for typeahead at scale.
7. Bottlenecks & Mitigations
| Issue | Symptom | Mitigation |
|---|---|---|
| Hot prefix "we" | Few shards overloaded | Finer sharding; rate limit per IP; serve static top-10 from edge |
| Index rebuild storm | CPU spike | Canary snapshot rollout; async replica warming |
| ES heap pressure | GC pauses | Dedicated coordinating nodes; shrink shard count |
| Thundering herd new trending term | Cache miss stampede | Request coalescing singleflight per prefix in gateway (caching) |
| Unicode normalization | Missed matches | NFC normalize + lowercase pipeline consistent offline/online |
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Custom trie service | ES only | Latency tail control at peak |
| Immutable snapshot deploy | Live incremental trie edits | Simplicity and predictable perf |
| Redis personal history | Only popular | Engagement lift vs privacy review |
| Edge cache public prefixes | Full dynamic | Cost and latency for anonymous |
| gRPC internal | REST | Less overhead high QPS |
| Sampled logging | Log all queries | Cost; still statistically robust trends |
9. Follow-ups (interviewer drill-downs)
- Typo tolerance? Separate SymSpell index or fuzzy on ES
fuzziness: AUTOon top-K popular only — latency guard. - Multi-region: replicate read-only snapshots; last-writer-wins on version promotes every hour.
- A/B ranking models: shadow traffic compare pCTR — feature flag in fusion service.
- Children safe: blocklist trie overlay merged at rank stage.
- Relate to news feed query box as downstream consumer of session context.
Mark this page when you finish learning it.
Last updated on
Spotted something unclear or wrong on this page?