Design Typeahead Autocomplete
1. Requirements
Functional
- As the user types a query prefix in the search box, show the top-K suggestions (completed phrases or entities) within tens of milliseconds.
- Support popular completions, personalized history (optional), and light spell-tolerance (not a full spellcheck engine).
- Respect locale and safe-search filtering for suggestions.
- Debouncing occurs client-side; server receives prefixes of length >= 2 (configurable).
- Ranking by blended score: prefix match quality, global popularity, personalization signals.
Non-Functional
- Scale: 500M searches/day on the property → typeahead requests run 3–5x that as users revise prefixes → ~1.5B suggestion requests/day ≈ 17.4k/s average; ~174k/s at a 10x peak factor.
- Latency: p99 under 40 ms server-side in same region; edge closer to 20 ms with cache.
- Availability: 99.99% (degraded mode returns cached popular-only).
- Consistency: eventually consistent with index updates; trending terms lag seconds acceptable.
- Durability: suggestion corpus derived from offline logs—online index ephemeral with periodic rebuild.
Out of Scope
- Full document search results page (only suggestions).
- Voice/type multimodal input.
- Deep learning seq2seq generation of arbitrary novel phrases (focus classical retrieval + ranking).
- Multi-tenant enterprise ACL per suggestion row.
2. Back-of-Envelope Estimations
Requests: 1.5B/day ~ 17.4k/s; peak factor 10x ~ 174k/s globally.
If ~40% of requests are duplicates after normalization (many users typing the same popular prefix, e.g. “weather”), effective origin load is lower; the CDN and a merge cache at the gateway reduce it further.
Payload: ~200 B JSON per suggestion * 8 suggestions ~ 1.6 KB per response; 174k * 1.6 KB ~ 278 MB/s peak egress from suggestion tier—feasible with regional clusters + edge.
Index size: 10M distinct suggestion strings, avg 30 chars UTF-8 + scores + pointers ~ 64 B overhead → ~1 GB base trie-like structures + posting lists; 10–20 GB with replicas and n-grams.
Logging for offline rebuild: sample 1% of queries → still huge; use streaming aggregation to Hive/BigQuery daily job.
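The arithmetic above can be sanity-checked in a few lines (all constants are the assumptions stated in this section):

```python
# Sanity-check the back-of-envelope numbers from this section.
SEARCHES_PER_DAY = 500e6
PREFIXES_PER_SEARCH = 3            # users revise prefixes ~3-5x per search
SUGGEST_PER_DAY = SEARCHES_PER_DAY * PREFIXES_PER_SEARCH   # 1.5B/day

avg_qps = SUGGEST_PER_DAY / 86_400     # ~17.4k/s average
peak_qps = avg_qps * 10                # ~174k/s with a 10x peak factor

RESP_BYTES = 200 * 8                   # ~200 B per suggestion, 8 suggestions
peak_egress_mb_s = peak_qps * RESP_BYTES / 1e6   # ~278 MB/s peak egress

print(f"avg {avg_qps:,.0f}/s, peak {peak_qps:,.0f}/s, egress {peak_egress_mb_s:.0f} MB/s")
```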
3. API Design
```
GET /v1/suggest?q=weath&sessionId=sess_abc&limit=8
Accept-Language: en-US

-> 200 {
  "query": "weath",
  "suggestions": [
    { "text": "weather today", "source": "popular", "score": 0.98 },
    { "text": "weather tomorrow", "source": "popular", "score": 0.95 }
  ]
}
```

```
GET /v1/suggest/health

-> 200 { "indexVersion": 202604291200, "backend": "ok" }
```

WebSocket optional for ultra-low-latency apps:

```
WS /v1/suggest/stream
Client sends: { "prefix": "wea" }
Server pushes: { "suggestions": [...] }
```

Internal gRPC: `Suggest(SuggestRequest) returns (SuggestResponse)` with protobuf for smaller payloads.
4. Data Model
Trie / prefix tree — in-memory per shard storing compressed branches to descendants.
Inverted auxiliary: map prefix_hash -> list(sorted suggestion_id) maintained incrementally.
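A minimal sketch of that prefix → top-K map (a plain dict with a bounded heap per prefix; `PrefixIndex` and `K` are illustrative names, not from the source):

```python
import heapq
from collections import defaultdict

K = 8  # suggestions returned per prefix

class PrefixIndex:
    """Map every prefix (length >= 2) of a suggestion to its top-K entries."""
    def __init__(self):
        self._by_prefix = defaultdict(list)   # prefix -> [(score, text), ...] min-heap

    def add(self, text: str, score: float) -> None:
        norm = text.lower()
        for end in range(2, len(norm) + 1):
            bucket = self._by_prefix[norm[:end]]
            heapq.heappush(bucket, (score, text))
            if len(bucket) > K:               # keep only the K best per prefix
                heapq.heappop(bucket)         # pops the lowest score

    def suggest(self, prefix: str) -> list:
        bucket = self._by_prefix.get(prefix.lower(), [])
        return [text for _, text in sorted(bucket, reverse=True)]

idx = PrefixIndex()
idx.add("weather today", 0.98)
idx.add("weather tomorrow", 0.95)
print(idx.suggest("weath"))   # ['weather today', 'weather tomorrow']
```

Precomputing top-K at every prefix trades memory for O(1) lookups; a real service would bound prefix length (max_gram 20, as in the ES config below) to cap the blow-up.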
Elasticsearch/OpenSearch index with an edge n-gram analyzer on the suggestion_text field:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_ngram_filter"]
        }
      },
      "filter": {
        "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 20 }
      }
    }
  }
}
```

Why Elasticsearch vs a custom trie service
- ES: faster time-to-product, horizontal scaling, BM25 scoring hooks — CPU heavier per request.
- Custom trie in C++/Rust: lowest latency, highest engineering cost — pick when p99 <10 ms hard requirement.
Personalization store
- Redis sorted set `user:{id}:recent_queries`, capped at 50 entries — intersected with global candidates at rank time.
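The capped recent-history behaviour maps onto Redis sorted-set commands (`ZADD` with a timestamp score, `ZREMRANGEBYRANK` to trim the oldest). An in-memory stand-in sketching the same logic, assuming the key and cap above:

```python
import time

CAP = 50  # keep at most 50 recent queries per user

class RecentQueries:
    """In-memory stand-in for a Redis sorted set keyed user:{id}:recent_queries.

    With redis-py this would be roughly:
        r.zadd(key, {query: time.time()})
        r.zremrangebyrank(key, 0, -(CAP + 1))   # trim oldest beyond the cap
    """
    def __init__(self):
        self._scores = {}   # query -> last-seen timestamp

    def record(self, query, now=None):
        self._scores[query] = time.time() if now is None else now
        if len(self._scores) > CAP:               # evict the oldest entry
            oldest = min(self._scores, key=self._scores.get)
            del self._scores[oldest]

    def personal_boost(self, candidates):
        """Intersect global candidates with this user's recent history."""
        return {c for c in candidates if c in self._scores}

h = RecentQueries()
h.record("weather today", now=1.0)
h.record("bitcoin price", now=2.0)
print(h.personal_boost(["weather today", "weather radar"]))  # {'weather today'}
for i in range(100):                # flood with newer queries ...
    h.record(f"q{i}", now=float(10 + i))
print(len(h._scores))               # 50 -- history stays capped
```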
Analytics warehouse
- BigQuery table `raw_queries` for the nightly popularity aggregation feeding the batch index builder.
Sample popularity row (offline)
| normalized_query | impression_count_7d | last_updated |
|---|---|---|
| weather today | 12000000 | 2026-04-29 |
5. High-Level Architecture
Client (debounced input) → CDN/edge cache for anonymous popular prefixes → API gateway (coalescing, rate limits) → suggestion serving tier (sharded trie service or Elasticsearch), consulting Redis personalization at rank time. An offline pipeline (sampled query logs → BigQuery aggregation → batch index builder → S3 snapshots) feeds the serving tier.
6. Component Deep-Dives
Serving tier: trie microservice
- Sharded by first two letters of normalized query (36^2 buckets alnum) mapped via consistent hashing to pods — avoids single giant trie.
- Memory: succinct trie (DAWG / Double-array trie) vs naive pointer trie — 5–10x space savings important at 100M nodes.
- Update: blue/green load new immutable snapshot from S3 at version boundary — double buffering in process.
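The blue/green snapshot update is essentially an atomic pointer exchange; a sketch under the assumption that snapshots are immutable once loaded (`SnapshotHolder` is an illustrative name):

```python
import threading

class SnapshotHolder:
    """Double-buffered holder: readers always see a complete, immutable snapshot."""
    def __init__(self, initial):
        self._lock = threading.Lock()
        self._current = initial

    def get(self):
        return self._current          # single reference read; safe for readers

    def swap(self, new_snapshot):
        # Fully load and validate new_snapshot *before* calling swap (blue/green).
        with self._lock:
            old, self._current = self._current, new_snapshot
        return old                    # caller can free / GC the old version

holder = SnapshotHolder({"version": 202604291200, "trie": ...})
old = holder.swap({"version": 202604291300, "trie": ...})
print(holder.get()["version"])   # 202604291300
```

In-flight requests keep reading the old snapshot they grabbed via `get()`; memory briefly doubles during the swap, which is the cost of the double-buffering mentioned above.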
Elasticsearch path
- Good for teams already operating ES; completion suggester API built-in but less flexible than custom ranking — combine with function_score for popularity.
Ranking fusion
- Score = `w1 * prefixScore + w2 * log(popularity) + w3 * personalBoost`.
- Optional learning-to-rank second stage: a lightweight model rescores only the top ~50 candidates.
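The fusion formula as a runnable sketch; the weights `W1..W3` and candidate numbers are illustrative, not tuned values from the source:

```python
import math

W1, W2, W3 = 1.0, 0.2, 0.5   # illustrative weights; tuned offline in practice

def fused_score(prefix_score, popularity, personal_boost):
    """Score = w1*prefixScore + w2*log(popularity) + w3*personalBoost."""
    return W1 * prefix_score + W2 * math.log(max(popularity, 1)) + W3 * personal_boost

# (text, prefix_score, popularity, personal_boost)
candidates = [
    ("weather today",    1.0, 12_000_000, 0.0),
    ("weather tomorrow", 1.0,  4_000_000, 1.0),  # in this user's recent history
]
ranked = sorted(candidates, key=lambda c: fused_score(*c[1:]), reverse=True)
print([text for text, *_ in ranked])   # personalization outweighs raw popularity here
```

Taking `log(popularity)` keeps a 12M-impression query from drowning out every personalization signal; the boost of 0.5 only needs to beat `w2 * log(12e6/4e6) ≈ 0.22` to flip the order.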
Client debouncing
- 50–100 ms debounce reduces wasteful requests 50%+ — product trade vs perceived responsiveness.
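Debouncing usually lives in client-side JavaScript, but the timer logic is language-neutral; a thread-based sketch with the 50 ms lower bound from above:

```python
import threading
import time

def debounce(wait_s):
    """Invoke the wrapped function only after wait_s of silence; earlier calls are dropped."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()
        def wrapped(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()            # a newer keystroke supersedes this one
                timer = threading.Timer(wait_s, fn, args, kwargs)
                timer.start()
        return wrapped
    return decorator

calls = []

@debounce(0.05)   # 50 ms
def send_suggest_request(prefix):
    calls.append(prefix)

for p in ["w", "we", "wea"]:    # rapid keystrokes
    send_suggest_request(p)
time.sleep(0.2)
print(calls)   # ['wea'] -- only the final prefix reached the "server"
```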
CDN caution
- Personalized responses are usually uncacheable; cache only anonymous popular prefixes at the edge (no `Vary` header) under a separate URL path `/v1/suggest/public?q=`.
Cold start
- Warm connections HTTP/2 + gRPC keep-alive from gateway to trie shards.
Why not pure DB LIKE 'prefix%'
- Relational index range scans on huge tables lag tens of ms under concurrent load — unacceptable for typeahead at scale.
7. Bottlenecks & Mitigations
| Issue | Symptom | Mitigation |
|---|---|---|
| Hot prefix "we" | Few shards overloaded | Finer sharding; rate limit per IP; serve static top-10 from edge |
| Index rebuild storm | CPU spike | Canary snapshot rollout; async replica warming |
| ES heap pressure | GC pauses | Dedicated coordinating nodes; shrink shard count |
| Thundering herd new trending term | Cache miss stampede | Request coalescing singleflight per prefix in gateway (caching) |
| Unicode normalization | Missed matches | NFC normalize + lowercase pipeline consistent offline/online |
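The singleflight mitigation from the table coalesces concurrent misses for the same prefix into one backend fetch (Go's `singleflight` package is the usual reference; this is a minimal Python analogue with illustrative names):

```python
import threading
import time

class SingleFlight:
    """Coalesce concurrent calls for the same key into one backend fetch."""
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # key -> Event set when the leader finishes
        self._results = {}    # key -> last fetched result

    def do(self, key, fetch):
        with self._lock:
            ev = self._inflight.get(key)
            leader = ev is None
            if leader:                        # we are the leader for this key
                ev = threading.Event()
                self._inflight[key] = ev
        if leader:
            try:
                self._results[key] = fetch(key)
            finally:
                with self._lock:
                    del self._inflight[key]
                ev.set()
        else:
            ev.wait()                         # followers reuse the leader's result
        return self._results[key]

fetches = []
def backend(prefix):                          # stand-in for the suggestion tier
    fetches.append(prefix)
    time.sleep(0.1)                           # simulate a slow cache-miss fetch
    return [prefix + " today"]

sf = SingleFlight()
threads = [threading.Thread(target=sf.do, args=("weather", backend)) for _ in range(10)]
for t in threads: t.start()
for t in threads: t.join()
print(len(fetches))   # 1 -- ten concurrent requests, one backend call
```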
8. Tradeoffs
| Decision | Alternative | Why we picked |
|---|---|---|
| Custom trie service | ES only | Latency tail control at peak |
| Immutable snapshot deploy | Live incremental trie edits | Simplicity and predictable perf |
| Redis personal history | Only popular | Engagement lift vs privacy review |
| Edge cache public prefixes | Full dynamic | Cost and latency for anonymous |
| gRPC internal | REST | Less overhead high QPS |
| Sampled logging | Log all queries | Cost; still statistically robust trends |
9. Follow-ups (interviewer drill-downs)
- Typo tolerance? A separate SymSpell index, or ES fuzzy matching (`fuzziness: AUTO`) applied to the top-K popular terms only, as a latency guard.
- Multi-region: replicate read-only snapshots; a last-writer-wins version promotion runs every hour.
- A/B ranking models: shadow traffic compare pCTR — feature flag in fusion service.
- Children safe: blocklist trie overlay merged at rank stage.
- Relation to a news-feed design: its query box is a downstream consumer of the session context here.