
Design Typeahead Autocomplete

1. Requirements

Functional

  • As a user types a query prefix in the search box, show top-K suggestions (completed phrases or entities) within tens of milliseconds.
  • Support popular completions, optional personalized history, and light spell-tolerance (not a full spellcheck engine).
  • Respect locale and safe-search filtering for suggestions.
  • Debouncing occurs client-side; server receives prefixes of length >= 2 (configurable).
  • Ranking by blended score: prefix match quality, global popularity, personalization signals.

Non-Functional

  • Scale: 500M searches/day on the property → typeahead requests run 3–5x that as users revise prefixes → ~1.5B suggestion requests/day ≈ 17,400/s average; peaks ~10x at ~174k/s.
  • Latency: p99 under 40 ms server-side in same region; edge closer to 20 ms with cache.
  • Availability: 99.99% (degraded mode returns cached popular-only).
  • Consistency: eventually consistent with index updates; trending terms lag seconds acceptable.
  • Durability: suggestion corpus derived from offline logs—online index ephemeral with periodic rebuild.

Out of Scope

  • Full document search results page (only suggestions).
  • Voice/type multimodal input.
  • Deep learning seq2seq generation of arbitrary novel phrases (focus classical retrieval + ranking).
  • Multi-tenant enterprise ACL per suggestion row.

2. Back-of-Envelope Estimations

Requests: 1.5B/day ~ 17.4k/s; peak factor 10x ~ 174k/s globally.

If ~40% of requests are duplicates after normalization (many users typing the same popular prefix, e.g. “weather”), effective origin load is lower; a CDN and a merge cache at the gateway reduce it further.

Payload: ~200 B of JSON per suggestion × 8 suggestions ≈ 1.6 KB per response; 174k/s × 1.6 KB ≈ 278 MB/s peak egress from the suggestion tier, feasible with regional clusters plus edge.

Index size: 10M distinct suggestion strings at avg 30 chars UTF-8, plus ~64 B of scores and pointers each → ~1 GB of base trie-like structures and posting lists; 10–20 GB with replicas and n-grams.

Logging for offline rebuild: sample 1% of queries (still a large volume) and stream-aggregate into Hive/BigQuery for the daily rebuild job.
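
A quick sanity check of the arithmetic above, as a minimal Python sketch (all constants are the assumptions stated in this section):

# Back-of-envelope check for the traffic, payload, and index numbers above.
SUGGESTS_PER_DAY = 1.5e9                            # assumed typeahead requests/day
SECONDS_PER_DAY = 86_400

avg_qps = SUGGESTS_PER_DAY / SECONDS_PER_DAY        # ~17.4k/s
peak_qps = 10 * avg_qps                             # 10x peak factor -> ~174k/s

resp_bytes = 200 * 8                                # ~200 B/suggestion * 8 suggestions
peak_egress_mb_s = peak_qps * resp_bytes / 1e6      # ~278 MB/s

num_strings = 10_000_000
bytes_per_string = 30 + 64                          # avg UTF-8 chars + score/pointer overhead
index_gb = num_strings * bytes_per_string / 1e9     # ~0.94 GB base

print(f"avg {avg_qps:,.0f}/s, peak {peak_qps:,.0f}/s, "
      f"egress {peak_egress_mb_s:.0f} MB/s, base index {index_gb:.2f} GB")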


3. API Design

GET /v1/suggest?q=weath&sessionId=sess_abc&limit=8
Headers: Accept-Language: en-US
-> 200 {
  "query": "weath",
  "suggestions": [
    { "text": "weather today", "source": "popular", "score": 0.98 },
    { "text": "weather tomorrow", "source": "popular", "score": 0.95 }
  ]
}
GET /v1/suggest/health
-> 200 { "indexVersion": 202604291200, "backend": "ok" }
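
A minimal Python client sketch against the REST endpoint above (the host name, the requests dependency, and the timeout policy are illustrative assumptions):

import requests

def suggest(prefix: str, session_id: str, limit: int = 8, lang: str = "en-US"):
    """Fetch ranked suggestions for a prefix; fail fast so the UI can fall back."""
    resp = requests.get(
        "https://api.example.com/v1/suggest",       # hypothetical host
        params={"q": prefix, "sessionId": session_id, "limit": limit},
        headers={"Accept-Language": lang},
        timeout=0.2,                                # a slow suggestion is a useless one
    )
    resp.raise_for_status()
    return resp.json()["suggestions"]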

WebSocket optional for ultra-low latency apps:

WS /v1/suggest/stream
Client sends: { "prefix": "wea" }
Server pushes: { "suggestions": [...] }

Internal gRPC: Suggest(SuggestRequest) returns (SuggestResponse) with protobuf for smaller payloads.


4. Data Model

Trie / prefix tree: in-memory per shard, with branches to descendants path-compressed (a minimal sketch follows below).

Auxiliary inverted map: prefix_hash -> sorted list of suggestion_ids, maintained incrementally.
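
A minimal, illustrative Python sketch of the trie idea: each node caches its pre-ranked top-K completions, so a lookup costs O(len(prefix)) with no subtree walk at query time (production builds use the compressed/succinct structures discussed in the deep dive):

from dataclasses import dataclass, field

@dataclass
class TrieNode:
    children: dict = field(default_factory=dict)    # char -> TrieNode
    top_k: list = field(default_factory=list)       # [(score, text)], highest first

class SuggestTrie:
    def __init__(self, k: int = 8):
        self.root = TrieNode()
        self.k = k

    def insert(self, text: str, score: float) -> None:
        # Built offline from the popularity table, then served read-only.
        node = self.root
        for ch in text:
            node = node.children.setdefault(ch, TrieNode())
            node.top_k.append((score, text))
            node.top_k.sort(reverse=True)           # highest-scored first
            del node.top_k[self.k:]                 # cap the cache at top-K

    def suggest(self, prefix: str) -> list:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [text for _, text in node.top_k]

trie = SuggestTrie()
trie.insert("weather today", 0.98)
trie.insert("weather tomorrow", 0.95)
print(trie.suggest("weath"))   # ['weather today', 'weather tomorrow']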

Elasticsearch/OpenSearch index with an edge n-gram analysis chain (standard tokenizer + edge_ngram filter) on the suggestion_text field:

{
  "settings": {
    "analysis": {
      "analyzer": {
        "prefix_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "edge_ngram_filter"]
        }
      },
      "filter": {
        "edge_ngram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 20 }
      }
    }
  }
}
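
The query-side counterpart, sketched as a Python dict (the suggestions index name, the popularity field, and the sub-field wired to prefix_analyzer are assumptions; function_score with field_value_factor is a standard ES feature):

# Prefix match on the edge-n-gram-analyzed field, multiplied by stored popularity.
query = {
    "size": 8,
    "query": {
        "function_score": {
            # Assumes suggestion_text has a sub-field analyzed with prefix_analyzer.
            "query": {"match": {"suggestion_text.prefix": "weath"}},
            "field_value_factor": {
                "field": "popularity",
                "modifier": "log1p",    # mirrors the log(popularity) fusion term
                "missing": 1,
            },
            "boost_mode": "multiply",
        }
    },
}
# e.g. with elasticsearch-py: es.search(index="suggestions", body=query)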

Why Elasticsearch vs custom trie service

  • ES: faster time-to-product, horizontal scaling, BM25 scoring hooks — CPU heavier per request.
  • Custom trie in C++/Rust: lowest latency, highest engineering cost; pick it when a hard p99 < 10 ms requirement exists.

Personalization store

  • Redis sorted sets user:{id}:recent_queries, capped at 50 entries each; intersected with global candidates at rank time (sketch below).
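
A hedged sketch of that store using redis-py (key naming follows the bullet above; the intersection feeds the rank-time boost):

import time
import redis

r = redis.Redis()

def record_query(user_id: str, query: str, cap: int = 50) -> None:
    # Score by recency, then trim so the set never exceeds `cap` entries.
    key = f"user:{user_id}:recent_queries"
    r.zadd(key, {query: time.time()})
    r.zremrangebyrank(key, 0, -cap - 1)

def personal_hits(user_id: str, candidates: list) -> set:
    # Candidates present in the user's history earn the personalization boost.
    key = f"user:{user_id}:recent_queries"
    recent = {q.decode() for q in r.zrange(key, 0, -1)}
    return recent.intersection(candidates)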

Analytics warehouse

  • BigQuery table raw_queries for nightly popularity aggregation feeding batch index builder.

Sample popularity row (offline)

| normalized_query | impression_count_7d | last_updated |
|---|---|---|
| weather today | 12000000 | 2026-04-29 |

5. High-Level Architecture

[Architecture diagram]

6. Component Deep-Dives

Serving tier: trie microservice

  • Sharded by the first two characters of the normalized query (36^2 = 1,296 alphanumeric buckets), mapped to pods via consistent hashing; avoids one giant trie.
  • Memory: succinct trie (DAWG / Double-array trie) vs naive pointer trie — 5–10x space savings important at 100M nodes.
  • Update: blue/green; load a new immutable snapshot from S3 at each version boundary, double-buffered in process (see the sketch below).
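
A minimal sketch of that double buffering, assuming snapshots are immutable once loaded (the S3 fetch is stubbed as a callable):

import threading

class SnapshotServer:
    """Serve reads from an immutable trie snapshot; swap atomically on reload."""

    def __init__(self, snapshot):
        self._snapshot = snapshot
        self._lock = threading.Lock()   # serializes swaps, not reads

    def suggest(self, prefix: str):
        # Reads take the current reference once; safe because snapshots are
        # immutable and reference assignment is atomic in CPython.
        return self._snapshot.suggest(prefix)

    def reload(self, load_new_snapshot):
        new = load_new_snapshot()       # e.g. fetch + deserialize from S3
        with self._lock:
            self._snapshot = new        # old snapshot is GC'd once readers drain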

Elasticsearch path

  • Good for teams already operating ES; completion suggester API built-in but less flexible than custom ranking — combine with function_score for popularity.

Ranking fusion

  • Score = w1 * prefixScore + w2 * log(popularity) + w3 * personalBoost.
  • Learning to rank: an optional second stage running a lightweight model on the top 50 candidates only (fusion sketch below).
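
The fusion formula as a Python sketch (the weights are illustrative placeholders, not tuned values):

import math

W_PREFIX, W_POP, W_PERSONAL = 0.5, 0.3, 0.2         # assumed offline-tuned weights

def fused_score(prefix_score: float, popularity: int, personal_boost: float) -> float:
    # Score = w1 * prefixScore + w2 * log(popularity) + w3 * personalBoost
    return (W_PREFIX * prefix_score
            + W_POP * math.log1p(popularity)        # log1p guards popularity == 0
            + W_PERSONAL * personal_boost)

# Rank candidates by fused score, keep 8; optionally rescore the top 50 with LTR.
# top8 = sorted(candidates, key=lambda c: fused_score(*c), reverse=True)[:8]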

Client debouncing

  • A 50–100 ms debounce cuts wasteful requests by 50%+; a product trade-off against perceived responsiveness (illustrated below).
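
Debouncing normally lives in client-side JavaScript; the same cancel-and-wait pattern in Python asyncio, for illustration only:

import asyncio

class Debouncer:
    """Fire the request only after `delay` seconds of keystroke silence."""

    def __init__(self, delay: float = 0.08):        # 80 ms, inside the 50-100 ms band
        self.delay = delay
        self._task = None

    def on_keystroke(self, fire, prefix: str) -> None:
        if self._task is not None and not self._task.done():
            self._task.cancel()                     # supersede the pending request
        self._task = asyncio.ensure_future(self._wait_and_fire(fire, prefix))

    async def _wait_and_fire(self, fire, prefix: str) -> None:
        await asyncio.sleep(self.delay)             # cancelled here if another key lands
        await fire(prefix)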

CDN caution

  • Personalized responses are often uncacheable; cache only anonymous popular prefixes at the edge (no Vary header) on a separate URL path /v1/suggest/public?q=.

Cold start

  • Keep connections warm: HTTP/2 and gRPC keep-alives from the gateway to the trie shards.

Why not pure DB LIKE 'prefix%'

  • Relational index range scans on huge tables lag tens of ms under concurrent load — unacceptable for typeahead at scale.

7. Bottlenecks & Mitigations

| Issue | Symptom | Mitigation |
|---|---|---|
| Hot prefix "we" | A few shards overloaded | Finer sharding; per-IP rate limits; serve static top-10 from the edge |
| Index rebuild storm | CPU spike | Canary snapshot rollout; async replica warming |
| ES heap pressure | GC pauses | Dedicated coordinating nodes; shrink shard count |
| Thundering herd on a new trending term | Cache-miss stampede | Per-prefix request coalescing (singleflight) in the gateway (sketch below) |
| Unicode normalization | Missed matches | NFC normalize + lowercase pipeline, kept consistent offline/online |
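
A sketch of the per-prefix singleflight noted in the table: concurrent cache misses for the same prefix share one backend call instead of stampeding:

import asyncio

class SingleFlight:
    """Collapse concurrent lookups for the same key into one backend call."""

    def __init__(self):
        self._inflight = {}                          # key -> asyncio.Future

    async def do(self, key: str, fetch):
        if key in self._inflight:
            return await self._inflight[key]         # follower: await the leader
        fut = asyncio.get_running_loop().create_future()
        self._inflight[key] = fut
        try:
            result = await fetch(key)                # leader: the only backend call
            fut.set_result(result)
            return result
        except Exception as exc:
            fut.set_exception(exc)                   # propagate to followers too
            raise
        finally:
            del self._inflight[key]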

8. Tradeoffs

| Decision | Alternative | Why we picked it |
|---|---|---|
| Custom trie service | ES only | Control of the latency tail at peak |
| Immutable snapshot deploys | Live incremental trie edits | Simplicity and predictable performance |
| Redis personal history | Popular-only suggestions | Engagement lift, weighed against privacy review |
| Edge-cache public prefixes | Fully dynamic | Cost and latency for anonymous traffic |
| gRPC internally | REST | Less overhead at high QPS |
| Sampled logging | Log all queries | Cost; trends stay statistically robust |

9. Follow-ups (interviewer drill-downs)

  • Typo tolerance? A separate SymSpell index, or ES fuzzy matching (fuzziness: AUTO) on top-K popular terms only as a latency guard (sketch after this list).
  • Multi-region: replicate read-only snapshots; hourly version promotion with last-writer-wins.
  • A/B ranking models: shadow traffic compare pCTR — feature flag in fusion service.
  • Children safe: blocklist trie overlay merged at rank stage.
  • Relate this to a news-feed query box as a downstream consumer of session context.
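
For the typo-tolerance follow-up, a toy SymSpell-style sketch: precompute single-character deletes of the top-K popular terms offline, then match the query's deletes at lookup time (edit distance 1 only, which is the latency guard):

from collections import defaultdict

def deletes(term: str) -> set:
    # All strings reachable by deleting exactly one character.
    return {term[:i] + term[i + 1:] for i in range(len(term))}

def build_fuzzy_index(popular_terms: list) -> dict:
    index = defaultdict(set)
    for term in popular_terms:
        index[term].add(term)
        for d in deletes(term):
            index[d].add(term)
    return index

def fuzzy_lookup(index: dict, query: str) -> set:
    candidates = set(index.get(query, ()))
    for d in deletes(query):
        candidates |= index.get(d, set())
    return candidates

idx = build_fuzzy_index(["weather", "whether"])
print(fuzzy_lookup(idx, "wether"))   # {'weather', 'whether'}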
