THN Interview Prep

642. Design Search Autocomplete System

At a Glance

  • Topic: system-design-coding
  • Pattern: Trie for prefix reachability + Top-K / Heap or offline sort per query
  • Difficulty: Hard
  • Companies: Google, Meta, Amazon, Uber, Dropbox
  • Frequency: Low
  • LeetCode: 642

Problem (one-liner)

Streaming character input builds a query buffer; after each character (except '#'), return up to three historical sentences with that prefix, ranked by higher hotness (times), then lexicographically smaller; '#' submits the sentence, increments its count, clears the buffer.

Recognition Cues

  • Prefix while typing — trie over sentence characters (including spaces per problem charset)
  • Top 3 with composite ordering — tuple sort (-hotness, sentence) or min-heap of size 3 over negated key
  • '#' sentinel — commit buffer as new training example
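
The composite ordering named in the cues can be sketched as a standalone comparator (an illustrative sketch; `byHotnessThenLex` and the sample corpus are made up here, not part of the solutions below):

```javascript
// Composite key from the cues: higher times first, then the
// lexicographically smaller sentence wins ties.
function byHotnessThenLex(times) {
	return (a, b) => {
		const ta = times.get(a);
		const tb = times.get(b);
		if (ta !== tb) return tb - ta; // hotter sentence ranks earlier
		return a < b ? -1 : 1; // ASCII order breaks ties
	};
}

const times = new Map([
	['i love you', 5],
	['island', 5],
	['i a', 2],
]);
const ranked = ['island', 'i a', 'i love you'].sort(byHotnessThenLex(times));
// ranked: ['i love you', 'island', 'i a']
```

Note the tie at 5: "i love you" sorts before "island" because space (0x20) precedes 's' in ASCII.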

Diagram

Flow: keystroke → append to buffer → walk trie to the prefix node → DFS gather stored sentences → sort by (-times, sentence) → return up to 3. On '#': increment the sentence's count, insert it into the trie, clear the buffer.

Approaches

  • Brute force — scan the full corpus on each keystroke: O(N) per char — too slow for large N.
  • Trie + collect all completions + sort — O(P + M log M), where P is the prefix length and M the number of matches — acceptable for interview corpora; production uses bounded heaps per trie node (heavier implementation).
  • Optimal (interview) — global sentence -> count map; trie navigates the prefix; DFS gathers candidate sentences under the node; sort with the composite comparator; take the top 3.

Optimal Solution

Go

package main

import "sort"

// trieNode holds only child links; sentence counts live in sentenceTimes.
type trieNode struct {
	children map[byte]*trieNode
}

type AutocompleteSystem struct {
	root          *trieNode
	sentenceTimes map[string]int
	currentBuffer []byte
}

func Constructor(sentences []string, times []int) AutocompleteSystem {
	system := AutocompleteSystem{
		root:          &trieNode{children: make(map[byte]*trieNode)},
		sentenceTimes: make(map[string]int),
		currentBuffer: []byte{},
	}
	for index := range sentences {
		sentence := sentences[index]
		system.sentenceTimes[sentence] += times[index]
	}
	for sentence := range system.sentenceTimes {
		system.insertSentence(sentence)
	}
	return system
}

func (system *AutocompleteSystem) insertSentence(sentence string) {
	node := system.root
	for position := 0; position < len(sentence); position++ {
		byteValue := sentence[position]
		if _, exists := node.children[byteValue]; !exists {
			node.children[byteValue] = &trieNode{children: make(map[byte]*trieNode)}
		}
		node = node.children[byteValue]
	}
}

// gatherUnderPrefix DFS-collects every stored sentence reachable below node.
func (system *AutocompleteSystem) gatherUnderPrefix(node *trieNode, prefix string, matches *[]string) {
	if system.sentenceTimes[prefix] > 0 {
		*matches = append(*matches, prefix)
	}
	for byteValue, child := range node.children {
		system.gatherUnderPrefix(child, prefix+string([]byte{byteValue}), matches)
	}
}

func (system *AutocompleteSystem) Input(character byte) []string {
	if character == '#' {
		sentence := string(system.currentBuffer)
		system.sentenceTimes[sentence]++
		system.insertSentence(sentence)
		system.currentBuffer = system.currentBuffer[:0]
		return []string{}
	}
	system.currentBuffer = append(system.currentBuffer, character)
	prefixNode := system.root
	for position := 0; position < len(system.currentBuffer); position++ {
		byteValue := system.currentBuffer[position]
		next, exists := prefixNode.children[byteValue]
		if !exists {
			return []string{}
		}
		prefixNode = next
	}
	matches := []string{}
	system.gatherUnderPrefix(prefixNode, string(system.currentBuffer), &matches)
	sort.Slice(matches, func(left, right int) bool {
		leftCount := system.sentenceTimes[matches[left]]
		rightCount := system.sentenceTimes[matches[right]]
		if leftCount != rightCount {
			return leftCount > rightCount
		}
		return matches[left] < matches[right]
	})
	if len(matches) > 3 {
		matches = matches[:3]
	}
	return matches
}

JavaScript

class TrieNode {
	constructor() {
		this.children = new Map();
	}
}

/**
 * @param {string[]} sentences
 * @param {number[]} times
 */
var AutocompleteSystem = function (sentences, times) {
	this.root = new TrieNode();
	this.sentenceTimes = new Map();
	for (let index = 0; index < sentences.length; index++) {
		const sentence = sentences[index];
		this.sentenceTimes.set(sentence, (this.sentenceTimes.get(sentence) || 0) + times[index]);
		this._insert(sentence);
	}
	this.buffer = '';
};

AutocompleteSystem.prototype._insert = function (sentence) {
	let node = this.root;
	for (let position = 0; position < sentence.length; position++) {
		const character = sentence[position];
		if (!node.children.has(character)) {
			node.children.set(character, new TrieNode());
		}
		node = node.children.get(character);
	}
};

AutocompleteSystem.prototype._gather = function (node, prefix) {
	const results = [];
	const dfs = (current, built) => {
		if (this.sentenceTimes.has(built) && this.sentenceTimes.get(built) > 0) {
			results.push(built);
		}
		for (const [character, child] of current.children) {
			dfs(child, built + character);
		}
	};
	dfs(node, prefix);
	return results;
};

/**
 * @param {character} character
 * @return {string[]}
 */
AutocompleteSystem.prototype.input = function (character) {
	if (character === '#') {
		const sentence = this.buffer;
		this.sentenceTimes.set(sentence, (this.sentenceTimes.get(sentence) || 0) + 1);
		this._insert(sentence);
		this.buffer = '';
		return [];
	}
	this.buffer += character;
	let node = this.root;
	for (let position = 0; position < this.buffer.length; position++) {
		const characterAt = this.buffer[position];
		if (!node.children.has(characterAt)) {
			return [];
		}
		node = node.children.get(characterAt);
	}
	const matches = this._gather(node, this.buffer);
	matches.sort((left, right) => {
		const leftTimes = this.sentenceTimes.get(left);
		const rightTimes = this.sentenceTimes.get(right);
		if (leftTimes !== rightTimes) {
			return rightTimes - leftTimes;
		}
		return left < right ? -1 : 1; // byte order per the problem charset; localeCompare is locale-dependent
	});
	return matches.slice(0, 3);
};

Indexed top-k (production follow-up): maintain, at each trie node, a min-heap of size 3 (or sorted multiset) of sentences reachable below that prefix, ordered by (-times, sentence). Update heaps lazily on # along the inserted sentence’s path — amortized cheaper than collecting every match when the corpus under a prefix is huge (same spirit as Top-K / Heap).
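
A minimal sketch of that per-node cache, using a plain bounded list in place of a real heap (for K = 3 the difference is negligible; `updateTopK` and its shapes are hypothetical, not the LeetCode API):

```javascript
// Each trie node would keep at most K best sentences below it; on '#',
// re-rank the committed sentence into every node along its path.
const K = 3;

function ranksBefore(a, b, times) {
	const ta = times.get(a);
	const tb = times.get(b);
	return ta !== tb ? ta > tb : a < b; // (-times, sentence) ordering
}

// Re-insert `sentence` into one node's bounded candidate list.
function updateTopK(cached, sentence, times) {
	const next = cached.filter((s) => s !== sentence); // drop any stale entry
	next.push(sentence);
	next.sort((a, b) => (ranksBefore(a, b, times) ? -1 : 1));
	return next.slice(0, K);
}

const times = new Map([
	['i love you', 5],
	['island', 3],
	['i a', 2],
	['ironman', 2],
]);
let cached = [];
for (const sentence of times.keys()) {
	cached = updateTopK(cached, sentence, times);
}
// cached: ['i love you', 'island', 'i a']
```

With this cache in place, answering a keystroke is just a trie walk plus returning the node's cached list.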

Walkthrough

Corpus: "i love you" (5), "island" (3). Typing 'i' matches both → ["i love you", "island"] (5 beats 3). Typing ' ' extends the prefix to "i " → only ["i love you"] remains; "island" has 's', not a space, at that position.

Invariant: sentenceTimes always holds authoritative counts; trie mirrors inserted sentence keys so DFS does not scan unrelated branches.
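
The walkthrough can be checked with a throwaway brute-force `suggest` (a standalone sketch, not the class above):

```javascript
// Brute-force mirror of the walkthrough: filter the corpus by prefix,
// sort by (-times, sentence), keep at most three.
function suggest(corpus, prefix) {
	return [...corpus.entries()]
		.filter(([sentence]) => sentence.startsWith(prefix))
		.sort(([a, ta], [b, tb]) => (ta !== tb ? tb - ta : a < b ? -1 : 1))
		.map(([sentence]) => sentence)
		.slice(0, 3);
}

const corpus = new Map([
	['i love you', 5],
	['island', 3],
]);
const afterI = suggest(corpus, 'i'); // ['i love you', 'island']
const afterSpace = suggest(corpus, 'i '); // ['i love you']
```

This is the O(N)-per-keystroke baseline from Approaches; the trie only prunes which sentences reach the filter.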

Edge Cases

  • Prefix with no completions — return []
  • Fewer than three matches — return fewer
  • '#' on empty buffer — often inserts empty string (platform-dependent; guard if spec requires nonempty)
  • Tie on hotness — lexicographically smaller sentence wins
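
The empty-submit guard could look like the following (hedged sketch; neither solution above includes this guard, since LeetCode's spec tolerates the default behavior):

```javascript
// Optional guard: ignore '#' when nothing has been typed, for specs
// that require nonempty sentences (platform behavior varies).
function commit(buffer, times) {
	if (buffer.length === 0) return ''; // nothing to record
	times.set(buffer, (times.get(buffer) || 0) + 1);
	return ''; // cleared buffer
}

const times = new Map();
let buffer = 'hi';
buffer = commit(buffer, times); // records 'hi' once, clears the buffer
buffer = commit(buffer, times); // empty submit ignored; times unchanged
```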

Pitfalls

  • Sorting only by frequency — must break ties lexicographically as stated
  • Rebuilding trie on every # from scratch — wasteful; incremental insert per sentence suffices
  • Character set includes space — trie depth spans whole sentences, not just tokens

Variants

  • Heap per trie node — cache top 3 at each prefix for O(1) output after amortized updates
  • Personalization — user-specific sentenceTimes partitions or features blended with global counts
  • Fuzzy / edit-distance — trigram index or BK-tree layered on top of exact trie

Mind-Map Tags

#trie #prefix-search #top-k #custom-comparator #streaming-input #autocomplete
