07. Top-K / Heap
TL;DR
A priority queue (binary heap) tracks “who is best” under an ordering without fully sorting. For kth largest, keep a min-heap of size k so the root is the threshold for the top tier; for k smallest, use a max-heap of size k (or negate values with care). The two-heaps pattern maintains lower and upper halves for streaming median. Quickselect is an O(n) average, O(1) extra-space alternative when you only need one order statistic and mutating the array is allowed.
Recognition Cues
- Phrases: “kth largest/smallest,” “top k,” “k closest,” “median in a stream,” “nearly sorted,” “merge k sorted.”
- Inputs: arrays, streams of numbers, lists of sorted sequences; explicit k.
- Outputs: a single value, k elements, or median queries.
Diagram
Pattern control flow (customize nodes for this pattern). camelCase node IDs.
Loading diagram…
Mental Model
- Min-heap of size k (largest k): every new candidate smaller than the root is ignored; otherwise pop the smallest of the elite and insert the newcomer — the root remains the kth largest.
- Max-heap of size k (smallest k): symmetric; root is the largest among the k smallest (the kth smallest).
- Two heaps: max-heap holds the lower half, min-heap the upper half; balance sizes so tops meet at the median.
- Quickselect: partition like quicksort but recurse only into the side holding the kth index.
Generic Recipe
- Confirm whether you need k largest (min-heap k) or k smallest (max-heap k or negation trick).
- Heap: initialize empty heap; for each value, push then if size > k, pop; root answers kth for top-k largest/smallest.
- Two heaps for median: insert into correct heap by comparing with tops; rebalance so sizes differ by at most one; median from one or both tops.
- Quickselect:
partitionaround pivot; recurse into left or right segment until pivot index equals targetIndex. - When values are structs, order by a key (distance, frequency, timestamp).
Complexity
- Heap (k elements): each push/pop is O(log k); n values → O(n log k) time, O(k) space.
- Two heaps: insert O(log n) amortized per op, O(n) space for stored elements.
- Quickselect: O(n) average time, O(n^2) worst (rare with random pivot); O(1) extra space if in-place.
Generic Code Template
Go
package main
import (
"container/heap"
)
// minIntHeap — min-heap of ints (root = smallest). Used for kth largest via size-k cap.
type minIntHeap []int
func (heapData minIntHeap) Len() int { return len(heapData) }
func (heapData minIntHeap) Less(left, right int) bool {
return heapData[left] < heapData[right]
}
func (heapData minIntHeap) Swap(left, right int) {
heapData[left], heapData[right] = heapData[right], heapData[left]
}
func (heapData *minIntHeap) Push(value any) {
*heapData = append(*heapData, value.(int))
}
func (heapData *minIntHeap) Pop() any {
old := *heapData
lastIndex := len(old) - 1
value := old[lastIndex]
*heapData = old[:lastIndex]
return value
}
// kthLargest returns the kth largest (1-based) using a min-heap of size k.
func kthLargest(values []int, k int) int {
minHeap := &minIntHeap{}
heap.Init(minHeap)
for _, value := range values {
heap.Push(minHeap, value)
if minHeap.Len() > k {
heap.Pop(minHeap)
}
}
return (*minHeap)[0]
}
// quickselect returns value at 0-based rank index in sorted order (mutates slice).
func quickselect(values []int, rankIndex int) int {
leftBound := 0
rightBound := len(values) - 1
for {
pivotIndex := partition(values, leftBound, rightBound)
if pivotIndex == rankIndex {
return values[pivotIndex]
}
if pivotIndex < rankIndex {
leftBound = pivotIndex + 1
} else {
rightBound = pivotIndex - 1
}
}
}
func partition(values []int, leftBound, rightBound int) int {
pivotValue := values[rightBound]
storeIndex := leftBound
for scanIndex := leftBound; scanIndex < rightBound; scanIndex++ {
if values[scanIndex] < pivotValue {
values[storeIndex], values[scanIndex] = values[scanIndex], values[storeIndex]
storeIndex++
}
}
values[storeIndex], values[rightBound] = values[rightBound], values[storeIndex]
return storeIndex
}
func main() {
_ = kthLargest([]int{3, 2, 1, 5, 6, 4}, 2)
mutable := []int{3, 2, 1, 5, 6, 4}
_ = quickselect(mutable, 1)
}JavaScript
class MinHeap {
constructor() {
this.data = [];
}
size() {
return this.data.length;
}
peek() {
return this.data[0];
}
push(value) {
this.data.push(value);
this.#bubbleUp(this.data.length - 1);
}
pop() {
if (this.data.length === 0) return undefined;
const top = this.data[0];
const last = this.data.pop();
if (this.data.length > 0) {
this.data[0] = last;
this.#bubbleDown(0);
}
return top;
}
#parent(index) {
return Math.floor((index - 1) / 2);
}
#bubbleUp(index) {
while (index > 0) {
const parentIndex = this.#parent(index);
if (this.data[parentIndex] <= this.data[index]) break;
[this.data[parentIndex], this.data[index]] = [
this.data[index],
this.data[parentIndex],
];
index = parentIndex;
}
}
#bubbleDown(index) {
const length = this.data.length;
while (true) {
const left = index * 2 + 1;
const right = index * 2 + 2;
let smallest = index;
if (left < length && this.data[left] < this.data[smallest]) {
smallest = left;
}
if (right < length && this.data[right] < this.data[smallest]) {
smallest = right;
}
if (smallest === index) break;
[this.data[index], this.data[smallest]] = [
this.data[smallest],
this.data[index],
];
index = smallest;
}
}
}
function kthLargest(values, k) {
const minHeap = new MinHeap();
for (const value of values) {
minHeap.push(value);
if (minHeap.size() > k) {
minHeap.pop();
}
}
return minHeap.peek();
}
function quickselect(values, rankIndex) {
let leftBound = 0;
let rightBound = values.length - 1;
while (true) {
const pivotIndex = partition(values, leftBound, rightBound);
if (pivotIndex === rankIndex) return values[pivotIndex];
if (pivotIndex < rankIndex) leftBound = pivotIndex + 1;
else rightBound = pivotIndex - 1;
}
}
function partition(values, leftBound, rightBound) {
const pivotValue = values[rightBound];
let storeIndex = leftBound;
for (let scanIndex = leftBound; scanIndex < rightBound; scanIndex++) {
if (values[scanIndex] < pivotValue) {
[values[storeIndex], values[scanIndex]] = [
values[scanIndex],
values[storeIndex],
];
storeIndex++;
}
}
[values[storeIndex], values[rightBound]] = [
values[rightBound],
values[storeIndex],
];
return storeIndex;
}Variants
- K closest to origin: heap on distance (or squared distance to avoid sqrt).
- K most frequent: bucket by frequency or heap of
[freq, key]; 347-style. - Median data stream: two heaps; on insert, rebalance so max-lower size equals min-upper or one larger.
- K sorted lists / merge K: min-heap of (value, listId, index); IPO-style two-stage heaps for budget/profit.
- Immutable / single array: prefer quickselect; online stream → heap.
Representative Problems
- 215. Kth Largest Element in an Array — heap or quickselect
- 703. Kth Largest Element in a Stream — fixed-size min-heap
- 295. Find Median from Data Stream — two heaps
- 973. K Closest Points to Origin — heap by distance
- 347. Top K Frequent Elements — frequency + heap or buckets
Anti-patterns
- Using max-heap for k largest without thinking — you usually want min-heap of size k for “top k largest.”
- Storing full arrays in heap elements when a pair (priority, id) suffices.
- Quickselect on “median of two sorted arrays” style problems where binary search on answer is the right abstraction.
- Tie-breaking by object identity when the problem requires stable or deterministic ordering on equal keys.
Last updated on
Spotted something unclear or wrong on this page?