THN Interview Prep

07. Top-K / Heap

TL;DR

A priority queue (binary heap) tracks “who is best” under an ordering without fully sorting. For kth largest, keep a min-heap of size k so the root is the threshold for the top tier; for k smallest, use a max-heap of size k (or negate values with care). The two-heaps pattern maintains lower and upper halves for streaming median. Quickselect is an O(n) average, O(1) extra-space alternative when you only need one order statistic and mutating the array is allowed.

Recognition Cues

  • Phrases: “kth largest/smallest,” “top k,” “k closest,” “median in a stream,” “nearly sorted,” “merge k sorted.”
  • Inputs: arrays, streams of numbers, lists of sorted sequences; explicit k.
  • Outputs: a single value, k elements, or median queries.

Diagram

Pattern control flow (customize nodes for this pattern). camelCase node IDs.

Loading diagram…

Mental Model

  • Min-heap of size k (largest k): every new candidate smaller than the root is ignored; otherwise pop the smallest of the elite and insert the newcomer — the root remains the kth largest.
  • Max-heap of size k (smallest k): symmetric; root is the largest among the k smallest (the kth smallest).
  • Two heaps: max-heap holds the lower half, min-heap the upper half; balance sizes so tops meet at the median.
  • Quickselect: partition like quicksort but recurse only into the side holding the kth index.

Generic Recipe

  1. Confirm whether you need k largest (min-heap k) or k smallest (max-heap k or negation trick).
  2. Heap: initialize empty heap; for each value, push then if size > k, pop; root answers kth for top-k largest/smallest.
  3. Two heaps for median: insert into correct heap by comparing with tops; rebalance so sizes differ by at most one; median from one or both tops.
  4. Quickselect: partition around pivot; recurse into left or right segment until pivot index equals targetIndex.
  5. When values are structs, order by a key (distance, frequency, timestamp).

Complexity

  • Heap (k elements): each push/pop is O(log k); n values → O(n log k) time, O(k) space.
  • Two heaps: insert O(log n) amortized per op, O(n) space for stored elements.
  • Quickselect: O(n) average time, O(n^2) worst (rare with random pivot); O(1) extra space if in-place.

Generic Code Template

Go

package main

import (
	"container/heap"
)

// minIntHeap — min-heap of ints (root = smallest). Used for kth largest via size-k cap.
type minIntHeap []int

func (heapData minIntHeap) Len() int           { return len(heapData) }
func (heapData minIntHeap) Less(left, right int) bool {
	return heapData[left] < heapData[right]
}
func (heapData minIntHeap) Swap(left, right int) {
	heapData[left], heapData[right] = heapData[right], heapData[left]
}
func (heapData *minIntHeap) Push(value any) {
	*heapData = append(*heapData, value.(int))
}
func (heapData *minIntHeap) Pop() any {
	old := *heapData
	lastIndex := len(old) - 1
	value := old[lastIndex]
	*heapData = old[:lastIndex]
	return value
}

// kthLargest returns the kth largest (1-based) using a min-heap of size k.
func kthLargest(values []int, k int) int {
	minHeap := &minIntHeap{}
	heap.Init(minHeap)
	for _, value := range values {
		heap.Push(minHeap, value)
		if minHeap.Len() > k {
			heap.Pop(minHeap)
		}
	}
	return (*minHeap)[0]
}

// quickselect returns value at 0-based rank index in sorted order (mutates slice).
func quickselect(values []int, rankIndex int) int {
	leftBound := 0
	rightBound := len(values) - 1
	for {
		pivotIndex := partition(values, leftBound, rightBound)
		if pivotIndex == rankIndex {
			return values[pivotIndex]
		}
		if pivotIndex < rankIndex {
			leftBound = pivotIndex + 1
		} else {
			rightBound = pivotIndex - 1
		}
	}
}

func partition(values []int, leftBound, rightBound int) int {
	pivotValue := values[rightBound]
	storeIndex := leftBound
	for scanIndex := leftBound; scanIndex < rightBound; scanIndex++ {
		if values[scanIndex] < pivotValue {
			values[storeIndex], values[scanIndex] = values[scanIndex], values[storeIndex]
			storeIndex++
		}
	}
	values[storeIndex], values[rightBound] = values[rightBound], values[storeIndex]
	return storeIndex
}

func main() {
	_ = kthLargest([]int{3, 2, 1, 5, 6, 4}, 2)
	mutable := []int{3, 2, 1, 5, 6, 4}
	_ = quickselect(mutable, 1)
}

JavaScript

class MinHeap {
  constructor() {
    this.data = [];
  }
  size() {
    return this.data.length;
  }
  peek() {
    return this.data[0];
  }
  push(value) {
    this.data.push(value);
    this.#bubbleUp(this.data.length - 1);
  }
  pop() {
    if (this.data.length === 0) return undefined;
    const top = this.data[0];
    const last = this.data.pop();
    if (this.data.length > 0) {
      this.data[0] = last;
      this.#bubbleDown(0);
    }
    return top;
  }
  #parent(index) {
    return Math.floor((index - 1) / 2);
  }
  #bubbleUp(index) {
    while (index > 0) {
      const parentIndex = this.#parent(index);
      if (this.data[parentIndex] <= this.data[index]) break;
      [this.data[parentIndex], this.data[index]] = [
        this.data[index],
        this.data[parentIndex],
      ];
      index = parentIndex;
    }
  }
  #bubbleDown(index) {
    const length = this.data.length;
    while (true) {
      const left = index * 2 + 1;
      const right = index * 2 + 2;
      let smallest = index;
      if (left < length && this.data[left] < this.data[smallest]) {
        smallest = left;
      }
      if (right < length && this.data[right] < this.data[smallest]) {
        smallest = right;
      }
      if (smallest === index) break;
      [this.data[index], this.data[smallest]] = [
        this.data[smallest],
        this.data[index],
      ];
      index = smallest;
    }
  }
}

function kthLargest(values, k) {
  const minHeap = new MinHeap();
  for (const value of values) {
    minHeap.push(value);
    if (minHeap.size() > k) {
      minHeap.pop();
    }
  }
  return minHeap.peek();
}

function quickselect(values, rankIndex) {
  let leftBound = 0;
  let rightBound = values.length - 1;
  while (true) {
    const pivotIndex = partition(values, leftBound, rightBound);
    if (pivotIndex === rankIndex) return values[pivotIndex];
    if (pivotIndex < rankIndex) leftBound = pivotIndex + 1;
    else rightBound = pivotIndex - 1;
  }
}

function partition(values, leftBound, rightBound) {
  const pivotValue = values[rightBound];
  let storeIndex = leftBound;
  for (let scanIndex = leftBound; scanIndex < rightBound; scanIndex++) {
    if (values[scanIndex] < pivotValue) {
      [values[storeIndex], values[scanIndex]] = [
        values[scanIndex],
        values[storeIndex],
      ];
      storeIndex++;
    }
  }
  [values[storeIndex], values[rightBound]] = [
    values[rightBound],
    values[storeIndex],
  ];
  return storeIndex;
}

Variants

  • K closest to origin: heap on distance (or squared distance to avoid sqrt).
  • K most frequent: bucket by frequency or heap of [freq, key]; 347-style.
  • Median data stream: two heaps; on insert, rebalance so max-lower size equals min-upper or one larger.
  • K sorted lists / merge K: min-heap of (value, listId, index); IPO-style two-stage heaps for budget/profit.
  • Immutable / single array: prefer quickselect; online stream → heap.

Representative Problems

Anti-patterns

  • Using max-heap for k largest without thinking — you usually want min-heap of size k for “top k largest.”
  • Storing full arrays in heap elements when a pair (priority, id) suffices.
  • Quickselect on “median of two sorted arrays” style problems where binary search on answer is the right abstraction.
  • Tie-breaking by object identity when the problem requires stable or deterministic ordering on equal keys.

Last updated on

Spotted something unclear or wrong on this page?

On this page