THN Interview Prep

07. Top-K / Heap

TL;DR

A priority queue (binary heap) tracks “who is best” under an ordering without fully sorting. For kth largest, keep a min-heap of size k so the root is the threshold for the top tier; for k smallest, use a max-heap of size k (or negate values with care). The two-heaps pattern maintains lower and upper halves for streaming median. Quickselect is an O(n) average, O(1) extra-space alternative when you only need one order statistic and mutating the array is allowed.

Core Basics

  • Object: maintain k extremal elements with a size-k heap — min-heap tracks k largest, max-heap tracks k smallest.
  • Latency story: heap gives (O(n log k)) streams; quickselect is (O(n)) average for one-shot kth on arrays.
  • Staff-level bar: pick structure for mutable k, online input, or ties / stable reporting.

Recognition Cues

  • Phrases: “kth largest/smallest,” “top k,” “k closest,” “median in a stream,” “nearly sorted,” “merge k sorted.”
  • Inputs: arrays, streams of numbers, lists of sorted sequences; explicit k.
  • Outputs: a single value, k elements, or median queries.

Use-case Lens

  • Best when: you only need the top/bottom k items, kth item, or continuously updated priority choice.
  • Not for: fully sorted output unless sorting is acceptable and simpler.
  • Main invariant: the heap contains the best k candidates seen so far, or the next globally best candidate to pop.
  • Interview move: explain why heap size stays k, giving (O(n \log k)) instead of sorting all (n) items.

Diagram

Loading diagram…

Step-by-step Visual

Example: Find the top 3 largest numbers in [7, 10, 4, 3, 20, 15]. Keep a min-heap of size k; the smallest heap value is the current cutoff.

Loading diagram…

Understanding

  • Min-heap of size k (largest k): every new candidate smaller than the root is ignored; otherwise pop the smallest of the elite and insert the newcomer — the root remains the kth largest.
  • Max-heap of size k (smallest k): symmetric; root is the largest among the k smallest (the kth smallest).
  • Two heaps: max-heap holds the lower half, min-heap the upper half; balance sizes so tops meet at the median.
  • Quickselect: partition like quicksort but recurse only into the side holding the kth index.

Generic Recipe

  1. Confirm whether you need k largest (min-heap k) or k smallest (max-heap k or negation trick).
  2. Heap: initialize empty heap; for each value, push then if size > k, pop; root answers kth for top-k largest/smallest.
  3. Two heaps for median: insert into correct heap by comparing with tops; rebalance so sizes differ by at most one; median from one or both tops.
  4. Quickselect: partition around pivot; recurse into left or right segment until pivot index equals targetIndex.
  5. When values are structs, order by a key (distance, frequency, timestamp).

Complexity

  • Heap (k elements): each push/pop is O(log k); n values → O(n log k) time, O(k) space.
  • Two heaps: insert O(log n) amortized per op, O(n) space for stored elements.
  • Quickselect: O(n) average time, O(n^2) worst (rare with random pivot); O(1) extra space if in-place.

Memory Hooks

  • Operational mantra: keep only the candidates that can still be top-k; heap root is the next discard or answer.
  • Anti-panic: choose heap polarity from what you discard: top-k largest usually keeps a min-heap of size k.

Study Pattern (SDE3+)

  • Recognition drill (10 min): classify prompts as fixed top-k, streaming rank, two-heaps median, or repeated next-best merge.
  • Implementation sprint (25 min): code kth largest with a bounded heap and median stream with two heaps.
  • Staff follow-up (10 min): compare heap, bucket, quickselect, and sort under streaming, memory, and update constraints.

Generic Code Template

Go

package main

import (
	"container/heap"
)

// minIntHeap — min-heap of ints (root = smallest). Used for kth largest via size-k cap.
type minIntHeap []int

func (heapData minIntHeap) Len() int           { return len(heapData) }
func (heapData minIntHeap) Less(left, right int) bool {
	return heapData[left] < heapData[right]
}
func (heapData minIntHeap) Swap(left, right int) {
	heapData[left], heapData[right] = heapData[right], heapData[left]
}
func (heapData *minIntHeap) Push(value any) {
	*heapData = append(*heapData, value.(int))
}
func (heapData *minIntHeap) Pop() any {
	old := *heapData
	lastIndex := len(old) - 1
	value := old[lastIndex]
	*heapData = old[:lastIndex]
	return value
}

// kthLargest returns the kth largest (1-based) using a min-heap of size k.
func kthLargest(values []int, k int) int {
	minHeap := &minIntHeap{}
	heap.Init(minHeap)
	for _, value := range values {
		heap.Push(minHeap, value)
		if minHeap.Len() > k {
			heap.Pop(minHeap)
		}
	}
	return (*minHeap)[0]
}

// quickselect returns value at 0-based rank index in sorted order (mutates slice).
func quickselect(values []int, rankIndex int) int {
	leftBound := 0
	rightBound := len(values) - 1
	for {
		pivotIndex := partition(values, leftBound, rightBound)
		if pivotIndex == rankIndex {
			return values[pivotIndex]
		}
		if pivotIndex < rankIndex {
			leftBound = pivotIndex + 1
		} else {
			rightBound = pivotIndex - 1
		}
	}
}

func partition(values []int, leftBound, rightBound int) int {
	pivotValue := values[rightBound]
	storeIndex := leftBound
	for scanIndex := leftBound; scanIndex < rightBound; scanIndex++ {
		if values[scanIndex] < pivotValue {
			values[storeIndex], values[scanIndex] = values[scanIndex], values[storeIndex]
			storeIndex++
		}
	}
	values[storeIndex], values[rightBound] = values[rightBound], values[storeIndex]
	return storeIndex
}

func main() {
	_ = kthLargest([]int{3, 2, 1, 5, 6, 4}, 2)
	mutable := []int{3, 2, 1, 5, 6, 4}
	_ = quickselect(mutable, 1)
}

JavaScript

class MinHeap {
  constructor() {
    this.data = [];
  }
  size() {
    return this.data.length;
  }
  peek() {
    return this.data[0];
  }
  push(value) {
    this.data.push(value);
    this.#bubbleUp(this.data.length - 1);
  }
  pop() {
    if (this.data.length === 0) return undefined;
    const top = this.data[0];
    const last = this.data.pop();
    if (this.data.length > 0) {
      this.data[0] = last;
      this.#bubbleDown(0);
    }
    return top;
  }
  #parent(index) {
    return Math.floor((index - 1) / 2);
  }
  #bubbleUp(index) {
    while (index > 0) {
      const parentIndex = this.#parent(index);
      if (this.data[parentIndex] <= this.data[index]) break;
      [this.data[parentIndex], this.data[index]] = [
        this.data[index],
        this.data[parentIndex],
      ];
      index = parentIndex;
    }
  }
  #bubbleDown(index) {
    const length = this.data.length;
    while (true) {
      const left = index * 2 + 1;
      const right = index * 2 + 2;
      let smallest = index;
      if (left < length && this.data[left] < this.data[smallest]) {
        smallest = left;
      }
      if (right < length && this.data[right] < this.data[smallest]) {
        smallest = right;
      }
      if (smallest === index) break;
      [this.data[index], this.data[smallest]] = [
        this.data[smallest],
        this.data[index],
      ];
      index = smallest;
    }
  }
}

function kthLargest(values, k) {
  const minHeap = new MinHeap();
  for (const value of values) {
    minHeap.push(value);
    if (minHeap.size() > k) {
      minHeap.pop();
    }
  }
  return minHeap.peek();
}

function quickselect(values, rankIndex) {
  let leftBound = 0;
  let rightBound = values.length - 1;
  while (true) {
    const pivotIndex = partition(values, leftBound, rightBound);
    if (pivotIndex === rankIndex) return values[pivotIndex];
    if (pivotIndex < rankIndex) leftBound = pivotIndex + 1;
    else rightBound = pivotIndex - 1;
  }
}

function partition(values, leftBound, rightBound) {
  const pivotValue = values[rightBound];
  let storeIndex = leftBound;
  for (let scanIndex = leftBound; scanIndex < rightBound; scanIndex++) {
    if (values[scanIndex] < pivotValue) {
      [values[storeIndex], values[scanIndex]] = [
        values[scanIndex],
        values[storeIndex],
      ];
      storeIndex++;
    }
  }
  [values[storeIndex], values[rightBound]] = [
    values[rightBound],
    values[storeIndex],
  ];
  return storeIndex;
}

Variants

  • K closest to origin: heap on distance (or squared distance to avoid sqrt).
  • K most frequent: bucket by frequency or heap of [freq, key]; 347-style.
  • Median data stream: two heaps; on insert, rebalance so max-lower size equals min-upper or one larger.
  • K sorted lists / merge K: min-heap of (value, listId, index); IPO-style two-stage heaps for budget/profit.
  • Immutable / single array: prefer quickselect; online stream → heap.

Representative Problems

Anti-patterns

  • Using max-heap for k largest without thinking — you usually want min-heap of size k for “top k largest.”
  • Storing full arrays in heap elements when a pair (priority, id) suffices.
  • Quickselect on “median of two sorted arrays” style problems where binary search on answer is the right abstraction.
  • Tie-breaking by object identity when the problem requires stable or deterministic ordering on equal keys.

Mark this page when you finish learning it.

Last updated on

Spotted something unclear or wrong on this page?

On this page