THN Interview Prep

13. Union-Find (DSU)

TL;DR

A disjoint-set union structure maintains a partition of items into components and supports Find (which set?) and Union (merge two sets). With path compression on Find and union by rank (or size), each operation is amortized inverse-Ackermann α(n), effectively constant. Use DSU for dynamic connectivity, cycle detection in undirected graphs, Kruskal’s MST, and redundant edge problems.

Core Basics

  • Object: disjoint sets with near-constant union and find via rank/size + path compression.
  • Use when: dynamic connectivity, Kruskal edges, redundant connection detection.
  • Staff-level bar: articulate amortized (alpha(n)) casually and when rollback DSU is needed.

Recognition Cues

  • Phrases: “connected components,” “union edges,” “redundant connection,” “are a and b in same group?”, “minimum spanning tree” with sorted edges.
  • Input: n nodes, undirected edges or union queries.
  • Output: component count, whether graph stays acyclic, MST weight, or the redundant edge.

Use-case Lens

  • Best when: you process edges/relationships and repeatedly ask whether two items are in the same connected component.
  • Not for: shortest paths, traversal order, or components that need full member lists after every operation.
  • Main invariant: each set has one representative root; path compression and union by rank keep trees shallow.
  • Interview move: distinguish static graph traversal from dynamic connectivity as edges arrive.

Diagram

Loading diagram…

Step-by-step Visual

Example: Detect a redundant edge in [[1,2], [2,3], [1,3]]. Union-Find tracks connected components.

Loading diagram…

Understanding

Each set is a tree whose root is the “representative.” Find walks to the root while flattening parents (path compression). Union attaches the shallower tree under the deeper root (by rank). Amortized α(n) grows so slowly it is < 5 for all realistic n.

Generic Recipe

  1. Initialize parent[index] = index, rank[index] = 0.
  2. Find: while parent[x] != x, set parent[x] = parent[parent[x]] (or full path compression at end).
  3. Union: rootA = Find(a), rootB = Find(b); if equal, skip (already connected — often cycle if adding edge). Else link smaller rank under larger; if ranks tie, increment new root’s rank.
  4. For Kruskal: sort edges by weight; Union if roots differ; accumulate weight.
  5. For components count: after unions, count roots where parent[i] == i.

Complexity

  • Time: O(α(n)) amortized per Find / Union; Kruskal O(E log E) from sorting edges.
  • Space: O(n) for parent and rank arrays.

Memory Hooks

  • Operational mantra: find roots, union different roots, and let "same root" answer connectivity or cycle questions.
  • Anti-panic: every edge either merges two components or proves they were already connected; that is the invariant.

Study Pattern (SDE3+)

  • Recognition drill (10 min): tag prompts as component count, redundant edge, connectivity query, or Kruskal/MST.
  • Implementation sprint (25 min): code DSU with path compression and union by rank/size; track component count.
  • Staff follow-up (10 min): discuss deletions, rollback DSU, coordinate compression, and why DFS may be simpler for static graphs.

Generic Code Template

Go

type UnionFind struct {
	parent []int
	rank   []int
}

func NewUnionFind(size int) *UnionFind {
	parent := make([]int, size)
	rank := make([]int, size)
	for index := 0; index < size; index++ {
		parent[index] = index
	}
	return &UnionFind{parent: parent, rank: rank}
}

func (unionFind *UnionFind) Find(node int) int {
	if unionFind.parent[node] != node {
		unionFind.parent[node] = unionFind.Find(unionFind.parent[node])
	}
	return unionFind.parent[node]
}

func (unionFind *UnionFind) Union(first int, second int) bool {
	rootFirst := unionFind.Find(first)
	rootSecond := unionFind.Find(second)
	if rootFirst == rootSecond {
		return false
	}
	if unionFind.rank[rootFirst] < unionFind.rank[rootSecond] {
		unionFind.parent[rootFirst] = rootSecond
	} else if unionFind.rank[rootFirst] > unionFind.rank[rootSecond] {
		unionFind.parent[rootSecond] = rootFirst
	} else {
		unionFind.parent[rootSecond] = rootFirst
		unionFind.rank[rootFirst]++
	}
	return true
}

func (unionFind *UnionFind) Connected(first int, second int) bool {
	return unionFind.Find(first) == unionFind.Find(second)
}

JavaScript

class UnionFind {
  constructor(size) {
    this.parent = Array.from({ length: size }, (_, index) => index);
    this.rank = Array(size).fill(0);
  }

  find(node) {
    if (this.parent[node] !== node) {
      this.parent[node] = this.find(this.parent[node]);
    }
    return this.parent[node];
  }

  union(first, second) {
    let rootFirst = this.find(first);
    let rootSecond = this.find(second);
    if (rootFirst === rootSecond) return false;
    if (this.rank[rootFirst] < this.rank[rootSecond]) {
      this.parent[rootFirst] = rootSecond;
    } else if (this.rank[rootFirst] > this.rank[rootSecond]) {
      this.parent[rootSecond] = rootFirst;
    } else {
      this.parent[rootSecond] = rootFirst;
      this.rank[rootFirst] += 1;
    }
    return true;
  }

  connected(first, second) {
    return this.find(first) === this.find(second);
  }
}

Variants

  • Connected components count: start with n, decrement Union when merging distinct roots.
  • Detect cycle (undirected): on each new edge, Union returns false ⇒ cycle closed.
  • Kruskal MST: DSU on vertices; process edges ascending weight.
  • Redundant connection: last edge that joins already-connected endpoints (or variant II with directed roots).

Representative Problems

Anti-patterns

  • Union without checking roots first — wrong merges or double counting.
  • Using DSU for directed cycle without adapting (use topo sort or specialized algorithms).
  • Path compression without union by rank/size — still works but weaker guarantees in theory.
  • Kruskal without sorting edges — breaks MST optimality.

Mark this page when you finish learning it.

Last updated on

Spotted something unclear or wrong on this page?

On this page