THN Interview Prep

49. Group Anagrams

Quick Identifier

  • Topic: arrays-hashing
  • Pattern: Hash Map Bucketing (Canonical Keys)
  • Hint: Anagrams are identical if you abstract away their letter order. You need a way to mathematically prove two strings are made of the exact same ingredients. Generate a "signature" for each string and group strings with the same signature together in a Hash Map.
  • Difficulty: Medium
  • Companies: Amazon, Meta, Google, Bloomberg, Microsoft
  • LeetCode: 49

Quick Sights

  • Time Complexity: O(m * n) where m is the number of strings and n is the maximum length of a string. Generating the frequency array signature takes linear time relative to the string length. (If you use sorting for the signature, it becomes O(m * n log n)).
  • Space Complexity: O(m * n). The hash map stores every string and its corresponding signature.
  • Core Mechanism: Iterate through the given strings. For each string, generate a canonical signature. A signature is a standard representation that will be identical for all anagrams (e.g., a count array of the 26 lowercase English letters). Use this signature as the key in a Hash Map, and append the original string to the array stored at that key.

Concept Explanation

As a senior engineer, whenever you hear the word "grouping" based on a fuzzy property (like anagrams, isomorphic shapes, or rotational symmetry), your brain should instantly translate that to Hash Map with Canonical Keys.

A "Canonical Key" is just a fancy term for a standardized fingerprint.

  1. The Problem with Raw Strings: "eat" and "tea" look completely different to a computer. If you use them as Hash Map keys, they go into different buckets.
  2. The Sorting Fingerprint: If you sort "eat", you get "aet". If you sort "tea", you get "aet". Now they match perfectly! This works, but sorting takes $O(n \log n)$ time per string.
  3. The Frequency Fingerprint: Since we only care about lowercase English letters, we don't need to sort. We just count the ingredients.
    • "eat" has 1 'a', 1 'e', 1 't'.
    • "tea" has 1 'a', 1 'e', 1 't'. We can represent this as a 26-element array [1, 0, 0, 0, 1, 0, ..., 1]. We convert this array into a string (e.g., joining with #) and use that as our Hash Map key. Generating this array takes strict $O(n)$ time, making it faster than sorting.

Diagram

This flowchart outlines the generation of the Canonical Key and the bucketing process.

Loading diagram…

Study Pattern (SDE3+)

  • 0–3 min: Immediately establish the pattern: "Group items into a Hash Map using a canonical key." State that the naive canonical key is the sorted string ($O(m \cdot n \log n)$). Then, elevate your answer by proposing the 26-length frequency array as the key ($O(m \cdot n)$).
  • Implementation pass: In languages like Go or Python, arrays/tuples can be used directly as dictionary keys. In JavaScript, arrays are objects and checked by reference, not value! You must serialize the array (e.g., counts.join('#')) into a primitive string before using it as a Map key.
  • 5 min extension: What if the strings contained arbitrary Unicode characters (emojis, kanji, etc.) instead of just lowercase English? The 26-length array fails. You would have to fall back to the $O(n \log n)$ sorting method, or build a custom frequency string (e.g., 1⭐2🔴) and sort the unique characters to maintain determinism.

Approaches

  • Brute force — Compare every pair of strings to see if they are anagrams. Time: $O(m^2 \cdot n)$. Extremely slow.
  • Sorting Signature — Sort each string to create the key. Time: $O(m \cdot n \log n)$. Very common, easily accepted, but slightly suboptimal.
  • Frequency Array Signature — Count the 26 letters to create the key. Time: $O(m \cdot n)$. (Always pick this)

Optimal Solution

Go

func groupAnagrams(strs []string) [][]string {
	// Go allows arrays (not slices) to be used directly as map keys
	// because arrays of fixed size are comparable by value.
	buckets := make(map[[26]int][]string)

	for _, word := range strs {
		var counts [26]int

		for _, char := range word {
			counts[char-'a']++
		}

		// The array itself is the canonical key
		buckets[counts] = append(buckets[counts], word)
	}

	result := make([][]string, 0, len(buckets))
	for _, group := range buckets {
		result = append(result, group)
	}

	return result
}

JavaScript

function groupAnagrams(strs) {
	const buckets = new Map();

	for (const word of strs) {
		const counts = new Array(26).fill(0);

		for (let i = 0; i < word.length; i++) {
			counts[word.charCodeAt(i) - 97]++;
		}

		// In JS, two different array instances are never equal.
		// We MUST serialize the counts array to a primitive string to use it as a key.
		// Using '#' ensures counts like '10' and '1' don't ambiguously merge.
		const key = counts.join('#');

		if (!buckets.has(key)) {
			buckets.set(key, []);
		}
		buckets.get(key).push(word);
	}

	return Array.from(buckets.values());
}

Walkthrough

Input: words = ["eat", "tea", "tan", "ate", "nat", "bat"]

wordCanonical Key (Conceptual)Actionbuckets Map State
"eat"[a:1, e:1, t:1]Create new bucket{ [a:1,e:1,t:1] -> ["eat"] }
"tea"[a:1, e:1, t:1]Key exists, append{ [a:1,e:1,t:1] -> ["eat", "tea"] }
"tan"[a:1, n:1, t:1]Create new bucket{ [a:1,e:1,t:1] -> ["eat", "tea"], [a:1,n:1,t:1] -> ["tan"] }
"ate"[a:1, e:1, t:1]Key exists, append{ [a:1,e:1,t:1] -> ["eat", "tea", "ate"], [a:1,n:1,t:1] -> ["tan"] }
"nat"[a:1, n:1, t:1]Key exists, append{ [a:1,e:1,t:1] -> ["eat", "tea", "ate"], [a:1,n:1,t:1] -> ["tan", "nat"] }
"bat"[a:1, b:1, t:1]Create new bucket{ [a:1,e:1,t:1] -> ["eat", "tea", "ate"], [a:1,n:1,t:1] -> ["tan", "nat"], [a:1,b:1,t:1] -> ["bat"] }

Output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

Edge Cases

  • Empty strings. "" converts to an array of 26 zeros. Multiple empty strings will correctly group together.
  • Single characters. Works perfectly.
  • All strings are completely unique. Creates m separate buckets containing 1 string each.

Pitfalls

  • In JavaScript/Python, using counts.toString() without a delimiter (like #) can sometimes lead to ambiguous keys if numbers get large, though it's usually safe for small strings. Safe serialization is counts.join('#').
  • Forgetting that the input strings might be empty, and failing to handle the 0 length case in your counting logic.

Similar Problems

Mind-Map Tags

#arrays #hashmap #strings #anagram #bucketing #canonical-key

Mark this page when you finish learning it.

Last updated on

Spotted something unclear or wrong on this page?

On this page