49. Group Anagrams

Quick Identifier

Topic: arrays-hashing
Pattern: Hash Map Bucketing (Canonical Keys)
Hint: Anagrams are identical if you abstract away their letter order. You need a way to mathematically prove two strings are made of the exact same ingredients. Generate a "signature" for each string and group strings with the same signature together in a Hash Map.
Difficulty: Medium
Companies: Amazon, Meta, Google, Bloomberg, Microsoft
LeetCode: 49

Quick Sights

Time Complexity: O(m * n) where m is the number of strings and n is the maximum length of a string. Generating the frequency array signature takes linear time relative to the string length. (If you use sorting for the signature, it becomes O(m * n log n)).
Space Complexity: O(m * n). The hash map stores every string and its corresponding signature.
Core Mechanism: Iterate through the given strings. For each string, generate a canonical signature. A signature is a standard representation that will be identical for all anagrams (e.g., a count array of the 26 lowercase English letters). Use this signature as the key in a Hash Map, and append the original string to the array stored at that key.

As a senior engineer, whenever you hear the word "grouping" based on a fuzzy property (like anagrams, isomorphic shapes, or rotational symmetry), your brain should instantly translate that to Hash Map with Canonical Keys.

A "Canonical Key" is just a fancy term for a standardized fingerprint.

The Problem with Raw Strings: "eat" and "tea" look completely different to a computer. If you use them as Hash Map keys, they go into different buckets.
The Sorting Fingerprint: If you sort "eat", you get "aet". If you sort "tea", you get "aet". Now they match perfectly! This works, but sorting takes $O(n \log n)$ time per string.
The Frequency Fingerprint: Since we only care about lowercase English letters, we don't need to sort. We just count the ingredients.
- "eat" has 1 'a', 1 'e', 1 't'.
- "tea" has 1 'a', 1 'e', 1 't'. We can represent this as a 26-element array [1, 0, 0, 0, 1, 0, ..., 1]. We convert this array into a string (e.g., joining with #) and use that as our Hash Map key. Generating this array takes strict $O(n)$ time, making it faster than sorting.

Diagram

This flowchart outlines the generation of the Canonical Key and the bucketing process.

Loading diagram…

Study Pattern (SDE3+)

0–3 min: Immediately establish the pattern: "Group items into a Hash Map using a canonical key." State that the naive canonical key is the sorted string ($O(m \cdot n \log n)$). Then, elevate your answer by proposing the 26-length frequency array as the key ($O(m \cdot n)$).
Implementation pass: In languages like Go or Python, arrays/tuples can be used directly as dictionary keys. In JavaScript, arrays are objects and checked by reference, not value! You must serialize the array (e.g., counts.join('#')) into a primitive string before using it as a Map key.
5 min extension: What if the strings contained arbitrary Unicode characters (emojis, kanji, etc.) instead of just lowercase English? The 26-length array fails. You would have to fall back to the $O(n \log n)$ sorting method, or build a custom frequency string (e.g., 1⭐2🔴) and sort the unique characters to maintain determinism.

Approaches

Brute force — Compare every pair of strings to see if they are anagrams. Time: $O(m^2 \cdot n)$. Extremely slow.
Sorting Signature — Sort each string to create the key. Time: $O(m \cdot n \log n)$. Very common, easily accepted, but slightly suboptimal.
Frequency Array Signature — Count the 26 letters to create the key. Time: $O(m \cdot n)$. (Always pick this)

Optimal Solution

Go

func groupAnagrams(strs []string) [][]string {
	// Go allows arrays (not slices) to be used directly as map keys
	// because arrays of fixed size are comparable by value.
	buckets := make(map[[26]int][]string)

	for _, word := range strs {
		var counts [26]int

		for _, char := range word {
			counts[char-'a']++
		}

		// The array itself is the canonical key
		buckets[counts] = append(buckets[counts], word)
	}

	result := make([][]string, 0, len(buckets))
	for _, group := range buckets {
		result = append(result, group)
	}

	return result
}

JavaScript

function groupAnagrams(strs) {
	const buckets = new Map();

	for (const word of strs) {
		const counts = new Array(26).fill(0);

		for (let i = 0; i < word.length; i++) {
			counts[word.charCodeAt(i) - 97]++;
		}

		// In JS, two different array instances are never equal.
		// We MUST serialize the counts array to a primitive string to use it as a key.
		// Using '#' ensures counts like '10' and '1' don't ambiguously merge.
		const key = counts.join('#');

		if (!buckets.has(key)) {
			buckets.set(key, []);
		}
		buckets.get(key).push(word);
	}

	return Array.from(buckets.values());
}

Walkthrough

Input: words = ["eat", "tea", "tan", "ate", "nat", "bat"]

word	Canonical Key (Conceptual)	Action	`buckets` Map State
`"eat"`	`[a:1, e:1, t:1]`	Create new bucket	`{ [a:1,e:1,t:1] -> ["eat"] }`
`"tea"`	`[a:1, e:1, t:1]`	Key exists, append	`{ [a:1,e:1,t:1] -> ["eat", "tea"] }`
`"tan"`	`[a:1, n:1, t:1]`	Create new bucket	`{ [a:1,e:1,t:1] -> ["eat", "tea"],` `[a:1,n:1,t:1] -> ["tan"] }`
`"ate"`	`[a:1, e:1, t:1]`	Key exists, append	`{ [a:1,e:1,t:1] -> ["eat", "tea", "ate"],` `[a:1,n:1,t:1] -> ["tan"] }`
`"nat"`	`[a:1, n:1, t:1]`	Key exists, append	`{ [a:1,e:1,t:1] -> ["eat", "tea", "ate"],` `[a:1,n:1,t:1] -> ["tan", "nat"] }`
`"bat"`	`[a:1, b:1, t:1]`	Create new bucket	`{ [a:1,e:1,t:1] -> ["eat", "tea", "ate"],` `[a:1,n:1,t:1] -> ["tan", "nat"],` `[a:1,b:1,t:1] -> ["bat"] }`

Output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]

Edge Cases

Empty strings. "" converts to an array of 26 zeros. Multiple empty strings will correctly group together.
Single characters. Works perfectly.
All strings are completely unique. Creates m separate buckets containing 1 string each.

Pitfalls

In JavaScript/Python, using counts.toString() without a delimiter (like #) can sometimes lead to ambiguous keys if numbers get large, though it's usually safe for small strings. Safe serialization is counts.join('#').
Forgetting that the input strings might be empty, and failing to handle the 0 length case in your counting logic.

Mind-Map Tags

#arrays #hashmap #strings #anagram #bucketing #canonical-key