49. Group Anagrams
Quick Identifier
- Topic: arrays-hashing
- Pattern: Hash Map Bucketing (Canonical Keys)
- Hint: Anagrams are identical if you abstract away their letter order. You need a way to mathematically prove two strings are made of the exact same ingredients. Generate a "signature" for each string and group strings with the same signature together in a Hash Map.
- Difficulty: Medium
- Companies: Amazon, Meta, Google, Bloomberg, Microsoft
- LeetCode: 49
Quick Sights
- Time Complexity:
O(m * n)wheremis the number of strings andnis the maximum length of a string. Generating the frequency array signature takes linear time relative to the string length. (If you use sorting for the signature, it becomesO(m * n log n)). - Space Complexity:
O(m * n). The hash map stores every string and its corresponding signature. - Core Mechanism: Iterate through the given strings. For each string, generate a canonical signature. A signature is a standard representation that will be identical for all anagrams (e.g., a count array of the 26 lowercase English letters). Use this signature as the
keyin a Hash Map, and append the original string to the array stored at thatkey.
Concept Explanation
As a senior engineer, whenever you hear the word "grouping" based on a fuzzy property (like anagrams, isomorphic shapes, or rotational symmetry), your brain should instantly translate that to Hash Map with Canonical Keys.
A "Canonical Key" is just a fancy term for a standardized fingerprint.
- The Problem with Raw Strings: "eat" and "tea" look completely different to a computer. If you use them as Hash Map keys, they go into different buckets.
- The Sorting Fingerprint: If you sort "eat", you get "aet". If you sort "tea", you get "aet". Now they match perfectly! This works, but sorting takes $O(n \log n)$ time per string.
- The Frequency Fingerprint: Since we only care about lowercase English letters, we don't need to sort. We just count the ingredients.
- "eat" has 1 'a', 1 'e', 1 't'.
- "tea" has 1 'a', 1 'e', 1 't'.
We can represent this as a 26-element array
[1, 0, 0, 0, 1, 0, ..., 1]. We convert this array into a string (e.g., joining with#) and use that as our Hash Map key. Generating this array takes strict $O(n)$ time, making it faster than sorting.
Diagram
This flowchart outlines the generation of the Canonical Key and the bucketing process.
Loading diagram…
Study Pattern (SDE3+)
- 0–3 min: Immediately establish the pattern: "Group items into a Hash Map using a canonical key." State that the naive canonical key is the sorted string ($O(m \cdot n \log n)$). Then, elevate your answer by proposing the 26-length frequency array as the key ($O(m \cdot n)$).
- Implementation pass: In languages like Go or Python, arrays/tuples can be used directly as dictionary keys. In JavaScript, arrays are objects and checked by reference, not value! You must serialize the array (e.g.,
counts.join('#')) into a primitive string before using it as a Map key. - 5 min extension: What if the strings contained arbitrary Unicode characters (emojis, kanji, etc.) instead of just lowercase English? The 26-length array fails. You would have to fall back to the $O(n \log n)$ sorting method, or build a custom frequency string (e.g.,
1⭐2🔴) and sort the unique characters to maintain determinism.
Approaches
- Brute force — Compare every pair of strings to see if they are anagrams. Time: $O(m^2 \cdot n)$. Extremely slow.
- Sorting Signature — Sort each string to create the key. Time: $O(m \cdot n \log n)$. Very common, easily accepted, but slightly suboptimal.
- Frequency Array Signature — Count the 26 letters to create the key. Time: $O(m \cdot n)$. (Always pick this)
Optimal Solution
Go
func groupAnagrams(strs []string) [][]string {
// Go allows arrays (not slices) to be used directly as map keys
// because arrays of fixed size are comparable by value.
buckets := make(map[[26]int][]string)
for _, word := range strs {
var counts [26]int
for _, char := range word {
counts[char-'a']++
}
// The array itself is the canonical key
buckets[counts] = append(buckets[counts], word)
}
result := make([][]string, 0, len(buckets))
for _, group := range buckets {
result = append(result, group)
}
return result
}JavaScript
function groupAnagrams(strs) {
const buckets = new Map();
for (const word of strs) {
const counts = new Array(26).fill(0);
for (let i = 0; i < word.length; i++) {
counts[word.charCodeAt(i) - 97]++;
}
// In JS, two different array instances are never equal.
// We MUST serialize the counts array to a primitive string to use it as a key.
// Using '#' ensures counts like '10' and '1' don't ambiguously merge.
const key = counts.join('#');
if (!buckets.has(key)) {
buckets.set(key, []);
}
buckets.get(key).push(word);
}
return Array.from(buckets.values());
}Walkthrough
Input: words = ["eat", "tea", "tan", "ate", "nat", "bat"]
| word | Canonical Key (Conceptual) | Action | buckets Map State |
|---|---|---|---|
"eat" | [a:1, e:1, t:1] | Create new bucket | { [a:1,e:1,t:1] -> ["eat"] } |
"tea" | [a:1, e:1, t:1] | Key exists, append | { [a:1,e:1,t:1] -> ["eat", "tea"] } |
"tan" | [a:1, n:1, t:1] | Create new bucket | { [a:1,e:1,t:1] -> ["eat", "tea"], [a:1,n:1,t:1] -> ["tan"] } |
"ate" | [a:1, e:1, t:1] | Key exists, append | { [a:1,e:1,t:1] -> ["eat", "tea", "ate"], [a:1,n:1,t:1] -> ["tan"] } |
"nat" | [a:1, n:1, t:1] | Key exists, append | { [a:1,e:1,t:1] -> ["eat", "tea", "ate"], [a:1,n:1,t:1] -> ["tan", "nat"] } |
"bat" | [a:1, b:1, t:1] | Create new bucket | { [a:1,e:1,t:1] -> ["eat", "tea", "ate"], [a:1,n:1,t:1] -> ["tan", "nat"], [a:1,b:1,t:1] -> ["bat"] } |
Output: [["eat", "tea", "ate"], ["tan", "nat"], ["bat"]]
Edge Cases
- Empty strings.
""converts to an array of 26 zeros. Multiple empty strings will correctly group together. - Single characters. Works perfectly.
- All strings are completely unique. Creates
mseparate buckets containing 1 string each.
Pitfalls
- In JavaScript/Python, using
counts.toString()without a delimiter (like#) can sometimes lead to ambiguous keys if numbers get large, though it's usually safe for small strings. Safe serialization iscounts.join('#'). - Forgetting that the input strings might be empty, and failing to handle the
0length case in your counting logic.
Similar Problems
- 242. Valid Anagram — The foundational problem. Comparing the frequency arrays of just two strings.
- 438. Find All Anagrams in a String — Applying the anagram signature concept to a sliding window.
Mind-Map Tags
#arrays #hashmap #strings #anagram #bucketing #canonical-key
Mark this page when you finish learning it.
Last updated on
Spotted something unclear or wrong on this page?