030. Substring with Concatenation of All Words

Quick Identifier

Topic: sliding-window
Pattern: Sliding Window (Fixed multi‑unit window + hash map)
Hint: All words share the same length. This enables a "chunked" sliding window stepping by wordLength instead of a single character.
Difficulty: Hard
Companies: Amazon, Google, Meta, Microsoft, Bloomberg
LeetCode: 30

Quick Sights

Time Complexity: O(n) total across all offset passes. Each character block is visited at most twice.
Space Complexity: O(m) where m is the number of words. We store two hash maps: the expected frequency map and the current window map.
Core Mechanism: Run L independent window passes (L = wordLength). In each pass, slide the window block‑by‑block, maintaining a frequency count of seen words and contracting when a word exceeds its allowed count.

Concept Explanation

As a senior engineer, you recognize that the uniform word length constraint collapses a daunting string problem into a discrete block scanning problem.

Offset Scheduling: Because a valid concatenation can start at any character index, there are exactly L possible alignments. We launch L parallel passes, each starting at offsets 0 … L‑1.
Block Window: Within a single offset, the problem reduces to finding an anagram of the words list where the units are words instead of characters — essentially the same logic as LeetCode 438 but with word tokens.
Expansion & Contraction: Expand the right pointer by L, read the next word, and update the window map. If the count of that word exceeds the target, shrink from the left until the invariants hold again.
Match Detection: When the total number of matched words equals numWords, record the left index as a valid start.

Diagram

Loading diagram…

Study Pattern (SDE3+)

0‑3 min: State the uniform length insight and why it enables L independent scans.
Implementation pass: Walk through the outer offset loop, then the inner slide‑by‑L loop, highlighting the frequency‑map updates and the contraction logic.
5 min extension: Discuss how the solution would change if words had variable lengths (e.g., requiring a trie or Aho‑Corasick automaton).

Approaches

Brute force – enumerate every start index, slice numWords * L characters, and compare multisets. O(n * m * L).
Optimal – L offset passes with a sliding window and hashmap. O(n) time, O(m) space. (Always pick this)

Optimal Solution

Go

func findSubstring(s string, words []string) []int {
    if len(words) == 0 {
        return nil
    }
    wordLen := len(words[0])
    numWords := len(words)
    n := len(s)

    // Build expected frequency map
    expected := make(map[string]int)
    for _, w := range words {
        expected[w]++
    }

    result := []int{}

    // Iterate over each possible offset
    for offset := 0; offset < wordLen; offset++ {
        left := offset
        cur := make(map[string]int)
        matched := 0
        for right := offset + wordLen; right <= n; right += wordLen {
            piece := s[right-wordLen : right]
            if expected[piece] == 0 { // not a target word
                cur = make(map[string]int)
                matched = 0
                left = right
                continue
            }
            cur[piece]++
            matched++
            // Shrink window if we have too many of this word
            for cur[piece] > expected[piece] {
                leftWord := s[left : left+wordLen]
                cur[leftWord]--
                matched--
                left += wordLen
            }
            if matched == numWords {
                result = append(result, left)
            }
        }
    }
    return result
}

JavaScript

function findSubstring(s, words) {
    if (words.length === 0) return [];
    const wordLen = words[0].length;
    const numWords = words.length;
    const n = s.length;
    const expected = new Map();
    for (const w of words) expected.set(w, (expected.get(w) || 0) + 1);
    const result = [];

    for (let offset = 0; offset < wordLen; offset++) {
        let left = offset;
        const cur = new Map();
        let matched = 0;
        for (let right = offset + wordLen; right <= n; right += wordLen) {
            const piece = s.slice(right - wordLen, right);
            if (!expected.has(piece)) {
                cur.clear();
                matched = 0;
                left = right;
                continue;
            }
            cur.set(piece, (cur.get(piece) || 0) + 1);
            matched++;
            while (cur.get(piece) > expected.get(piece)) {
                const leftWord = s.slice(left, left + wordLen);
                cur.set(leftWord, cur.get(leftWord) - 1);
                matched--;
                left += wordLen;
            }
            if (matched === numWords) result.push(left);
        }
    }
    return result;
}

Walkthrough

Input: s = "barfoothefoobarman", words = ["foo","bar"]

wordLen = 3, numWords = 2
Offsets 0‑2 are examined.
Offset 0 yields matches at indices 0 and 9.
Offsets 1 and 2 produce no matches because the word boundaries mis‑align.
Final output: [0, 9].

Edge Cases

words contains duplicates → frequency map handles counts correctly.
Empty words list → return [].
s shorter than wordLen * numWords → early exit, no matches.

Pitfalls

Forgetting to reset the window when encountering a non‑target word (must move left to right).
Not advancing left enough when a word exceeds its allowed frequency, leading to stale counts.
Using the wrong slice boundaries (right‑wordLen vs right).

Mind‑Map Tags

#sliding-window #fixed-window #hashmap #word‑token #hard