LeetCode 30 - Substring with Concatenation of All Words

The problem asks us to find all starting indices in a string s where a substring exists that is a concatenation of all the words in the array words exactly once, without any intervening characters. The words can appear in any order, and all words in words have the same length.

LeetCode Problem 30

Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window

Solution

Problem Understanding

The problem asks us to find all starting indices in a string s where a substring exists that is a concatenation of all the words in the array words exactly once, without any intervening characters. The words can appear in any order, and all words in words have the same length.

The input s is the string to search within, and words is an array of non-empty lowercase strings. The expected output is a list of integers representing the indices in s where such a concatenated substring begins.

The constraints tell us several important points. The length of s can go up to 10,000, while words can have up to 5,000 elements, each up to 30 characters long. This makes a naive solution that tries every permutation of words infeasible, as the factorial growth would be enormous. Important edge cases include situations where no concatenated substring exists, where words overlap in tricky ways, or where repeated words appear in words.

Approaches

The brute-force approach would attempt to generate all permutations of words and check for each permutation if it exists as a substring in s. This guarantees correctness but is computationally impractical because generating all permutations of words has time complexity O(m!), where m is the number of words. Checking each permutation in the string adds another factor of O(n), where n is the length of s.

The key insight for a better solution is to use a sliding window combined with a hash map. Since all words have the same length, we can check fixed-length segments in s and maintain a frequency count of seen words. By sliding over s in steps equal to the word length and checking the frequency map against the target map for words, we can efficiently determine if a valid concatenation exists without generating all permutations.

Approach Time Complexity Space Complexity Notes
Brute Force O(m! * n) O(m) Generate all permutations of words and check each in s
Optimal O(n * word_len) O(m) Sliding window using hash maps to track word frequency

Algorithm Walkthrough

  1. Compute the length of each word (word_len) and the total length of all words combined (concat_len).
  2. Create a hash map word_count that maps each word to its frequency in words.
  3. Iterate through s with word_len possible starting offsets to ensure all alignments are covered.
  4. For each offset, use two pointers left and right to define a sliding window and a current_count hash map to track words seen in the current window.
  5. Move right in steps of word_len. If the word at right is in word_count, increment its count in current_count. If the count exceeds the required frequency, shrink the window from left until the count is valid.
  6. If the window size matches concat_len, record left as a starting index.
  7. If the word at right is not in word_count, reset current_count and move left to right + word_len.
  8. Repeat until the entire string has been processed.

Why it works: The sliding window ensures that each substring of the appropriate length is checked exactly once, and the frequency maps guarantee that only valid permutations of words are considered. The window invariant ensures that all words in words appear exactly the correct number of times in the current window.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def findSubstring(self, s: str, words: List[str]) -> List[int]:
        if not s or not words:
            return []
        
        word_len = len(words[0])
        word_count = defaultdict(int)
        for word in words:
            word_count[word] += 1
        
        total_len = word_len * len(words)
        n = len(s)
        result = []
        
        for i in range(word_len):
            left = i
            current_count = defaultdict(int)
            count = 0
            for right in range(i, n - word_len + 1, word_len):
                word = s[right:right + word_len]
                if word in word_count:
                    current_count[word] += 1
                    count += 1
                    while current_count[word] > word_count[word]:
                        left_word = s[left:left + word_len]
                        current_count[left_word] -= 1
                        count -= 1
                        left += word_len
                    if count == len(words):
                        result.append(left)
                else:
                    current_count.clear()
                    count = 0
                    left = right + word_len
        return result

The Python implementation uses defaultdict to handle counting efficiently. The outer loop iterates over word_len starting positions to handle different alignments. The inner loop moves the right pointer in steps of word_len and adjusts the window using the left pointer as necessary. If a word not in words is encountered, the window is reset.

Go Solution

func findSubstring(s string, words []string) []int {
    if len(s) == 0 || len(words) == 0 {
        return []int{}
    }

    wordLen := len(words[0])
    totalLen := wordLen * len(words)
    wordCount := make(map[string]int)
    for _, word := range words {
        wordCount[word]++
    }

    result := []int{}
    n := len(s)

    for i := 0; i < wordLen; i++ {
        left := i
        currentCount := make(map[string]int)
        count := 0
        for right := i; right+wordLen <= n; right += wordLen {
            word := s[right:right+wordLen]
            if _, ok := wordCount[word]; ok {
                currentCount[word]++
                count++
                for currentCount[word] > wordCount[word] {
                    leftWord := s[left:left+wordLen]
                    currentCount[leftWord]--
                    count--
                    left += wordLen
                }
                if count == len(words) {
                    result = append(result, left)
                }
            } else {
                currentCount = make(map[string]int)
                count = 0
                left = right + wordLen
            }
        }
    }
    return result
}

In Go, we use native maps for word counts and slice operations for substring extraction. The logic mirrors the Python version, with care taken for map initialization and slice boundaries.

Worked Examples

Example 1: s = "barfoothefoobarman", words = ["foo","bar"]

Step left right word current_count result
1 0 0 "bar" {"bar":1} []
2 0 3 "foo" {"bar":1,"foo":1} [0]
3 6 6 "the" reset [0]
4 9 9 "foo" {"foo":1} [0]
5 9 12 "bar" {"foo":1,"bar":1} [0,9]

Example 2: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]

All sliding windows fail to match counts exactly, so result is [].

Example 3: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]

Sliding window checks yield starting indices [6,9,12].

Complexity Analysis

Measure Complexity Explanation
Time O(n * word_len) Outer loop runs word_len times, inner loop scans the string in steps of word_len
Space O(m) Maps store counts for words, up to size of words

This complexity is efficient enough for the input constraints. n can be 10,000 and word_len up to 30, giving roughly 300,000 operations per string scan.

Test Cases

# Provided examples
assert Solution().findSubstring("barfoothefoobarman", ["foo","bar"]) == [0,9]  # simple two-word case
assert Solution().findSubstring("wordgoodgoodgoodbestword", ["word","good","best","word"]) == []  # no match
assert Solution().findSubstring("barfoofoobarthefoobarman", ["bar","foo","the"]) == [6,9,12]  # three-word case

# Edge cases
assert Solution().findSubstring("", ["foo"]) == []  # empty string
assert Solution().findSubstring("foobar", []) == []  # empty words
assert Solution().findSubstring("aaa", ["a","a"]) == [0,1]  # repeated word
assert Solution().findSubstring("abababab", ["ab","ba"]) == [0,2,4]  # overlapping words
assert Solution().findSubstring("abc", ["a","b","c","d"]) == []  # words longer than string
Test Why
Provided examples Validates typical cases
Empty string Handles no-input