LeetCode 30 - Substring with Concatenation of All Words
The problem asks us to find all starting indices in a string s where a substring exists that is a concatenation of all the words in the array words exactly once, without any intervening characters. The words can appear in any order, and all words in words have the same length.
Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window
Solution
Problem Understanding
The problem asks us to find all starting indices in a string s where a substring exists that is a concatenation of all the words in the array words exactly once, without any intervening characters. The words can appear in any order, and all words in words have the same length.
The input s is the string to search within, and words is an array of non-empty lowercase strings. The expected output is a list of integers representing the indices in s where such a concatenated substring begins.
The constraints tell us several important points. The length of s can go up to 10,000, while words can have up to 5,000 elements, each up to 30 characters long. This makes a naive solution that tries every permutation of words infeasible, as the factorial growth would be enormous. Important edge cases include situations where no concatenated substring exists, where words overlap in tricky ways, or where repeated words appear in words.
Approaches
The brute-force approach would attempt to generate all permutations of words and check for each permutation if it exists as a substring in s. This guarantees correctness but is computationally impractical because generating all permutations of words has time complexity O(m!), where m is the number of words. Checking each permutation in the string adds another factor of O(n), where n is the length of s.
The key insight for a better solution is to use a sliding window combined with a hash map. Since all words have the same length, we can check fixed-length segments in s and maintain a frequency count of seen words. By sliding over s in steps equal to the word length and checking the frequency map against the target map for words, we can efficiently determine if a valid concatenation exists without generating all permutations.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(m! * n) | O(m) | Generate all permutations of words and check each in s |
| Optimal | O(n * word_len) | O(m) | Sliding window using hash maps to track word frequency |
Algorithm Walkthrough
- Compute the length of each word (
word_len) and the total length of all words combined (concat_len). - Create a hash map
word_countthat maps each word to its frequency inwords. - Iterate through
swithword_lenpossible starting offsets to ensure all alignments are covered. - For each offset, use two pointers
leftandrightto define a sliding window and acurrent_counthash map to track words seen in the current window. - Move
rightin steps ofword_len. If the word atrightis inword_count, increment its count incurrent_count. If the count exceeds the required frequency, shrink the window fromleftuntil the count is valid. - If the window size matches
concat_len, recordleftas a starting index. - If the word at
rightis not inword_count, resetcurrent_countand movelefttoright + word_len. - Repeat until the entire string has been processed.
Why it works: The sliding window ensures that each substring of the appropriate length is checked exactly once, and the frequency maps guarantee that only valid permutations of words are considered. The window invariant ensures that all words in words appear exactly the correct number of times in the current window.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def findSubstring(self, s: str, words: List[str]) -> List[int]:
if not s or not words:
return []
word_len = len(words[0])
word_count = defaultdict(int)
for word in words:
word_count[word] += 1
total_len = word_len * len(words)
n = len(s)
result = []
for i in range(word_len):
left = i
current_count = defaultdict(int)
count = 0
for right in range(i, n - word_len + 1, word_len):
word = s[right:right + word_len]
if word in word_count:
current_count[word] += 1
count += 1
while current_count[word] > word_count[word]:
left_word = s[left:left + word_len]
current_count[left_word] -= 1
count -= 1
left += word_len
if count == len(words):
result.append(left)
else:
current_count.clear()
count = 0
left = right + word_len
return result
The Python implementation uses defaultdict to handle counting efficiently. The outer loop iterates over word_len starting positions to handle different alignments. The inner loop moves the right pointer in steps of word_len and adjusts the window using the left pointer as necessary. If a word not in words is encountered, the window is reset.
Go Solution
func findSubstring(s string, words []string) []int {
if len(s) == 0 || len(words) == 0 {
return []int{}
}
wordLen := len(words[0])
totalLen := wordLen * len(words)
wordCount := make(map[string]int)
for _, word := range words {
wordCount[word]++
}
result := []int{}
n := len(s)
for i := 0; i < wordLen; i++ {
left := i
currentCount := make(map[string]int)
count := 0
for right := i; right+wordLen <= n; right += wordLen {
word := s[right:right+wordLen]
if _, ok := wordCount[word]; ok {
currentCount[word]++
count++
for currentCount[word] > wordCount[word] {
leftWord := s[left:left+wordLen]
currentCount[leftWord]--
count--
left += wordLen
}
if count == len(words) {
result = append(result, left)
}
} else {
currentCount = make(map[string]int)
count = 0
left = right + wordLen
}
}
}
return result
}
In Go, we use native maps for word counts and slice operations for substring extraction. The logic mirrors the Python version, with care taken for map initialization and slice boundaries.
Worked Examples
Example 1: s = "barfoothefoobarman", words = ["foo","bar"]
| Step | left | right | word | current_count | result |
|---|---|---|---|---|---|
| 1 | 0 | 0 | "bar" | {"bar":1} | [] |
| 2 | 0 | 3 | "foo" | {"bar":1,"foo":1} | [0] |
| 3 | 6 | 6 | "the" | reset | [0] |
| 4 | 9 | 9 | "foo" | {"foo":1} | [0] |
| 5 | 9 | 12 | "bar" | {"foo":1,"bar":1} | [0,9] |
Example 2: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
All sliding windows fail to match counts exactly, so result is [].
Example 3: s = "barfoofoobarthefoobarman", words = ["bar","foo","the"]
Sliding window checks yield starting indices [6,9,12].
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * word_len) | Outer loop runs word_len times, inner loop scans the string in steps of word_len |
| Space | O(m) | Maps store counts for words, up to size of words |
This complexity is efficient enough for the input constraints. n can be 10,000 and word_len up to 30, giving roughly 300,000 operations per string scan.
Test Cases
# Provided examples
assert Solution().findSubstring("barfoothefoobarman", ["foo","bar"]) == [0,9] # simple two-word case
assert Solution().findSubstring("wordgoodgoodgoodbestword", ["word","good","best","word"]) == [] # no match
assert Solution().findSubstring("barfoofoobarthefoobarman", ["bar","foo","the"]) == [6,9,12] # three-word case
# Edge cases
assert Solution().findSubstring("", ["foo"]) == [] # empty string
assert Solution().findSubstring("foobar", []) == [] # empty words
assert Solution().findSubstring("aaa", ["a","a"]) == [0,1] # repeated word
assert Solution().findSubstring("abababab", ["ab","ba"]) == [0,2,4] # overlapping words
assert Solution().findSubstring("abc", ["a","b","c","d"]) == [] # words longer than string
| Test | Why |
|---|---|
| Provided examples | Validates typical cases |
| Empty string | Handles no-input |