LeetCode 438 - Find All Anagrams in a String
The problem gives two strings, s and p, both containing only lowercase English letters. We need to find every starting index in s where a substring is an anagram of p.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Sliding Window
Solution
Problem Understanding
The problem gives two strings, s and p, both containing only lowercase English letters. We need to find every starting index in s where a substring is an anagram of p.
An anagram means the substring contains exactly the same characters as p, with the same frequencies, but possibly in a different order. For example, "abc", "bca", and "cab" are all anagrams of each other.
The key observation is that every valid substring must have the same length as p. If p has length 3, then every candidate substring in s must also have length 3. We slide a window of size len(p) across s and check whether the current window is an anagram of p.
The output should contain all starting indices where such an anagram occurs.
The constraints are important:
1 <= s.length, p.length <= 3 * 10^4- Strings contain only lowercase English letters
A brute-force solution that sorts every substring or rebuilds frequency maps repeatedly would be too slow at this scale. Since the alphabet is fixed to only 26 lowercase letters, we can use frequency counting very efficiently.
Several edge cases are important:
pmay be longer thans, in which case no answer exists.- Multiple anagrams may overlap, such as
"abab"with"ab". - Repeated characters in
pcan create subtle counting bugs, such as"aa"inside"aaaaa". - Every character frequency must match exactly, not just the set of characters.
Approaches
Brute Force Approach
The straightforward approach is to examine every substring of length len(p) inside s.
For each substring:
- Extract the substring.
- Count character frequencies or sort the substring.
- Compare it with
p.
If the frequencies match, record the starting index.
This works because an anagram is defined by matching character frequencies. However, it is inefficient because we repeatedly rebuild information for overlapping substrings.
If we sort each substring, the complexity becomes expensive. Even with frequency counting, rebuilding counts from scratch for every window wastes work because adjacent windows differ by only one character.
With up to 3 * 10^4 characters, we need a more efficient strategy.
Optimal Sliding Window Approach
The important insight is that consecutive substrings of the same length overlap heavily.
Suppose the current window is:
cba
and the next window is:
bae
Instead of recomputing frequencies from scratch, we can:
- Remove the leftmost character (
c) - Add the new rightmost character (
e)
This naturally leads to a sliding window algorithm.
We maintain:
- A frequency count for
p - A frequency count for the current window in
s
As the window slides:
- Add the incoming character
- Remove the outgoing character
- Compare frequency arrays
Since there are only 26 lowercase English letters, frequency comparison is effectively constant time.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * m) | O(1) | Rebuilds frequency counts for every substring |
| Optimal | O(n) | O(1) | Sliding window updates counts incrementally |
Here:
n = len(s)m = len(p)
Algorithm Walkthrough
- First, check whether
pis longer thans.
If len(p) > len(s), it is impossible for any substring in s to match p, so return an empty list immediately.
2. Create two frequency arrays of size 26.
One array stores character frequencies for p.
The other array stores character frequencies for the current sliding window in s.
Since the strings contain only lowercase English letters, we map:
'a' -> 0
'b' -> 1
...
'z' -> 25
- Populate the frequency arrays for the first window.
Count characters in:
p- The first
len(p)characters ofs
- Compare the two frequency arrays.
If they match, then the first window is an anagram, so add index 0 to the answer.
5. Begin sliding the window one character at a time.
For each new position:
- Add the new rightmost character to the window count.
- Remove the old leftmost character from the window count.
This updates the window efficiently in constant time. 6. After each update, compare the window frequency array with the target frequency array.
If they match, the current window is an anagram, so record the window's starting index. 7. Continue until the window reaches the end of the string. 8. Return the collected indices.
Why it works
The algorithm maintains an invariant:
- The window frequency array always represents the exact character counts of the current substring of length
len(p).
Because two strings are anagrams if and only if their character frequencies match, every time the window frequency equals the target frequency, the current substring must be an anagram of p.
The sliding window guarantees that every possible substring of the required length is checked exactly once.
Python Solution
from typing import List
class Solution:
def findAnagrams(self, s: str, p: str) -> List[int]:
if len(p) > len(s):
return []
result = []
p_count = [0] * 26
window_count = [0] * 26
for i in range(len(p)):
p_count[ord(p[i]) - ord('a')] += 1
window_count[ord(s[i]) - ord('a')] += 1
if p_count == window_count:
result.append(0)
left = 0
for right in range(len(p), len(s)):
window_count[ord(s[right]) - ord('a')] += 1
window_count[ord(s[left]) - ord('a')] -= 1
left += 1
if p_count == window_count:
result.append(left)
return result
The implementation starts with an early return for the case where p is longer than s.
Two arrays of length 26 are used to store character frequencies. Arrays are chosen instead of dictionaries because the character set is fixed and small, making array access faster and simpler.
The first loop initializes the frequency counts for both p and the first window in s.
After initialization, the algorithm checks whether the first window already forms an anagram.
The sliding window then moves across the string. At each step:
- One character enters the window.
- One character leaves the window.
This keeps the window size fixed at len(p).
The frequency arrays are compared after each update. Whenever they match, the current window is a valid anagram and its starting index is added to the result.
Go Solution
func findAnagrams(s string, p string) []int {
if len(p) > len(s) {
return []int{}
}
result := []int{}
pCount := make([]int, 26)
windowCount := make([]int, 26)
for i := 0; i < len(p); i++ {
pCount[p[i]-'a']++
windowCount[s[i]-'a']++
}
if equalCounts(pCount, windowCount) {
result = append(result, 0)
}
left := 0
for right := len(p); right < len(s); right++ {
windowCount[s[right]-'a']++
windowCount[s[left]-'a']--
left++
if equalCounts(pCount, windowCount) {
result = append(result, left)
}
}
return result
}
func equalCounts(a []int, b []int) bool {
for i := 0; i < 26; i++ {
if a[i] != b[i] {
return false
}
}
return true
}
The Go implementation follows the same algorithmic structure as the Python version.
One notable difference is that Go does not support direct slice equality comparison for integer slices, so a helper function equalCounts is used to compare the two frequency arrays manually.
Go strings are byte-indexable, which works perfectly here because the input contains only lowercase English letters.
The function returns an empty slice when no matches exist, which is the idiomatic Go behavior.
Worked Examples
Example 1
s = "cbaebabacd"
p = "abc"
Target frequency:
| Character | Count |
|---|---|
| a | 1 |
| b | 1 |
| c | 1 |
Window size is 3.
| Window | Indices | Window Count Matches? | Result |
|---|---|---|---|
| "cba" | 0-2 | Yes | [0] |
| "bae" | 1-3 | No | [0] |
| "aeb" | 2-4 | No | [0] |
| "eba" | 3-5 | No | [0] |
| "bab" | 4-6 | No | [0] |
| "aba" | 5-7 | No | [0] |
| "bac" | 6-8 | Yes | [0, 6] |
| "acd" | 7-9 | No | [0, 6] |
Final answer:
[0, 6]
Example 2
s = "abab"
p = "ab"
Target frequency:
| Character | Count |
|---|---|
| a | 1 |
| b | 1 |
Window size is 2.
| Window | Indices | Window Count Matches? | Result |
|---|---|---|---|
| "ab" | 0-1 | Yes | [0] |
| "ba" | 1-2 | Yes | [0, 1] |
| "ab" | 2-3 | Yes | [0, 1, 2] |
Final answer:
[0, 1, 2]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character enters and leaves the window once |
| Space | O(1) | Frequency arrays always contain 26 elements |
The algorithm processes the string in a single pass using a fixed-size sliding window.
Although frequency arrays are compared repeatedly, the comparison cost is constant because the arrays always contain exactly 26 elements.
Therefore, the overall runtime is linear in the size of s.
The space usage is constant because the frequency arrays never grow with input size.
Test Cases
from typing import List
class Solution:
def findAnagrams(self, s: str, p: str) -> List[int]:
if len(p) > len(s):
return []
result = []
p_count = [0] * 26
window_count = [0] * 26
for i in range(len(p)):
p_count[ord(p[i]) - ord('a')] += 1
window_count[ord(s[i]) - ord('a')] += 1
if p_count == window_count:
result.append(0)
left = 0
for right in range(len(p), len(s)):
window_count[ord(s[right]) - ord('a')] += 1
window_count[ord(s[left]) - ord('a')] -= 1
left += 1
if p_count == window_count:
result.append(left)
return result
solution = Solution()
assert solution.findAnagrams("cbaebabacd", "abc") == [0, 6] # basic example
assert solution.findAnagrams("abab", "ab") == [0, 1, 2] # overlapping matches
assert solution.findAnagrams("aaaaa", "aa") == [0, 1, 2, 3] # repeated characters
assert solution.findAnagrams("abcdef", "gh") == [] # no matches
assert solution.findAnagrams("a", "a") == [0] # single character match
assert solution.findAnagrams("a", "b") == [] # single character no match
assert solution.findAnagrams("ab", "abc") == [] # pattern longer than string
assert solution.findAnagrams("baa", "aa") == [1] # match at end
assert solution.findAnagrams("aaabaaa", "aaa") == [0, 4] # multiple repeated groups
assert solution.findAnagrams("abc", "cba") == [0] # full string match
| Test | Why |
|---|---|
"cbaebabacd", "abc" |
Standard example with separated matches |
"abab", "ab" |
Validates overlapping windows |
"aaaaa", "aa" |
Tests repeated character handling |
"abcdef", "gh" |
Ensures no false positives |
"a", "a" |
Smallest valid matching input |
"a", "b" |
Smallest non-matching input |
"ab", "abc" |
Pattern longer than string |
"baa", "aa" |
Match occurs at the end |
"aaabaaa", "aaa" |
Multiple repeated-character windows |
"abc", "cba" |
Entire string is an anagram |
Edge Cases
Pattern Longer Than the String
If p is longer than s, no substring of s can possibly match the required length. This is a common source of index errors in sliding window implementations.
The solution handles this immediately with:
if len(p) > len(s):
return []
This prevents invalid window initialization and unnecessary work.
Repeated Characters
Cases like:
s = "aaaaa"
p = "aa"
can expose bugs in implementations that only track unique characters rather than exact frequencies.
The algorithm correctly tracks counts for every character, so overlapping windows like "aa" are all detected properly.
Overlapping Anagrams
Inputs such as:
s = "abab"
p = "ab"
contain overlapping valid windows.
A naive implementation might accidentally skip windows after finding a match. The sliding window moves exactly one position at a time, ensuring every possible substring is checked.
Match at the Beginning or End
Some implementations incorrectly handle boundary updates and miss windows at the edges of the string.
Examples include:
"abc", "abc"
"baa", "aa"
The algorithm explicitly checks:
- The initial window before sliding
- Every subsequent window after each update
This guarantees matches at both boundaries are detected correctly.