LeetCode 438: Find All Anagrams in a String

Find all starting indices where an anagram of p appears in s using a fixed-size sliding window.

Problem Restatement

We are given two strings:

Name Meaning
s The string we search inside
p The pattern string

We need to return all starting indices in s where an anagram of p begins.

An anagram uses the same characters with the same frequencies, but the order can be different.

For example:

"abc", "bac", "cba"

are all anagrams of each other.

The official problem asks us to return all start indices of p's anagrams in s, in any order. Both strings contain lowercase English letters, and their lengths are at most 3 * 10^4.

Input and Output

Item Meaning
Input Strings s and p
Output List of starting indices
Anagram rule Same character frequencies as p
Window length Must be exactly len(p)
Character set Lowercase English letters

Example function shape:

def findAnagrams(s: str, p: str) -> list[int]:
    ...

Examples

Example 1:

s = "cbaebabacd"
p = "abc"

Substrings of length 3 that are anagrams of "abc":

index 0: "cba"
index 6: "bac"

Answer:

[0, 6]

Example 2:

s = "abab"
p = "ab"

Substrings of length 2:

index 0: "ab"
index 1: "ba"
index 2: "ab"

All three are anagrams of "ab".

Answer:

[0, 1, 2]

Example 3:

s = "aaaaa"
p = "aa"

Every length-2 substring is "aa".

Answer:

[0, 1, 2, 3]

First Thought: Sort Every Substring

A direct approach is:

  1. Sort p.
  2. For every substring of s with length len(p), sort that substring.
  3. If the sorted strings match, record the starting index.

For example:

p = "abc"
sorted_p = "abc"

At index 0:

substring = "cba"
sorted_substring = "abc"

So index 0 is valid.

This works, but sorting every window is expensive.

If m = len(p) and n = len(s), this costs roughly:

O(n * m log m)

We can do better by reusing work between adjacent windows.

Key Insight

Two neighboring windows of length len(p) are almost the same.

For example:

s = "cbaebabacd"
p = "abc"

window 0: c b a
window 1:   b a e

When the window moves one step:

Change Character
Leaves window c
Enters window e

So instead of recounting the whole window, we update only two character counts.

Because s and p contain lowercase English letters, we can use arrays of length 26.

One array stores frequencies in p.

Another array stores frequencies in the current window of s.

If the arrays are equal, the current window is an anagram.

Algorithm

Let:

n = len(s)
m = len(p)

If m > n, no substring of s can have length m, so return [].

Create two arrays:

p_count = [0] * 26
window_count = [0] * 26

Fill both arrays for the first m characters:

  1. Count all characters in p.
  2. Count the first window s[0:m].

If the counts match, append 0.

Then slide the window from left to right.

For each new right index right from m to n - 1:

  1. Add s[right] to the window.
  2. Remove s[right - m] from the window.
  3. If counts match, append the new start index:
right - m + 1

Return the answer.

Correctness

A substring is an anagram of p exactly when it has the same character frequency for every lowercase English letter.

The algorithm maintains window_count for exactly the current substring of s with length len(p).

Initially, window_count represents s[0:len(p)].

When the window slides one position to the right, the algorithm removes the character that leaves the window and adds the character that enters the window. Therefore, after each slide, window_count again represents exactly the current length-len(p) substring.

At each position, the algorithm compares window_count with p_count.

If they are equal, the current substring has the same character frequencies as p, so its start index is valid.

If they are different, at least one character frequency differs, so the current substring cannot be an anagram.

Thus, the algorithm adds exactly all valid starting indices and no invalid ones.

Complexity

Metric Value Why
Time O(n) Each character enters and leaves the window at most once
Space O(1) Two arrays of size 26

The array comparison costs O(26), which is constant.

Implementation

class Solution:
    def findAnagrams(self, s: str, p: str) -> list[int]:
        n = len(s)
        m = len(p)

        if m > n:
            return []

        p_count = [0] * 26
        window_count = [0] * 26

        for i in range(m):
            p_index = ord(p[i]) - ord('a')
            s_index = ord(s[i]) - ord('a')

            p_count[p_index] += 1
            window_count[s_index] += 1

        answer = []

        if window_count == p_count:
            answer.append(0)

        for right in range(m, n):
            enter_index = ord(s[right]) - ord('a')
            leave_index = ord(s[right - m]) - ord('a')

            window_count[enter_index] += 1
            window_count[leave_index] -= 1

            if window_count == p_count:
                answer.append(right - m + 1)

        return answer

Code Explanation

We first compute lengths:

n = len(s)
m = len(p)

If p is longer than s, no valid window exists:

if m > n:
    return []

We use frequency arrays for lowercase English letters:

p_count = [0] * 26
window_count = [0] * 26

The character index is computed by:

ord(ch) - ord('a')

We fill the count array for p and for the first window in s:

for i in range(m):

Then we check the first window:

if window_count == p_count:
    answer.append(0)

After that, we slide the window.

The character at right enters:

window_count[enter_index] += 1

The character at right - m leaves:

window_count[leave_index] -= 1

After the update, the current window starts at:

right - m + 1

If the count arrays match, that start index is added to the answer.

Testing

def run_tests():
    s = Solution()

    assert s.findAnagrams("cbaebabacd", "abc") == [0, 6]

    assert s.findAnagrams("abab", "ab") == [0, 1, 2]

    assert s.findAnagrams("aaaaa", "aa") == [0, 1, 2, 3]

    assert s.findAnagrams("abc", "abcd") == []

    assert s.findAnagrams("baa", "aa") == [1]

    assert s.findAnagrams("af", "be") == []

    print("all tests passed")

run_tests()

Test meaning:

Test Why
"cbaebabacd", "abc" Standard example
"abab", "ab" Overlapping anagrams
Repeated same character Checks frequency counts
Pattern longer than string Must return empty list
Match at the end Checks final window
No shared counts Checks no-result case