LeetCode 3298 - Count Substrings That Can Be Rearranged to Contain a String II

Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window

Solution

Problem Understanding

We are given two strings, word1 and word2. We need to count how many substrings of word1 are considered valid.

A substring is valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string.

This condition is equivalent to saying that the substring must contain at least all the characters required by word2, with the correct frequencies.

For example, if:

word2 = "aabc"

then a valid substring must contain:

at least two 'a'
at least one 'b'
at least one 'c'

The remaining characters do not matter because we can rearrange the substring arbitrarily.

So the real task is:

Count the number of substrings of word1 whose character frequencies cover the frequency requirements of word2.

The constraints are extremely important:

word1.length can be up to 10^6
word2.length can be up to 10^4

A quadratic or even near quadratic solution is impossible. With one million characters, we need an algorithm that runs in linear time, or very close to it.

The problem also explicitly warns that memory limits are tight. That means we should avoid storing large auxiliary structures such as frequency maps for every substring or prefix tables of size O(n * alphabet).

Because the strings only contain lowercase English letters, we can use fixed-size arrays of length 26 instead of hash maps. This gives both better performance and lower memory usage.

Several edge cases are important:

word2 may contain repeated characters, such as "aaabc"
word1 may not contain enough occurrences of a required character
Every substring could be valid
No substring could be valid
The valid substring might need to extend all the way to the end of word1

These cases can easily break incorrect sliding window implementations.

Approaches

Brute Force

The most direct approach is to examine every substring of word1.

For each starting index:

Extend the substring one character at a time.
Maintain character frequencies.
Check whether the substring contains all required frequencies from word2.

If the frequency requirements are satisfied, count the substring as valid.

This works because every substring is checked explicitly, so no valid substring can be missed.

However, the complexity is far too large.

There are O(n^2) substrings, and even with incremental frequency updates, checking validity costs up to O(26) each time.

With n = 10^6, this is completely infeasible.

Optimal Sliding Window

The key observation is that validity depends only on character frequencies.

We do not care about the order of characters because rearrangement is allowed.

This naturally suggests a sliding window approach.

We maintain a window [left, right] and track the character counts inside it.

The important insight is:

Once a window becomes valid, every larger window extending further right is also valid.

For example, suppose substring [left, right] already contains all required characters. Adding more characters cannot make it invalid.

This monotonic property allows us to count many substrings at once.

We expand the right pointer until the window becomes valid. Then:

all substrings starting at left and ending at any index >= right are valid
there are n - right such substrings

After counting them, we shrink the left side and continue.

This produces a linear time solution because each pointer moves at most n times.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	`O(n^2)`	`O(26)`	Checks every substring explicitly
Optimal Sliding Window	`O(n)`	`O(26)`	Uses two pointers and frequency tracking

Algorithm Walkthrough

Step 1: Build the Required Frequency Array

We first count how many times each character appears in word2.

For example:

word2 = "aabc"

produces:

a -> 2
b -> 1
c -> 1

We store this in a fixed array of size 26.

Step 2: Initialize the Sliding Window

We maintain:

left pointer
right pointer
current window frequencies
a variable called missing

missing represents how many required characters are still missing from the current window.

Initially:

missing = len(word2)

As we add characters into the window, we reduce missing whenever we satisfy one needed occurrence.

Step 3: Expand the Right Pointer

We iterate through word1 using right.

For each character:

Add it to the window frequency array.
If this character was still needed, decrease missing.

A character is still needed if:

window_count[char] <= required_count[char]

after insertion.

Step 4: Detect a Valid Window

When:

missing == 0

the current window contains all required characters.

At this moment, the window is valid.

Step 5: Count All Valid Extensions

Suppose the current valid window is:

[left, right]

Then every substring:

[left, right]
[left, right + 1]
[left, right + 2]
...
[left, n - 1]

is also valid.

So we add:

n - right

to the answer.

This is the crucial optimization that avoids checking every substring individually.

Step 6: Shrink the Window

After counting, we try moving left forward.

We remove the leftmost character from the window.

If removing it causes the window to lose a required occurrence, we increase missing.

Then we continue shrinking until the window becomes invalid again.

Step 7: Continue Until the End

We repeat this process for every right position.

Because each character enters and leaves the window at most once, the algorithm runs in linear time.

Why it works

The sliding window always maintains accurate frequency counts for the current substring.

The variable missing guarantees that:

missing == 0

if and only if the window satisfies all frequency requirements from word2.

Once a window becomes valid, any larger window extending further right must also remain valid because adding characters cannot remove required frequencies.

Therefore, counting n - right valid substrings at once is correct.

Since every valid substring is counted exactly once, the algorithm produces the correct answer.

Python Solution

class Solution:
    def validSubstringCount(self, word1: str, word2: str) -> int:
        required = [0] * 26
        
        for ch in word2:
            required[ord(ch) - ord('a')] += 1
        
        window = [0] * 26
        
        missing = len(word2)
        left = 0
        answer = 0
        n = len(word1)
        
        for right in range(n):
            idx = ord(word1[right]) - ord('a')
            
            window[idx] += 1
            
            if window[idx] <= required[idx]:
                missing -= 1
            
            while missing == 0:
                answer += n - right
                
                left_idx = ord(word1[left]) - ord('a')
                
                window[left_idx] -= 1
                
                if window[left_idx] < required[left_idx]:
                    missing += 1
                
                left += 1
        
        return answer

The implementation begins by constructing the frequency requirements for word2.

Instead of using a hash map, the solution uses a fixed-size array of length 26 because the input contains only lowercase English letters. This improves both memory usage and runtime performance.

The window array stores character frequencies inside the current sliding window.

The variable missing tracks how many required character occurrences have not yet been satisfied.

As the right pointer expands:

window[idx] += 1

the algorithm checks whether the newly added character contributes toward satisfying the requirement:

if window[idx] <= required[idx]:
    missing -= 1

When missing becomes zero, the window is valid.

At that point:

answer += n - right

counts every valid extension of the current window.

The inner while loop then shrinks the left boundary to search for smaller valid windows.

If removing a character causes the window to lose a required occurrence:

if window[left_idx] < required[left_idx]:
    missing += 1

the window becomes invalid again and expansion resumes.

The result is a strict linear time algorithm.

Go Solution

func validSubstringCount(word1 string, word2 string) int64 {
    required := make([]int, 26)

    for _, ch := range word2 {
        required[ch-'a']++
    }

    window := make([]int, 26)

    missing := len(word2)
    left := 0
    n := len(word1)

    var answer int64 = 0

    for right := 0; right < n; right++ {
        idx := word1[right] - 'a'

        window[idx]++

        if window[idx] <= required[idx] {
            missing--
        }

        for missing == 0 {
            answer += int64(n - right)

            leftIdx := word1[left] - 'a'

            window[leftIdx]--

            if window[leftIdx] < required[leftIdx] {
                missing++
            }

            left++
        }
    }

    return answer
}

The Go implementation follows the same logic as the Python solution.

One important difference is that the return type is int64.

The number of substrings can be as large as:

n * (n + 1) / 2

which exceeds the range of a 32-bit integer when n is large.

Go arrays are implemented using slices created with make([]int, 26).

Character indexing uses byte arithmetic because the strings contain only lowercase English letters.

Worked Examples

Example 1

word1 = "bcca"
word2 = "abc"

Required frequencies:

Character	Count
a	1
b	1
c	1

Initial state:

Variable	Value
left	0
missing	3
answer	0

Iteration Trace

right	char	window	missing	valid?	answer
0	b	b=1	2	No	0
1	c	b=1,c=1	1	No	0
2	c	b=1,c=2	1	No	0
3	a	a=1,b=1,c=2	0	Yes	1

When right = 3, the window becomes valid.

We add:

n - right = 4 - 3 = 1

So:

answer = 1

Final result:

Example 2

word1 = "abcabc"
word2 = "abc"

Required frequencies:

Character	Count
a	1
b	1
c	1

Iteration Trace

right	char	missing	valid windows added	total
0	a	2	0	0
1	b	1	0	0
2	c	0	4	4
3	a	0	3	7
4	b	0	2	9
5	c	0	1	10

Final answer:

Example 3

word1 = "abcabc"
word2 = "aaabc"

Required frequencies:

Character	Count
a	3
b	1
c	1

word1 contains only two 'a' characters total.

Therefore, no substring can satisfy the requirement.

The window never becomes valid.

Final answer:

Complexity Analysis

Measure	Complexity	Explanation
Time	`O(n)`	Each pointer moves at most `n` times
Space	`O(1)`	Frequency arrays always have size 26

The sliding window guarantees linear runtime because every character is processed at most twice, once when entering the window and once when leaving it.

The memory usage is constant because the algorithm stores only two fixed-size arrays of length 26, regardless of input size.

Test Cases

sol = Solution()

assert sol.validSubstringCount("bcca", "abc") == 1  # provided example 1

assert sol.validSubstringCount("abcabc", "abc") == 10  # provided example 2

assert sol.validSubstringCount("abcabc", "aaabc") == 0  # provided example 3

assert sol.validSubstringCount("a", "a") == 1  # single character valid

assert sol.validSubstringCount("a", "b") == 0  # single character invalid

assert sol.validSubstringCount("aaaaa", "aa") == 10  # repeated characters

assert sol.validSubstringCount("abab", "ab") == 6  # overlapping valid windows

assert sol.validSubstringCount("xyz", "abc") == 0  # no matching characters

assert sol.validSubstringCount("abc", "abc") == 1  # entire string only

assert sol.validSubstringCount("aaabbbccc", "abc") == 22  # many valid substrings

assert sol.validSubstringCount("aaaaaaaa", "aaaa") == 15  # all long enough substrings valid

assert sol.validSubstringCount("abcd", "dcba") == 1  # exact frequency match

assert sol.validSubstringCount("abcde", "aabb") == 0  # insufficient duplicates

assert sol.validSubstringCount("zzzzabc", "abc") == 5  # valid suffix expansions

assert sol.validSubstringCount("abcabcabc", "aabbcc") == 10  # larger frequency requirements

Test Summary

Test	Why
`"bcca", "abc"`	Basic valid case
`"abcabc", "abc"`	Many overlapping valid substrings
`"abcabc", "aaabc"`	Impossible frequency requirement
`"a", "a"`	Smallest valid input
`"a", "b"`	Smallest invalid input
`"aaaaa", "aa"`	Repeated character handling
`"abab", "ab"`	Multiple overlapping windows
`"xyz", "abc"`	No matching characters
`"abc", "abc"`	Entire string only
`"aaabbbccc", "abc"`	Dense valid substring region
`"aaaaaaaa", "aaaa"`	Every sufficiently long substring valid
`"abcd", "dcba"`	Rearrangement property
`"abcde", "aabb"`	Missing duplicate requirement
`"zzzzabc", "abc"`	Valid windows concentrated near end
`"abcabcabc", "aabbcc"`	Larger frequency constraints

Edge Cases

One important edge case occurs when word2 requires repeated characters.

For example:

word2 = "aaabc"

A buggy implementation might only check whether each distinct character exists in the window, which would incorrectly treat "abc" as valid. The correct solution tracks exact frequencies using arrays, ensuring the window contains the required number of occurrences for every character.

Another important edge case occurs when no valid substring exists at all.

For example:

word1 = "abc"
word2 = "dddd"

The sliding window never reaches a state where missing == 0. The algorithm handles this naturally because the counting logic only executes for valid windows.

A third critical edge case involves extremely large inputs.

Since word1.length can reach one million, any quadratic solution will time out or exceed memory limits. The implementation avoids this by ensuring both pointers move only forward. Every character enters and leaves the window at most once, producing true linear complexity.

Another subtle edge case occurs when every large enough substring is valid.

For example:

word1 = "aaaaaa"
word2 = "aa"

Once the window size reaches two, every extension remains valid. The optimization:

answer += n - right

is what makes the solution efficient in this scenario. Without it, the algorithm would still waste time enumerating every valid substring individually.

Finally, the implementation correctly handles windows that become invalid after shrinking.

When removing the leftmost character reduces a required frequency below its target, the algorithm immediately increments missing. This guarantees that the window validity state always remains accurate.