LeetCode 3298 - Count Substrings That Can Be Rearranged to Contain a String II
We are given two strings, word1 and word2. We need to count how many substrings of word1 are considered valid. A substring is valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string.
Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window
Solution
Problem Understanding
We are given two strings, word1 and word2. We need to count how many substrings of word1 are considered valid.
A substring is valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string.
This condition is equivalent to saying that the substring must contain at least all the characters required by word2, with the correct frequencies.
For example, if:
word2 = "aabc"
then a valid substring must contain:
- at least two
'a' - at least one
'b' - at least one
'c'
The remaining characters do not matter because we can rearrange the substring arbitrarily.
So the real task is:
Count the number of substrings of
word1whose character frequencies cover the frequency requirements ofword2.
The constraints are extremely important:
word1.lengthcan be up to10^6word2.lengthcan be up to10^4
A quadratic or even near quadratic solution is impossible. With one million characters, we need an algorithm that runs in linear time, or very close to it.
The problem also explicitly warns that memory limits are tight. That means we should avoid storing large auxiliary structures such as frequency maps for every substring or prefix tables of size O(n * alphabet).
Because the strings only contain lowercase English letters, we can use fixed-size arrays of length 26 instead of hash maps. This gives both better performance and lower memory usage.
Several edge cases are important:
word2may contain repeated characters, such as"aaabc"word1may not contain enough occurrences of a required character- Every substring could be valid
- No substring could be valid
- The valid substring might need to extend all the way to the end of
word1
These cases can easily break incorrect sliding window implementations.
Approaches
Brute Force
The most direct approach is to examine every substring of word1.
For each starting index:
- Extend the substring one character at a time.
- Maintain character frequencies.
- Check whether the substring contains all required frequencies from
word2.
If the frequency requirements are satisfied, count the substring as valid.
This works because every substring is checked explicitly, so no valid substring can be missed.
However, the complexity is far too large.
There are O(n^2) substrings, and even with incremental frequency updates, checking validity costs up to O(26) each time.
With n = 10^6, this is completely infeasible.
Optimal Sliding Window
The key observation is that validity depends only on character frequencies.
We do not care about the order of characters because rearrangement is allowed.
This naturally suggests a sliding window approach.
We maintain a window [left, right] and track the character counts inside it.
The important insight is:
Once a window becomes valid, every larger window extending further right is also valid.
For example, suppose substring [left, right] already contains all required characters. Adding more characters cannot make it invalid.
This monotonic property allows us to count many substrings at once.
We expand the right pointer until the window becomes valid. Then:
- all substrings starting at
leftand ending at any index>= rightare valid - there are
n - rightsuch substrings
After counting them, we shrink the left side and continue.
This produces a linear time solution because each pointer moves at most n times.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2) |
O(26) |
Checks every substring explicitly |
| Optimal Sliding Window | O(n) |
O(26) |
Uses two pointers and frequency tracking |
Algorithm Walkthrough
Step 1: Build the Required Frequency Array
We first count how many times each character appears in word2.
For example:
word2 = "aabc"
produces:
a -> 2
b -> 1
c -> 1
We store this in a fixed array of size 26.
Step 2: Initialize the Sliding Window
We maintain:
leftpointerrightpointer- current window frequencies
- a variable called
missing
missing represents how many required characters are still missing from the current window.
Initially:
missing = len(word2)
As we add characters into the window, we reduce missing whenever we satisfy one needed occurrence.
Step 3: Expand the Right Pointer
We iterate through word1 using right.
For each character:
- Add it to the window frequency array.
- If this character was still needed, decrease
missing.
A character is still needed if:
window_count[char] <= required_count[char]
after insertion.
Step 4: Detect a Valid Window
When:
missing == 0
the current window contains all required characters.
At this moment, the window is valid.
Step 5: Count All Valid Extensions
Suppose the current valid window is:
[left, right]
Then every substring:
[left, right]
[left, right + 1]
[left, right + 2]
...
[left, n - 1]
is also valid.
So we add:
n - right
to the answer.
This is the crucial optimization that avoids checking every substring individually.
Step 6: Shrink the Window
After counting, we try moving left forward.
We remove the leftmost character from the window.
If removing it causes the window to lose a required occurrence, we increase missing.
Then we continue shrinking until the window becomes invalid again.
Step 7: Continue Until the End
We repeat this process for every right position.
Because each character enters and leaves the window at most once, the algorithm runs in linear time.
Why it works
The sliding window always maintains accurate frequency counts for the current substring.
The variable missing guarantees that:
missing == 0
if and only if the window satisfies all frequency requirements from word2.
Once a window becomes valid, any larger window extending further right must also remain valid because adding characters cannot remove required frequencies.
Therefore, counting n - right valid substrings at once is correct.
Since every valid substring is counted exactly once, the algorithm produces the correct answer.
Python Solution
class Solution:
def validSubstringCount(self, word1: str, word2: str) -> int:
required = [0] * 26
for ch in word2:
required[ord(ch) - ord('a')] += 1
window = [0] * 26
missing = len(word2)
left = 0
answer = 0
n = len(word1)
for right in range(n):
idx = ord(word1[right]) - ord('a')
window[idx] += 1
if window[idx] <= required[idx]:
missing -= 1
while missing == 0:
answer += n - right
left_idx = ord(word1[left]) - ord('a')
window[left_idx] -= 1
if window[left_idx] < required[left_idx]:
missing += 1
left += 1
return answer
The implementation begins by constructing the frequency requirements for word2.
Instead of using a hash map, the solution uses a fixed-size array of length 26 because the input contains only lowercase English letters. This improves both memory usage and runtime performance.
The window array stores character frequencies inside the current sliding window.
The variable missing tracks how many required character occurrences have not yet been satisfied.
As the right pointer expands:
window[idx] += 1
the algorithm checks whether the newly added character contributes toward satisfying the requirement:
if window[idx] <= required[idx]:
missing -= 1
When missing becomes zero, the window is valid.
At that point:
answer += n - right
counts every valid extension of the current window.
The inner while loop then shrinks the left boundary to search for smaller valid windows.
If removing a character causes the window to lose a required occurrence:
if window[left_idx] < required[left_idx]:
missing += 1
the window becomes invalid again and expansion resumes.
The result is a strict linear time algorithm.
Go Solution
func validSubstringCount(word1 string, word2 string) int64 {
required := make([]int, 26)
for _, ch := range word2 {
required[ch-'a']++
}
window := make([]int, 26)
missing := len(word2)
left := 0
n := len(word1)
var answer int64 = 0
for right := 0; right < n; right++ {
idx := word1[right] - 'a'
window[idx]++
if window[idx] <= required[idx] {
missing--
}
for missing == 0 {
answer += int64(n - right)
leftIdx := word1[left] - 'a'
window[leftIdx]--
if window[leftIdx] < required[leftIdx] {
missing++
}
left++
}
}
return answer
}
The Go implementation follows the same logic as the Python solution.
One important difference is that the return type is int64.
The number of substrings can be as large as:
n * (n + 1) / 2
which exceeds the range of a 32-bit integer when n is large.
Go arrays are implemented using slices created with make([]int, 26).
Character indexing uses byte arithmetic because the strings contain only lowercase English letters.
Worked Examples
Example 1
word1 = "bcca"
word2 = "abc"
Required frequencies:
| Character | Count |
|---|---|
| a | 1 |
| b | 1 |
| c | 1 |
Initial state:
| Variable | Value |
|---|---|
| left | 0 |
| missing | 3 |
| answer | 0 |
Iteration Trace
| right | char | window | missing | valid? | answer |
|---|---|---|---|---|---|
| 0 | b | b=1 | 2 | No | 0 |
| 1 | c | b=1,c=1 | 1 | No | 0 |
| 2 | c | b=1,c=2 | 1 | No | 0 |
| 3 | a | a=1,b=1,c=2 | 0 | Yes | 1 |
When right = 3, the window becomes valid.
We add:
n - right = 4 - 3 = 1
So:
answer = 1
Final result:
1
Example 2
word1 = "abcabc"
word2 = "abc"
Required frequencies:
| Character | Count |
|---|---|
| a | 1 |
| b | 1 |
| c | 1 |
Iteration Trace
| right | char | missing | valid windows added | total |
|---|---|---|---|---|
| 0 | a | 2 | 0 | 0 |
| 1 | b | 1 | 0 | 0 |
| 2 | c | 0 | 4 | 4 |
| 3 | a | 0 | 3 | 7 |
| 4 | b | 0 | 2 | 9 |
| 5 | c | 0 | 1 | 10 |
Final answer:
10
Example 3
word1 = "abcabc"
word2 = "aaabc"
Required frequencies:
| Character | Count |
|---|---|
| a | 3 |
| b | 1 |
| c | 1 |
word1 contains only two 'a' characters total.
Therefore, no substring can satisfy the requirement.
The window never becomes valid.
Final answer:
0
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) |
Each pointer moves at most n times |
| Space | O(1) |
Frequency arrays always have size 26 |
The sliding window guarantees linear runtime because every character is processed at most twice, once when entering the window and once when leaving it.
The memory usage is constant because the algorithm stores only two fixed-size arrays of length 26, regardless of input size.
Test Cases
sol = Solution()
assert sol.validSubstringCount("bcca", "abc") == 1 # provided example 1
assert sol.validSubstringCount("abcabc", "abc") == 10 # provided example 2
assert sol.validSubstringCount("abcabc", "aaabc") == 0 # provided example 3
assert sol.validSubstringCount("a", "a") == 1 # single character valid
assert sol.validSubstringCount("a", "b") == 0 # single character invalid
assert sol.validSubstringCount("aaaaa", "aa") == 10 # repeated characters
assert sol.validSubstringCount("abab", "ab") == 6 # overlapping valid windows
assert sol.validSubstringCount("xyz", "abc") == 0 # no matching characters
assert sol.validSubstringCount("abc", "abc") == 1 # entire string only
assert sol.validSubstringCount("aaabbbccc", "abc") == 22 # many valid substrings
assert sol.validSubstringCount("aaaaaaaa", "aaaa") == 15 # all long enough substrings valid
assert sol.validSubstringCount("abcd", "dcba") == 1 # exact frequency match
assert sol.validSubstringCount("abcde", "aabb") == 0 # insufficient duplicates
assert sol.validSubstringCount("zzzzabc", "abc") == 5 # valid suffix expansions
assert sol.validSubstringCount("abcabcabc", "aabbcc") == 10 # larger frequency requirements
Test Summary
| Test | Why |
|---|---|
"bcca", "abc" |
Basic valid case |
"abcabc", "abc" |
Many overlapping valid substrings |
"abcabc", "aaabc" |
Impossible frequency requirement |
"a", "a" |
Smallest valid input |
"a", "b" |
Smallest invalid input |
"aaaaa", "aa" |
Repeated character handling |
"abab", "ab" |
Multiple overlapping windows |
"xyz", "abc" |
No matching characters |
"abc", "abc" |
Entire string only |
"aaabbbccc", "abc" |
Dense valid substring region |
"aaaaaaaa", "aaaa" |
Every sufficiently long substring valid |
"abcd", "dcba" |
Rearrangement property |
"abcde", "aabb" |
Missing duplicate requirement |
"zzzzabc", "abc" |
Valid windows concentrated near end |
"abcabcabc", "aabbcc" |
Larger frequency constraints |
Edge Cases
One important edge case occurs when word2 requires repeated characters.
For example:
word2 = "aaabc"
A buggy implementation might only check whether each distinct character exists in the window, which would incorrectly treat "abc" as valid. The correct solution tracks exact frequencies using arrays, ensuring the window contains the required number of occurrences for every character.
Another important edge case occurs when no valid substring exists at all.
For example:
word1 = "abc"
word2 = "dddd"
The sliding window never reaches a state where missing == 0. The algorithm handles this naturally because the counting logic only executes for valid windows.
A third critical edge case involves extremely large inputs.
Since word1.length can reach one million, any quadratic solution will time out or exceed memory limits. The implementation avoids this by ensuring both pointers move only forward. Every character enters and leaves the window at most once, producing true linear complexity.
Another subtle edge case occurs when every large enough substring is valid.
For example:
word1 = "aaaaaa"
word2 = "aa"
Once the window size reaches two, every extension remains valid. The optimization:
answer += n - right
is what makes the solution efficient in this scenario. Without it, the algorithm would still waste time enumerating every valid substring individually.
Finally, the implementation correctly handles windows that become invalid after shrinking.
When removing the leftmost character reduces a required frequency below its target, the algorithm immediately increments missing. This guarantees that the window validity state always remains accurate.