LeetCode 3297 - Count Substrings That Can Be Rearranged to Contain a String I
We are given two lowercase strings, word1 and word2. A substring of word1 is considered valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string. To understand what this means, suppose word2 = "abc".
Difficulty: 🟡 Medium
Topics: Hash Table, String, Sliding Window
Solution
LeetCode 3297 - Count Substrings That Can Be Rearranged to Contain a String I
Problem Understanding
We are given two lowercase strings, word1 and word2.
A substring of word1 is considered valid if its characters can be rearranged so that word2 becomes a prefix of the rearranged string.
To understand what this means, suppose word2 = "abc". If a substring contains at least one 'a', one 'b', and one 'c', then we can rearrange the substring so that those three characters appear first, forming the prefix "abc". Any extra characters can appear afterward.
Therefore, the ordering inside the substring does not matter. Only the character frequencies matter.
More generally, if word2 contains repeated characters, then the substring must contain at least the same frequency of every character. For example:
-
If
word2 = "abc", a valid substring must contain at least: -
1
'a' -
1
'b' -
1
'c' -
If
word2 = "aaabc", a valid substring must contain at least: -
3
'a' -
1
'b' -
1
'c'
The task is to count how many substrings of word1 satisfy these frequency requirements.
The constraints are important:
|word1| ≤ 100000|word2| ≤ 10000
A quadratic solution that examines every substring individually would be far too slow because word1 can contain one hundred thousand characters. We need a solution that runs approximately in linear time.
Important edge cases include:
word2contains repeated characters.word1is shorter thanword2.- Required characters never appear enough times.
- Every sufficiently long substring is valid.
- A valid window may contain many extra characters that are irrelevant to the requirements.
Approaches
Brute Force
The most direct approach is to enumerate every substring of word1.
For each starting position, we expand the ending position and maintain character frequencies. For every substring, we check whether its frequency counts satisfy all requirements from word2.
This approach is correct because every substring is examined exactly once and tested against the validity condition.
However, there are O(n²) substrings. Even if frequency checking is optimized to O(26), the overall complexity becomes O(n²), which is far too slow for n = 100000.
Key Observation
A substring is valid if and only if its frequency count contains at least the required frequency of every character in word2.
This transforms the problem into a frequency coverage problem.
Suppose we use a sliding window [left, right].
When the current window satisfies all frequency requirements, every larger window that starts at the same left and ends at or after right is also valid. Adding more characters cannot invalidate the frequency requirements.
Therefore, once a valid window is found:
- Current window
[left, right]is valid. [left, right+1],[left, right+2], ...,[left, n-1]are also valid.
If n is the length of word1, then this contributes:
n - right
valid substrings.
This monotonic property allows us to use a sliding window and count many substrings at once.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(26) | Checks all substrings individually |
| Optimal Sliding Window | O(n) | O(26) | Counts groups of valid substrings using frequency coverage |
Algorithm Walkthrough
- Create a frequency array
requiredstoring how many times each character appears inword2. - Compute how many distinct characters actually have a positive requirement. Call this value
needed. - Maintain another frequency array
windowfor the current sliding window. - Maintain a variable
formedrepresenting how many required characters currently meet their required frequency exactly or beyond. - Initialize:
left = 0answer = 0
- Expand the window by moving
rightfrom left to right acrossword1. - When a new character enters the window:
- Increment its frequency in
window. - If its frequency becomes equal to the required frequency, increment
formed.
- Whenever
formed == needed, the current window satisfies all requirements. - At this point:
- Every substring ending at positions
right, right+1, ..., n-1and starting atleftis valid. - Add
n - rightto the answer.
- Try to shrink the window from the left:
- Remove
word1[left]. - If removing it causes a previously satisfied requirement to become unsatisfied, decrement
formed. - Increment
left.
- Continue shrinking while the window remains valid. This ensures each valid starting position contributes exactly once.
- After processing all positions, return the accumulated answer.
Why it works
The sliding window maintains the exact frequency counts of the current substring. The variable formed tracks whether every required character frequency has been satisfied.
Whenever the window becomes valid, any extension of that window remains valid because adding characters cannot reduce frequencies. Therefore all suffix extensions ending after right are guaranteed to be valid. Counting n - right substrings at once is correct.
The shrinking phase ensures that every starting position is processed exactly once, preventing double counting. Since both pointers only move forward, the algorithm runs in linear time.
Python Solution
class Solution:
def validSubstringCount(self, word1: str, word2: str) -> int:
required = [0] * 26
for ch in word2:
required[ord(ch) - ord('a')] += 1
needed = sum(1 for count in required if count > 0)
window = [0] * 26
formed = 0
left = 0
n = len(word1)
answer = 0
for right, ch in enumerate(word1):
idx = ord(ch) - ord('a')
window[idx] += 1
if required[idx] > 0 and window[idx] == required[idx]:
formed += 1
while formed == needed:
answer += n - right
left_idx = ord(word1[left]) - ord('a')
if required[left_idx] > 0 and window[left_idx] == required[left_idx]:
formed -= 1
window[left_idx] -= 1
left += 1
return answer
The implementation begins by constructing the required frequency table from word2.
The variable needed stores the number of distinct characters that must be satisfied. For example, if word2 = "aaabc", then the required characters are 'a', 'b', and 'c', so needed = 3.
As the right pointer expands the window, frequencies are updated. Whenever a character reaches its required frequency, formed increases.
Once all requirements are satisfied (formed == needed), the current window and every extension of it are valid. Therefore n - right is added to the answer.
The left pointer then shrinks the window as much as possible while maintaining correctness. If removing a character breaks a requirement, formed is decreased and the shrinking stops.
Because each character enters and leaves the window at most once, the solution runs in linear time.
Go Solution
func validSubstringCount(word1 string, word2 string) int64 {
required := [26]int{}
for _, ch := range word2 {
required[ch-'a']++
}
needed := 0
for _, count := range required {
if count > 0 {
needed++
}
}
window := [26]int{}
formed := 0
left := 0
n := len(word1)
var answer int64 = 0
for right := 0; right < n; right++ {
idx := word1[right] - 'a'
window[idx]++
if required[idx] > 0 && window[idx] == required[idx] {
formed++
}
for formed == needed {
answer += int64(n - right)
leftIdx := word1[left] - 'a'
if required[leftIdx] > 0 && window[leftIdx] == required[leftIdx] {
formed--
}
window[leftIdx]--
left++
}
}
return answer
}
The Go implementation follows exactly the same logic as the Python version.
One important difference is the return type. The number of valid substrings can be as large as roughly n * (n + 1) / 2, which exceeds the range of a 32 bit integer when n = 100000. Therefore the answer is stored in an int64.
The fixed-size frequency arrays use Go arrays of length 26, which avoids hash map overhead and provides constant-time access.
Worked Examples
Example 1
word1 = "bcca"
word2 = "abc"
Required frequencies:
| Character | Required |
|---|---|
| a | 1 |
| b | 1 |
| c | 1 |
n = 4
| right | char | Window | formed |
|---|---|---|---|
| 0 | b | b | 1 |
| 1 | c | bc | 2 |
| 2 | c | bcc | 2 |
| 3 | a | bcca | 3 |
Window becomes valid at right = 3.
Add:
n - right = 4 - 3 = 1
Answer becomes 1.
After removing 'b', the requirement for 'b' is no longer satisfied.
Final answer:
1
Example 2
word1 = "abcabc"
word2 = "abc"
n = 6
| right | Window | Valid? | Contribution |
|---|---|---|---|
| 0 | a | No | 0 |
| 1 | ab | No | 0 |
| 2 | abc | Yes | 4 |
| 3 | abca | Yes | 3 |
| 4 | abcab | Yes | 2 |
| 5 | abcabc | Yes | 1 |
The shrinking process counts all valid starting positions.
Total:
4 + 3 + 2 + 1 = 10
Answer:
10
Example 3
word1 = "abcabc"
word2 = "aaabc"
Required:
| Character | Required |
|---|---|
| a | 3 |
| b | 1 |
| c | 1 |
The entire string contains only two 'a' characters.
No window can ever satisfy the requirement of three 'a' characters.
Answer:
0
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each pointer moves forward at most n times |
| Space | O(1) | Two fixed-size arrays of length 26 |
The left and right pointers each traverse word1 at most once. Every character enters the window once and leaves the window once. Since the alphabet contains only 26 lowercase letters, all frequency operations are constant time and the extra memory usage is constant.
Test Cases
sol = Solution()
assert sol.validSubstringCount("bcca", "abc") == 1 # problem example 1
assert sol.validSubstringCount("abcabc", "abc") == 10 # problem example 2
assert sol.validSubstringCount("abcabc", "aaabc") == 0 # problem example 3
assert sol.validSubstringCount("a", "a") == 1 # smallest valid case
assert sol.validSubstringCount("a", "b") == 0 # impossible requirement
assert sol.validSubstringCount("aaaa", "aa") == 6 # repeated character requirement
assert sol.validSubstringCount("aaaa", "aaaa") == 1 # whole string only
assert sol.validSubstringCount("abcd", "abcd") == 1 # exact match
assert sol.validSubstringCount("abcd", "abcde") == 0 # word2 longer than word1
assert sol.validSubstringCount("zzzzzz", "z") == 21 # every substring valid
assert sol.validSubstringCount("abca", "aa") == 1 # need two copies of a
assert sol.validSubstringCount("ababab", "aab") == 8 # repeated requirements
assert sol.validSubstringCount("xyz", "abc") == 0 # no required letters present
assert sol.validSubstringCount("abcde", "e") == 5 # all substrings ending at e
Test Summary
| Test | Why |
|---|---|
"bcca", "abc" |
Official example |
"abcabc", "abc" |
Official example with many valid windows |
"abcabc", "aaabc" |
Official impossible case |
"a", "a" |
Minimum valid input |
"a", "b" |
Minimum invalid input |
"aaaa", "aa" |
Repeated character requirement |
"aaaa", "aaaa" |
Entire string needed |
"abcd", "abcd" |
Exact frequency match |
"abcd", "abcde" |
Required length exceeds available length |
"zzzzzz", "z" |
Every substring valid |
"abca", "aa" |
Need multiple copies of one character |
"ababab", "aab" |
Mixed repeated requirements |
"xyz", "abc" |
Missing required characters |
"abcde", "e" |
Single-character target |
Edge Cases
word2 Contains Repeated Characters
A common mistake is checking only whether a character appears, rather than whether it appears enough times.
For example:
word1 = "abca"
word2 = "aa"
The substring must contain two copies of 'a'. A window containing only one 'a' is not valid. The implementation stores exact frequency requirements and only marks a character as satisfied when the required count is reached.
word1 Is Shorter Than word2
Consider:
word1 = "abc"
word2 = "abcd"
No substring can possibly contain all required characters because even the entire string is too short. The sliding window never reaches a valid state, so the answer remains zero.
Required Characters Never Reach Their Target Frequency
Consider:
word1 = "abcabc"
word2 = "aaabc"
The string contains only two 'a' characters, but three are required. The window never satisfies all requirements, so formed never reaches needed. The algorithm naturally returns zero without any special-case logic.
Every Extension Remains Valid
Consider:
word1 = "zzzzzz"
word2 = "z"
As soon as a window contains one 'z', every larger window starting at the same position is also valid. The algorithm exploits this property by adding n - right at once, avoiding the need to enumerate all larger windows individually. This observation is the key to achieving linear time complexity.