LeetCode 2953 - Count Complete Substrings
The problem asks us to count how many substrings of a given string are considered "complete". A substring is complete if it satisfies two conditions simultaneously: 1. Every distinct character in the substring appears exactly k times. 2.
Difficulty: 🔴 Hard
Topics: Hash Table, String, Sliding Window
Solution
Problem Understanding
The problem asks us to count how many substrings of a given string are considered "complete".
A substring is complete if it satisfies two conditions simultaneously:
- Every distinct character in the substring appears exactly
ktimes. - For every pair of adjacent characters, the absolute difference between their alphabetical positions is at most
2.
The second condition means that for every adjacent pair:
abs(ord(s[i]) - ord(s[i - 1])) <= 2
For example:
'a'and'c'are valid neighbors because the difference is2'a'and'd'are invalid neighbors because the difference is3
The input consists of:
word, a lowercase English stringk, the exact frequency every character must have inside a valid substring
The output is the total number of substrings satisfying both requirements.
The constraints are important:
1 <= word.length <= 100000
This immediately rules out expensive substring enumeration approaches such as checking every substring independently in quadratic or cubic time. We need something close to linear time.
There are two major observations hidden inside the problem:
First, the adjacency condition effectively partitions the string into independent segments. Whenever two adjacent characters differ by more than 2, no valid substring can cross that boundary.
Second, if a substring contains u distinct characters, then its length must be exactly:
u * k
because each of the u characters appears exactly k times.
Since there are only 26 lowercase letters, the number of possible distinct character counts is very small, which makes sliding window enumeration feasible.
Important edge cases include:
- Very short strings
k = 1, where every distinct character only needs to appear once- Strings where every adjacent pair violates the adjacency rule
- Strings with all identical characters
- Windows containing too many copies of one character
- Multiple overlapping valid substrings
The problem guarantees only lowercase English letters, which allows us to use fixed-size frequency arrays of length 26.
Approaches
Brute Force Approach
A straightforward solution is to generate every possible substring and verify whether it is complete.
For each substring:
- Count the frequency of every character.
- Verify that every occurring character appears exactly
ktimes. - Verify that every adjacent pair differs by at most
2.
If the substring satisfies both conditions, increment the answer.
This approach is correct because it explicitly checks the definition of a complete substring.
However, the complexity is far too large. There are O(n^2) substrings, and validating each one may take O(n) time, leading to O(n^3) complexity in the worst case.
Even with prefix optimizations, checking all substrings remains too expensive for n = 100000.
Key Insight
The crucial optimization comes from combining two observations.
The first observation is that the adjacency condition creates natural segment boundaries. If:
abs(word[i] - word[i - 1]) > 2
then no valid substring can include both positions. Therefore, we can split the string into maximal valid segments and process each segment independently.
The second observation is that a valid substring with u distinct characters must have length:
u * k
Since there are only 26 lowercase letters, we only need to try window sizes:
k, 2k, 3k, ..., 26k
For each fixed window size, we use a sliding window with frequency counting.
A window is valid if:
- Every present character appears exactly
ktimes - The number of characters with frequency
kequals the number of distinct characters expected
This reduces the complexity dramatically.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) | O(26) | Enumerates and validates every substring |
| Optimal | O(26 × n) | O(26) | Sliding window on valid segments |
Algorithm Walkthrough
Step 1: Split the string into valid segments
Traverse the string and split it whenever adjacent characters differ by more than 2.
For example:
word = "abcfgh"
Between 'c' and 'f':
abs('f' - 'c') = 3
So the string becomes:
["abc", "fgh"]
No valid substring can cross this boundary.
Step 2: Process each segment independently
Suppose a segment has length m.
Inside this segment, try every possible number of distinct characters:
distinct_count = 1 to 26
The corresponding window length is:
window_size = distinct_count * k
If the window size exceeds the segment length, stop.
Step 3: Use a sliding window
Maintain:
- A frequency array of size
26 - The number of characters currently appearing exactly
ktimes
Expand the window one character at a time.
When adding a character:
- If its old frequency was
k, decrement the matched counter - Increment frequency
- If its new frequency becomes
k, increment the matched counter
Do the same carefully when removing a character.
Step 4: Check window validity
A window is complete if:
- Its length is exactly
distinct_count * k - Exactly
distinct_countcharacters appearktimes
Since the total window size already equals distinct_count * k, this automatically guarantees no extra frequencies exist.
Step 5: Accumulate the answer
Count all valid windows across all segments and return the total.
Why it works
The algorithm works because every complete substring must lie entirely inside one valid adjacency segment.
For each segment, every valid substring must have a length equal to:
(number of distinct characters) * k
The sliding window enumerates all such candidate windows exactly once.
The frequency bookkeeping guarantees that a window is counted only when every occurring character appears exactly k times.
Therefore, every valid substring is counted exactly once, and no invalid substring is included.
Python Solution
class Solution:
def countCompleteSubstrings(self, word: str, k: int) -> int:
n = len(word)
def count_in_segment(segment: str) -> int:
m = len(segment)
total = 0
for distinct_count in range(1, 27):
window_size = distinct_count * k
if window_size > m:
break
freq = [0] * 26
matched = 0
left = 0
for right in range(m):
idx = ord(segment[right]) - ord('a')
if freq[idx] == k:
matched -= 1
freq[idx] += 1
if freq[idx] == k:
matched += 1
if right - left + 1 > window_size:
remove_idx = ord(segment[left]) - ord('a')
if freq[remove_idx] == k:
matched -= 1
freq[remove_idx] -= 1
if freq[remove_idx] == k:
matched += 1
left += 1
if (
right - left + 1 == window_size
and matched == distinct_count
):
total += 1
return total
answer = 0
start = 0
for i in range(1, n):
if abs(ord(word[i]) - ord(word[i - 1])) > 2:
answer += count_in_segment(word[start:i])
start = i
answer += count_in_segment(word[start:])
return answer
The implementation begins by splitting the original string into maximal segments where every adjacent pair satisfies the difference constraint.
The helper function count_in_segment handles one independent segment. Since every valid substring inside the segment must have length distinct_count * k, the function iterates through all possible distinct character counts from 1 to 26.
For each possible window size, a sliding window is maintained using a fixed-size frequency array.
The variable matched tracks how many characters currently appear exactly k times. This allows constant-time validity checks.
When the window grows beyond the target size, the leftmost character is removed and the bookkeeping is updated carefully.
Whenever the window size matches the required size and exactly distinct_count characters have frequency k, the substring is complete and contributes to the answer.
Finally, the counts from all segments are summed together.
Go Solution
func countCompleteSubstrings(word string, k int) int {
n := len(word)
countInSegment := func(segment string) int {
m := len(segment)
total := 0
for distinctCount := 1; distinctCount <= 26; distinctCount++ {
windowSize := distinctCount * k
if windowSize > m {
break
}
freq := make([]int, 26)
matched := 0
left := 0
for right := 0; right < m; right++ {
idx := int(segment[right] - 'a')
if freq[idx] == k {
matched--
}
freq[idx]++
if freq[idx] == k {
matched++
}
if right-left+1 > windowSize {
removeIdx := int(segment[left] - 'a')
if freq[removeIdx] == k {
matched--
}
freq[removeIdx]--
if freq[removeIdx] == k {
matched++
}
left++
}
if right-left+1 == windowSize && matched == distinctCount {
total++
}
}
}
return total
}
answer := 0
start := 0
for i := 1; i < n; i++ {
diff := int(word[i]) - int(word[i-1])
if diff < 0 {
diff = -diff
}
if diff > 2 {
answer += countInSegment(word[start:i])
start = i
}
}
answer += countInSegment(word[start:])
return answer
}
The Go implementation follows the same logic as the Python version.
The frequency table is implemented using a fixed integer slice of length 26.
Since Go does not provide a built-in absolute value function for integers, the adjacent character difference is computed manually.
The sliding window logic and frequency bookkeeping are otherwise identical.
Because the maximum substring length is at most 100000, standard Go integers are sufficient and overflow is not a concern.
Worked Examples
Example 1
word = "igigee"
k = 2
First, check adjacency differences:
i-g = 2
g-i = 2
i-g = 2
g-e = 2
e-e = 0
All are valid, so the entire string is one segment:
"igigee"
Now try different distinct counts.
Distinct Count = 1
Window size:
1 * 2 = 2
Sliding windows:
| Window | Frequencies | Valid |
|---|---|---|
| "ig" | i:1 g:1 | No |
| "gi" | g:1 i:1 | No |
| "ig" | i:1 g:1 | No |
| "ge" | g:1 e:1 | No |
| "ee" | e:2 | Yes |
Count = 1
Distinct Count = 2
Window size:
2 * 2 = 4
| Window | Frequencies | Valid |
|---|---|---|
| "igig" | i:2 g:2 | Yes |
| "gige" | g:2 i:1 e:1 | No |
| "igee" | i:1 g:1 e:2 | No |
Count = 1
Distinct Count = 3
Window size:
3 * 2 = 6
| Window | Frequencies | Valid |
|---|---|---|
| "igigee" | i:2 g:2 e:2 | Yes |
Count = 1
Final answer:
1 + 1 + 1 = 3
Example 2
word = "aaabbbccc"
k = 3
All adjacent differences are valid.
Distinct Count = 1
Window size:
3
Valid windows:
| Window | Valid |
|---|---|
| "aaa" | Yes |
| "aab" | No |
| "abb" | No |
| "bbb" | Yes |
| "bbc" | No |
| "bcc" | No |
| "ccc" | Yes |
Count = 3
Distinct Count = 2
Window size:
6
| Window | Frequencies | Valid |
|---|---|---|
| "aaabbb" | a:3 b:3 | Yes |
| "aabbbc" | a:2 b:3 c:1 | No |
| "abbbcc" | a:1 b:3 c:2 | No |
| "bbbccc" | b:3 c:3 | Yes |
Count = 2
Distinct Count = 3
Window size:
9
| Window | Frequencies | Valid |
|---|---|---|
| "aaabbbccc" | a:3 b:3 c:3 | Yes |
Count = 1
Final answer:
3 + 2 + 1 = 6
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(26 × n) | Each segment is processed for at most 26 window sizes |
| Space | O(26) | Frequency array uses constant space |
The outer loop iterates over at most 26 distinct character counts. For each count, the sliding window scans the segment linearly.
Since all segments together contain exactly n characters, the total complexity becomes:
O(26 * n)
which simplifies to linear time.
The frequency array always contains 26 integers, so the space usage is constant.
Test Cases
sol = Solution()
assert sol.countCompleteSubstrings("igigee", 2) == 3
# Basic example with overlapping valid substrings
assert sol.countCompleteSubstrings("aaabbbccc", 3) == 6
# Multiple valid window sizes
assert sol.countCompleteSubstrings("aaaa", 2) == 3
# Repeated identical character
assert sol.countCompleteSubstrings("abcd", 1) == 10
# Every substring valid because all frequencies are 1
assert sol.countCompleteSubstrings("az", 1) == 2
# Adjacent difference too large, only single letters valid
assert sol.countCompleteSubstrings("a", 1) == 1
# Smallest possible input
assert sol.countCompleteSubstrings("abc", 2) == 0
# Window cannot satisfy exact frequency requirement
assert sol.countCompleteSubstrings("aabbcc", 2) == 6
# Multiple distinct-count configurations
assert sol.countCompleteSubstrings("zzz", 1) == 3
# Every single character valid, larger windows invalid
assert sol.countCompleteSubstrings("abac", 1) == 8
# Distinct single-occurrence windows
Test Summary
| Test | Why |
|---|---|
"igigee", 2 |
Verifies overlapping complete substrings |
"aaabbbccc", 3 |
Verifies multiple valid window sizes |
"aaaa", 2 |
Tests repeated single-character windows |
"abcd", 1 |
Tests all substrings becoming valid |
"az", 1 |
Tests adjacency splitting |
"a", 1 |
Tests minimum input size |
"abc", 2 |
Tests impossible frequency requirement |
"aabbcc", 2 |
Tests multiple valid groupings |
"zzz", 1 |
Tests identical-character edge case |
"abac", 1 |
Tests windows with all unique characters |
Edge Cases
One important edge case occurs when adjacent characters differ by more than 2. For example:
word = "az"
The substring "az" is invalid because:
abs('a' - 'z') = 25
A naive implementation might still consider windows crossing this boundary. The segmentation step prevents this by splitting the string into independent regions.
Another tricky edge case is when the string contains only one repeated character, such as:
word = "aaaaaa"
Here, multiple overlapping windows may satisfy the exact frequency requirement. The sliding window correctly counts every valid occurrence independently.
A third important case occurs when k = 1. In this scenario, every character inside a valid substring must appear exactly once. The algorithm still works because the window size becomes equal to the number of distinct characters, and the frequency tracking naturally enforces uniqueness.
A subtle bug source is handling frequency transitions around k. When a character count changes from k to k + 1, it must stop contributing to the matched counter. Similarly, when removing a character changes its count back to k, it must contribute again. The implementation carefully updates matched before and after every frequency modification to maintain correctness.