LeetCode 395 - Longest Substring with At Least K Repeating Characters
The problem asks us to find the length of the longest substring in which every character appears at least k times. A substring is a contiguous sequence of characters inside the original string.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Divide and Conquer, Sliding Window
Solution
Problem Understanding
The problem asks us to find the length of the longest substring in which every character appears at least k times.
A substring is a contiguous sequence of characters inside the original string. For any candidate substring, we must verify that every distinct character inside that substring has frequency greater than or equal to k.
For example, if s = "aaabb" and k = 3, the substring "aaa" is valid because 'a' appears exactly 3 times. However, "aaabb" is not valid because 'b' appears only 2 times.
The input consists of:
- A string
scontaining only lowercase English letters - An integer
k
The output is a single integer representing the maximum length of a valid substring.
The constraints are important:
1 <= s.length <= 10^4- The string contains only lowercase English letters
1 <= k <= 10^5
The fact that the string contains only lowercase English letters means there are at most 26 distinct characters. This is a major observation that enables efficient solutions.
Several edge cases are important:
- If
k > len(s), no substring can satisfy the condition - If every character already appears at least
ktimes, the entire string is valid - Strings with many low frequency separator characters can split the search space into smaller independent regions
- Very small strings, such as length 1, need careful handling
A naive solution that checks every substring would be too slow because there are O(n^2) substrings.
Approaches
Brute Force Approach
The brute force method generates every possible substring and checks whether it is valid.
For each substring:
- Count the frequency of every character
- Verify that every character frequency is at least
k - Update the maximum valid length
This works because it exhaustively examines all possibilities, guaranteeing correctness.
However, there are O(n^2) substrings, and validating each substring may take O(n) time for frequency counting. This leads to an overall complexity of O(n^3), which is too slow for n = 10^4.
Optimal Approach, Divide and Conquer
The key insight is that any character appearing fewer than k times in a substring cannot belong to a valid solution for that substring.
Suppose we analyze some substring and discover that character 'x' appears fewer than k times. Then any substring containing 'x' is automatically invalid.
That means 'x' acts as a separator. We can split the problem around this invalid character and recursively solve the smaller pieces.
For example:
ababbc
with k = 2.
The character 'c' appears only once, so no valid substring can contain 'c'. Therefore we split into:
ababb
and solve recursively.
This dramatically reduces unnecessary work because invalid characters partition the search space.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) | O(1) | Checks every substring independently |
| Optimal | O(26 × n) average, O(n²) worst case | O(log n) recursion | Uses invalid characters to divide the search space |
Algorithm Walkthrough
Divide and Conquer Algorithm
- Start with the current substring range.
Instead of physically creating new substrings repeatedly, we recursively process sections of the original string. 2. Count the frequency of every character in the current range.
We use a frequency array of size 26 because the string contains only lowercase English letters. This allows constant time counting and lookup. 3. Identify invalid characters.
Any character whose frequency is greater than 0 but less than k is invalid. Such a character cannot appear in any valid substring within the current range.
4. If no invalid character exists, return the length of the current substring.
This means every character satisfies the requirement, so the entire substring is valid. 5. Otherwise, split the substring around invalid characters.
Every invalid character acts as a boundary. We recursively solve each segment between invalid characters. 6. Return the maximum result among all recursive segments.
Since any valid substring must avoid invalid characters, the best answer must lie completely inside one of these segments.
Why it works
The correctness comes from the fact that a character appearing fewer than k times can never belong to a valid substring covering the current range. Any substring containing that character immediately violates the problem condition.
Therefore, splitting around invalid characters does not discard any valid candidate. Every valid substring must lie entirely inside one of the resulting partitions, so recursively solving those partitions guarantees the optimal answer.
Python Solution
class Solution:
def longestSubstring(self, s: str, k: int) -> int:
def divide(left: int, right: int) -> int:
if right - left < k:
return 0
frequency = [0] * 26
for i in range(left, right):
frequency[ord(s[i]) - ord('a')] += 1
for mid in range(left, right):
if frequency[ord(s[mid]) - ord('a')] < k:
mid_next = mid + 1
while (
mid_next < right
and frequency[ord(s[mid_next]) - ord('a')] < k
):
mid_next += 1
left_part = divide(left, mid)
right_part = divide(mid_next, right)
return max(left_part, right_part)
return right - left
return divide(0, len(s))
The implementation uses a recursive helper function called divide.
The function processes the substring defined by [left, right).
The first optimization checks whether the substring length is smaller than k. If so, it is impossible for any character to appear at least k times, so we immediately return 0.
Next, we count character frequencies inside the current range using a fixed size array of length 26. This is more efficient than a hash map because the alphabet size is fixed.
We then scan the substring looking for invalid characters. If we encounter one, we split the substring around it.
The while loop skips consecutive invalid characters because they cannot contribute to any valid substring.
Finally, we recursively compute the best answer from the left and right partitions and return the maximum.
If no invalid character exists, the entire substring is valid, so we return its length directly.
Go Solution
func longestSubstring(s string, k int) int {
var divide func(left int, right int) int
divide = func(left int, right int) int {
if right-left < k {
return 0
}
frequency := make([]int, 26)
for i := left; i < right; i++ {
frequency[s[i]-'a']++
}
for mid := left; mid < right; mid++ {
if frequency[s[mid]-'a'] < k {
midNext := mid + 1
for midNext < right &&
frequency[s[midNext]-'a'] < k {
midNext++
}
leftPart := divide(left, mid)
rightPart := divide(midNext, right)
if leftPart > rightPart {
return leftPart
}
return rightPart
}
}
return right - left
}
return divide(0, len(s))
}
The Go implementation closely mirrors the Python version.
Instead of nested functions with automatic closures like Python, Go requires explicitly declaring the recursive function variable before assigning the function body.
Character indexing is simpler in Go because strings can be indexed directly as bytes, and all characters are lowercase English letters.
The frequency array uses make([]int, 26) for efficient fixed size counting.
No integer overflow concerns exist because the maximum string length is only 10^4.
Worked Examples
Example 1
Input:
s = "aaabb"
k = 3
Initial substring:
aaabb
Frequency table:
| Character | Count |
|---|---|
| a | 3 |
| b | 2 |
Character 'b' is invalid because its frequency is less than k.
We split around 'b'.
Substrings produced:
| Substring | Valid? |
|---|---|
| "aaa" | Yes |
| "" | No |
| "" | No |
Recursive evaluation:
| Substring | Result |
|---|---|
| "aaa" | 3 |
| "" | 0 |
| "" | 0 |
Maximum result:
3
Example 2
Input:
s = "ababbc"
k = 2
Frequency table:
| Character | Count |
|---|---|
| a | 2 |
| b | 3 |
| c | 1 |
Character 'c' is invalid.
Split around 'c'.
Substrings:
| Substring | Valid? |
|---|---|
| "ababb" | Yes |
| "" | No |
Now evaluate "ababb".
Frequency table:
| Character | Count |
|---|---|
| a | 2 |
| b | 3 |
All characters satisfy the condition.
Return:
5
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(26 × n) average, O(n²) worst case | Each recursive level scans substring ranges |
| Space | O(log n) average recursion depth | Recursive call stack |
The average case performs very well because invalid characters rapidly partition the string into smaller sections.
In the worst case, recursion may repeatedly split off very small portions, leading to quadratic behavior. However, because the alphabet is limited to 26 characters, practical performance is excellent.
The auxiliary space comes primarily from recursion depth.
Test Cases
solution = Solution()
assert solution.longestSubstring("aaabb", 3) == 3
# Basic example with one invalid character
assert solution.longestSubstring("ababbc", 2) == 5
# Entire prefix is valid
assert solution.longestSubstring("a", 1) == 1
# Single character valid
assert solution.longestSubstring("a", 2) == 0
# Single character invalid
assert solution.longestSubstring("aaaaa", 2) == 5
# Entire string valid
assert solution.longestSubstring("abcde", 2) == 0
# No character repeats enough
assert solution.longestSubstring("aaabbbcdefggg", 3) == 6
# Multiple partitions
assert solution.longestSubstring("ababacb", 3) == 0
# No valid substring exists
assert solution.longestSubstring("bbaaacbd", 3) == 3
# Best substring in middle
assert solution.longestSubstring("", 1) == 0
# Empty string edge case if tested outside constraints
| Test | Why |
|---|---|
"aaabb", 3 |
Standard example |
"ababbc", 2 |
Valid large prefix |
"a", 1 |
Smallest valid input |
"a", 2 |
Smallest invalid input |
"aaaaa", 2 |
Entire string valid |
"abcde", 2 |
No valid substring |
"aaabbbcdefggg", 3 |
Multiple recursive splits |
"ababacb", 3 |
All candidates invalid |
"bbaaacbd", 3 |
Valid substring in center |
"", 1 |
Defensive empty input handling |
Edge Cases
One important edge case occurs when k is larger than the string length. In this scenario, no substring can possibly satisfy the requirement because even the entire string is too short. The implementation handles this naturally through the condition:
if right - left < k:
return 0
Another tricky case occurs when every character already satisfies the requirement. For example:
s = "aaabbb"
k = 3
A buggy implementation might unnecessarily recurse or split. Our solution correctly detects that no invalid character exists and immediately returns the entire substring length.
A third important edge case involves consecutive invalid characters. Consider:
s = "aaaxxxbbb"
k = 3
The characters 'x' are all invalid. If we split one character at a time, we create unnecessary recursive calls. The implementation avoids this by skipping consecutive invalid characters with the while loop:
while mid_next < right and frequency[...] < k:
mid_next += 1
This improves efficiency and avoids redundant recursion.
Another subtle case occurs when the valid substring is in the middle of the string. For example:
s = "caaab"
k = 3
The valid answer is "aaa". The divide and conquer strategy correctly isolates and evaluates middle sections independently, ensuring the optimal substring is not missed.