LeetCode 395 - Longest Substring with At Least K Repeating Characters

The problem asks us to find the length of the longest substring in which every character appears at least k times. A substring is a contiguous sequence of characters inside the original string.

LeetCode Problem 395

Difficulty: 🟡 Medium
Topics: Hash Table, String, Divide and Conquer, Sliding Window

Solution

Problem Understanding

The problem asks us to find the length of the longest substring in which every character appears at least k times.

A substring is a contiguous sequence of characters inside the original string. For any candidate substring, we must verify that every distinct character inside that substring has frequency greater than or equal to k.

For example, if s = "aaabb" and k = 3, the substring "aaa" is valid because 'a' appears exactly 3 times. However, "aaabb" is not valid because 'b' appears only 2 times.

The input consists of:

  • A string s containing only lowercase English letters
  • An integer k

The output is a single integer representing the maximum length of a valid substring.

The constraints are important:

  • 1 <= s.length <= 10^4
  • The string contains only lowercase English letters
  • 1 <= k <= 10^5

The fact that the string contains only lowercase English letters means there are at most 26 distinct characters. This is a major observation that enables efficient solutions.

Several edge cases are important:

  • If k > len(s), no substring can satisfy the condition
  • If every character already appears at least k times, the entire string is valid
  • Strings with many low frequency separator characters can split the search space into smaller independent regions
  • Very small strings, such as length 1, need careful handling

A naive solution that checks every substring would be too slow because there are O(n^2) substrings.

Approaches

Brute Force Approach

The brute force method generates every possible substring and checks whether it is valid.

For each substring:

  1. Count the frequency of every character
  2. Verify that every character frequency is at least k
  3. Update the maximum valid length

This works because it exhaustively examines all possibilities, guaranteeing correctness.

However, there are O(n^2) substrings, and validating each substring may take O(n) time for frequency counting. This leads to an overall complexity of O(n^3), which is too slow for n = 10^4.

Optimal Approach, Divide and Conquer

The key insight is that any character appearing fewer than k times in a substring cannot belong to a valid solution for that substring.

Suppose we analyze some substring and discover that character 'x' appears fewer than k times. Then any substring containing 'x' is automatically invalid.

That means 'x' acts as a separator. We can split the problem around this invalid character and recursively solve the smaller pieces.

For example:

ababbc

with k = 2.

The character 'c' appears only once, so no valid substring can contain 'c'. Therefore we split into:

ababb

and solve recursively.

This dramatically reduces unnecessary work because invalid characters partition the search space.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n³) O(1) Checks every substring independently
Optimal O(26 × n) average, O(n²) worst case O(log n) recursion Uses invalid characters to divide the search space

Algorithm Walkthrough

Divide and Conquer Algorithm

  1. Start with the current substring range.

Instead of physically creating new substrings repeatedly, we recursively process sections of the original string. 2. Count the frequency of every character in the current range.

We use a frequency array of size 26 because the string contains only lowercase English letters. This allows constant time counting and lookup. 3. Identify invalid characters.

Any character whose frequency is greater than 0 but less than k is invalid. Such a character cannot appear in any valid substring within the current range. 4. If no invalid character exists, return the length of the current substring.

This means every character satisfies the requirement, so the entire substring is valid. 5. Otherwise, split the substring around invalid characters.

Every invalid character acts as a boundary. We recursively solve each segment between invalid characters. 6. Return the maximum result among all recursive segments.

Since any valid substring must avoid invalid characters, the best answer must lie completely inside one of these segments.

Why it works

The correctness comes from the fact that a character appearing fewer than k times can never belong to a valid substring covering the current range. Any substring containing that character immediately violates the problem condition.

Therefore, splitting around invalid characters does not discard any valid candidate. Every valid substring must lie entirely inside one of the resulting partitions, so recursively solving those partitions guarantees the optimal answer.

Python Solution

class Solution:
    def longestSubstring(self, s: str, k: int) -> int:
        def divide(left: int, right: int) -> int:
            if right - left < k:
                return 0

            frequency = [0] * 26

            for i in range(left, right):
                frequency[ord(s[i]) - ord('a')] += 1

            for mid in range(left, right):
                if frequency[ord(s[mid]) - ord('a')] < k:
                    mid_next = mid + 1

                    while (
                        mid_next < right
                        and frequency[ord(s[mid_next]) - ord('a')] < k
                    ):
                        mid_next += 1

                    left_part = divide(left, mid)
                    right_part = divide(mid_next, right)

                    return max(left_part, right_part)

            return right - left

        return divide(0, len(s))

The implementation uses a recursive helper function called divide.

The function processes the substring defined by [left, right).

The first optimization checks whether the substring length is smaller than k. If so, it is impossible for any character to appear at least k times, so we immediately return 0.

Next, we count character frequencies inside the current range using a fixed size array of length 26. This is more efficient than a hash map because the alphabet size is fixed.

We then scan the substring looking for invalid characters. If we encounter one, we split the substring around it.

The while loop skips consecutive invalid characters because they cannot contribute to any valid substring.

Finally, we recursively compute the best answer from the left and right partitions and return the maximum.

If no invalid character exists, the entire substring is valid, so we return its length directly.

Go Solution

func longestSubstring(s string, k int) int {
	var divide func(left int, right int) int

	divide = func(left int, right int) int {
		if right-left < k {
			return 0
		}

		frequency := make([]int, 26)

		for i := left; i < right; i++ {
			frequency[s[i]-'a']++
		}

		for mid := left; mid < right; mid++ {
			if frequency[s[mid]-'a'] < k {
				midNext := mid + 1

				for midNext < right &&
					frequency[s[midNext]-'a'] < k {
					midNext++
				}

				leftPart := divide(left, mid)
				rightPart := divide(midNext, right)

				if leftPart > rightPart {
					return leftPart
				}

				return rightPart
			}
		}

		return right - left
	}

	return divide(0, len(s))
}

The Go implementation closely mirrors the Python version.

Instead of nested functions with automatic closures like Python, Go requires explicitly declaring the recursive function variable before assigning the function body.

Character indexing is simpler in Go because strings can be indexed directly as bytes, and all characters are lowercase English letters.

The frequency array uses make([]int, 26) for efficient fixed size counting.

No integer overflow concerns exist because the maximum string length is only 10^4.

Worked Examples

Example 1

Input:

s = "aaabb"
k = 3

Initial substring:

aaabb

Frequency table:

Character Count
a 3
b 2

Character 'b' is invalid because its frequency is less than k.

We split around 'b'.

Substrings produced:

Substring Valid?
"aaa" Yes
"" No
"" No

Recursive evaluation:

Substring Result
"aaa" 3
"" 0
"" 0

Maximum result:

3

Example 2

Input:

s = "ababbc"
k = 2

Frequency table:

Character Count
a 2
b 3
c 1

Character 'c' is invalid.

Split around 'c'.

Substrings:

Substring Valid?
"ababb" Yes
"" No

Now evaluate "ababb".

Frequency table:

Character Count
a 2
b 3

All characters satisfy the condition.

Return:

5

Complexity Analysis

Measure Complexity Explanation
Time O(26 × n) average, O(n²) worst case Each recursive level scans substring ranges
Space O(log n) average recursion depth Recursive call stack

The average case performs very well because invalid characters rapidly partition the string into smaller sections.

In the worst case, recursion may repeatedly split off very small portions, leading to quadratic behavior. However, because the alphabet is limited to 26 characters, practical performance is excellent.

The auxiliary space comes primarily from recursion depth.

Test Cases

solution = Solution()

assert solution.longestSubstring("aaabb", 3) == 3
# Basic example with one invalid character

assert solution.longestSubstring("ababbc", 2) == 5
# Entire prefix is valid

assert solution.longestSubstring("a", 1) == 1
# Single character valid

assert solution.longestSubstring("a", 2) == 0
# Single character invalid

assert solution.longestSubstring("aaaaa", 2) == 5
# Entire string valid

assert solution.longestSubstring("abcde", 2) == 0
# No character repeats enough

assert solution.longestSubstring("aaabbbcdefggg", 3) == 6
# Multiple partitions

assert solution.longestSubstring("ababacb", 3) == 0
# No valid substring exists

assert solution.longestSubstring("bbaaacbd", 3) == 3
# Best substring in middle

assert solution.longestSubstring("", 1) == 0
# Empty string edge case if tested outside constraints
Test Why
"aaabb", 3 Standard example
"ababbc", 2 Valid large prefix
"a", 1 Smallest valid input
"a", 2 Smallest invalid input
"aaaaa", 2 Entire string valid
"abcde", 2 No valid substring
"aaabbbcdefggg", 3 Multiple recursive splits
"ababacb", 3 All candidates invalid
"bbaaacbd", 3 Valid substring in center
"", 1 Defensive empty input handling

Edge Cases

One important edge case occurs when k is larger than the string length. In this scenario, no substring can possibly satisfy the requirement because even the entire string is too short. The implementation handles this naturally through the condition:

if right - left < k:
    return 0

Another tricky case occurs when every character already satisfies the requirement. For example:

s = "aaabbb"
k = 3

A buggy implementation might unnecessarily recurse or split. Our solution correctly detects that no invalid character exists and immediately returns the entire substring length.

A third important edge case involves consecutive invalid characters. Consider:

s = "aaaxxxbbb"
k = 3

The characters 'x' are all invalid. If we split one character at a time, we create unnecessary recursive calls. The implementation avoids this by skipping consecutive invalid characters with the while loop:

while mid_next < right and frequency[...] < k:
    mid_next += 1

This improves efficiency and avoids redundant recursion.

Another subtle case occurs when the valid substring is in the middle of the string. For example:

s = "caaab"
k = 3

The valid answer is "aaa". The divide and conquer strategy correctly isolates and evaluates middle sections independently, ensuring the optimal substring is not missed.