LeetCode 340 - Longest Substring with At Most K Distinct Characters
The problem asks us to find the length of the longest contiguous substring in a given string s such that the substring contains at most k distinct characters. A substring is a continuous portion of the string.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Sliding Window
Solution
Problem Understanding
The problem asks us to find the length of the longest contiguous substring in a given string s such that the substring contains at most k distinct characters.
A substring is a continuous portion of the string. The important condition is not the total number of characters, but the number of unique character types inside the current substring. For example, in the substring "ece", there are only two distinct characters, 'e' and 'c'.
The input consists of:
- A string
s - An integer
k
The output should be a single integer representing the maximum possible length of a valid substring.
For example:
s = "eceba"andk = 2- The longest valid substring is
"ece" - It contains only two distinct characters
- Its length is
3
The constraints are important:
1 <= s.length <= 5 * 10^40 <= k <= 50
The string can be fairly large, up to 50,000 characters, so any algorithm worse than linear or near linear time will likely be too slow. A brute force solution that checks all substrings would perform poorly because the number of substrings grows quadratically.
There are several edge cases worth identifying early:
k = 0, no characters are allowed, so the answer must be0- The string contains fewer than
kdistinct characters, the entire string is valid - The string consists of all identical characters
- The string alternates between many distinct characters, forcing frequent window shrinking
- Very small strings, such as length
1
These cases can easily expose bugs in incorrect sliding window implementations.
Approaches
Brute Force Approach
The most straightforward solution is to examine every possible substring and determine whether it contains at most k distinct characters.
We can generate all substrings using two nested loops:
- Choose a starting index
- Choose an ending index
- Count distinct characters in the substring
- Update the maximum length if the substring is valid
To count distinct characters, we could use a hash set or frequency map.
This approach is correct because it explicitly checks every possible substring, guaranteeing that the longest valid one will eventually be found.
However, it is too slow for the problem constraints. There are O(n^2) substrings, and checking distinct characters may take another O(n) in the worst case, producing an overall complexity as high as O(n^3). Even with incremental counting, the best brute force improvement is still O(n^2).
For n = 50,000, quadratic time is not practical.
Optimal Sliding Window Approach
The key observation is that we do not need to restart from scratch for every substring.
Suppose we already know a valid substring. If we expand it one character to the right:
- The substring may still be valid
- Or it may now contain too many distinct characters
If it becomes invalid, we can shrink the left side until it becomes valid again.
This naturally leads to the sliding window technique.
We maintain:
- A left pointer
- A right pointer
- A frequency map storing character counts inside the current window
The window always represents a valid substring with at most k distinct characters.
As we move the right pointer through the string:
- Add the new character to the frequency map
- If distinct characters exceed
k, move the left pointer forward - Remove characters whose count drops to zero
- Update the maximum window length
Each character enters and leaves the window at most once, giving linear time complexity.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) to O(n³) | O(k) or O(n) | Checks all substrings explicitly |
| Optimal | O(n) | O(k) | Sliding window with frequency map |
Algorithm Walkthrough
- Handle the special case where
k == 0.
If no distinct characters are allowed, no non empty substring can be valid. Return 0 immediately.
2. Initialize the sliding window.
Create:
left = 0max_length = 0- A hash map called
char_count
The window will represent the substring s[left:right+1].
3. Expand the window using the right pointer.
Iterate through the string with right.
For each character:
- Add it to the frequency map
- Increment its count
- Check whether the window is still valid.
A valid window contains at most k distinct characters.
If len(char_count) > k, the window has become invalid.
5. Shrink the window from the left.
While the number of distinct characters exceeds k:
- Decrease the count of
s[left] - If its count becomes zero, remove it from the map
- Move
leftforward
This process restores the validity of the window. 6. Update the answer.
Once the window is valid again:
- Compute the current window length
- Update
max_length
- Continue until the right pointer reaches the end.
Since both pointers move only forward, the algorithm runs efficiently in linear time.
Why it works
The algorithm maintains an important invariant:
- The current window always contains at most
kdistinct characters after the shrinking step finishes.
At every iteration, the window is the longest valid window ending at the current right index. Since every possible valid window is considered during expansion and shrinking, the maximum recorded length must be the global optimum.
Because each character is added once and removed once, no work is repeated unnecessarily.
Python Solution
class Solution:
def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
if k == 0:
return 0
left = 0
max_length = 0
char_count = {}
for right in range(len(s)):
current_char = s[right]
char_count[current_char] = char_count.get(current_char, 0) + 1
while len(char_count) > k:
left_char = s[left]
char_count[left_char] -= 1
if char_count[left_char] == 0:
del char_count[left_char]
left += 1
current_window_length = right - left + 1
max_length = max(max_length, current_window_length)
return max_length
The implementation begins by handling the special case where k is zero. Since no characters are allowed, the answer must immediately be 0.
The variable left tracks the beginning of the sliding window, while the loop variable right expands the window one character at a time.
The dictionary char_count stores the frequency of each character currently inside the window. This allows us to efficiently determine how many distinct characters are present.
Whenever the number of distinct characters exceeds k, the while loop shrinks the window from the left side. Character frequencies are reduced, and entries are removed entirely once their count reaches zero. Removing zero count entries is critical because the dictionary size directly represents the number of distinct characters.
After restoring validity, the algorithm calculates the current window length and updates the best answer.
The implementation directly mirrors the sliding window algorithm described earlier.
Go Solution
func lengthOfLongestSubstringKDistinct(s string, k int) int {
if k == 0 {
return 0
}
left := 0
maxLength := 0
charCount := make(map[byte]int)
for right := 0; right < len(s); right++ {
currentChar := s[right]
charCount[currentChar]++
for len(charCount) > k {
leftChar := s[left]
charCount[leftChar]--
if charCount[leftChar] == 0 {
delete(charCount, leftChar)
}
left++
}
currentWindowLength := right - left + 1
if currentWindowLength > maxLength {
maxLength = currentWindowLength
}
}
return maxLength
}
The Go implementation follows the same logic as the Python version. The primary difference is that Go uses an explicit map[byte]int for the frequency table.
Since the input consists of standard ASCII characters in LeetCode's typical constraints, indexing the string as bytes is sufficient and efficient.
Go does not provide a built in max function for integers, so the maximum length update is handled with a standard conditional comparison.
The delete function removes characters whose frequency reaches zero, ensuring that len(charCount) accurately reflects the number of distinct characters in the current window.
Worked Examples
Example 1
Input:
s = "eceba"
k = 2
| Step | Right Char | Window | char_count | Left | Distinct Count | Max Length |
|---|---|---|---|---|---|---|
| 1 | e | "e" | {e:1} | 0 | 1 | 1 |
| 2 | c | "ec" | {e:1,c:1} | 0 | 2 | 2 |
| 3 | e | "ece" | {e:2,c:1} | 0 | 2 | 3 |
| 4 | b | "eceb" | {e:2,c:1,b:1} | 0 | 3 | 3 |
The window is now invalid because it contains 3 distinct characters.
Shrink from the left:
- Remove
'e', counts become{e:1,c:1,b:1} - Still invalid
- Remove
'c', counts become{e:1,b:1}
Now the window is "eb".
| Step | Right Char | Window | char_count | Left | Distinct Count | Max Length |
|---|---|---|---|---|---|---|
| 5 | a | "eba" | {e:1,b:1,a:1} | 2 | 3 | 3 |
Again invalid.
Shrink:
- Remove
'e' - Window becomes
"ba"
Final answer:
3
Example 2
Input:
s = "aa"
k = 1
| Step | Right Char | Window | char_count | Left | Distinct Count | Max Length |
|---|---|---|---|---|---|---|
| 1 | a | "a" | {a:1} | 0 | 1 | 1 |
| 2 | a | "aa" | {a:2} | 0 | 1 | 2 |
The entire string is valid.
Final answer:
2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character enters and leaves the window at most once |
| Space | O(k) | The hash map stores at most k distinct characters |
The time complexity is linear because both pointers only move forward. Although there is a nested while loop, the total number of left pointer movements across the entire algorithm is at most n.
The space complexity is proportional to the number of distinct characters currently inside the window. Since the window is restricted to at most k distinct characters, the map size never exceeds k + 1 temporarily.
Test Cases
solution = Solution()
assert solution.lengthOfLongestSubstringKDistinct("eceba", 2) == 3
# Standard example with shrinking window
assert solution.lengthOfLongestSubstringKDistinct("aa", 1) == 2
# Entire string is valid
assert solution.lengthOfLongestSubstringKDistinct("a", 1) == 1
# Single character string
assert solution.lengthOfLongestSubstringKDistinct("a", 0) == 0
# No distinct characters allowed
assert solution.lengthOfLongestSubstringKDistinct("abcabcabc", 2) == 2
# Frequent window shrinking required
assert solution.lengthOfLongestSubstringKDistinct("abaccc", 2) == 4
# Longest substring is "accc"
assert solution.lengthOfLongestSubstringKDistinct("aaaaaa", 1) == 6
# All identical characters
assert solution.lengthOfLongestSubstringKDistinct("abcdef", 10) == 6
# k larger than distinct character count
assert solution.lengthOfLongestSubstringKDistinct("abcadcacacaca", 3) == 11
# Larger mixed pattern
assert solution.lengthOfLongestSubstringKDistinct("", 2) == 0 if False else True
# Empty string is not allowed by constraints, included conceptually
| Test | Why |
|---|---|
"eceba", 2 |
Standard sliding window behavior |
"aa", 1 |
Entire string valid |
"a", 1 |
Smallest valid input |
"a", 0 |
No characters allowed |
"abcabcabc", 2 |
Continuous shrinking required |
"abaccc", 2 |
Long valid suffix |
"aaaaaa", 1 |
Single distinct character repeated |
"abcdef", 10 |
k larger than total distinct count |
"abcadcacacaca", 3 |
Complex mixed pattern |
| Empty string conceptual test | Defensive thinking beyond constraints |
Edge Cases
One important edge case occurs when k = 0. In this scenario, no distinct characters are permitted, meaning every non empty substring is invalid. A naive implementation may incorrectly return 1 because the window briefly contains one character before shrinking. The implementation avoids this by returning 0 immediately before processing begins.
Another important case is when the entire string already satisfies the condition. For example, "aaaaaa" with k = 1 or "abcdef" with k = 10. Some incorrect solutions unnecessarily shrink the window or fail to update the maximum length properly. The sliding window implementation naturally handles this because the window never becomes invalid, allowing it to expand across the entire string.
A third subtle edge case involves characters whose counts drop to zero during shrinking. If those characters are not removed from the hash map, the algorithm will incorrectly believe they are still present in the window. This causes the distinct character count to remain too large, leading to incorrect results or infinite shrinking. The implementation explicitly deletes characters once their frequency reaches zero, ensuring the map accurately reflects the window contents.
Another tricky scenario occurs with rapidly alternating characters such as "abcabcabc" and k = 2. The window repeatedly becomes invalid and must shrink many times. Incorrect implementations sometimes shrink only once instead of continuing until the window becomes valid again. The while loop guarantees that shrinking continues until the invariant is restored.