LeetCode 340 - Longest Substring with At Most K Distinct Characters

The problem asks us to find the length of the longest contiguous substring in a given string s such that the substring contains at most k distinct characters. A substring is a continuous portion of the string.

LeetCode Problem 340

Difficulty: 🟡 Medium
Topics: Hash Table, String, Sliding Window

Solution

Problem Understanding

The problem asks us to find the length of the longest contiguous substring in a given string s such that the substring contains at most k distinct characters.

A substring is a continuous portion of the string. The important condition is not the total number of characters, but the number of unique character types inside the current substring. For example, in the substring "ece", there are only two distinct characters, 'e' and 'c'.

The input consists of:

  • A string s
  • An integer k

The output should be a single integer representing the maximum possible length of a valid substring.

For example:

  • s = "eceba" and k = 2
  • The longest valid substring is "ece"
  • It contains only two distinct characters
  • Its length is 3

The constraints are important:

  • 1 <= s.length <= 5 * 10^4
  • 0 <= k <= 50

The string can be fairly large, up to 50,000 characters, so any algorithm worse than linear or near linear time will likely be too slow. A brute force solution that checks all substrings would perform poorly because the number of substrings grows quadratically.

There are several edge cases worth identifying early:

  • k = 0, no characters are allowed, so the answer must be 0
  • The string contains fewer than k distinct characters, the entire string is valid
  • The string consists of all identical characters
  • The string alternates between many distinct characters, forcing frequent window shrinking
  • Very small strings, such as length 1

These cases can easily expose bugs in incorrect sliding window implementations.

Approaches

Brute Force Approach

The most straightforward solution is to examine every possible substring and determine whether it contains at most k distinct characters.

We can generate all substrings using two nested loops:

  • Choose a starting index
  • Choose an ending index
  • Count distinct characters in the substring
  • Update the maximum length if the substring is valid

To count distinct characters, we could use a hash set or frequency map.

This approach is correct because it explicitly checks every possible substring, guaranteeing that the longest valid one will eventually be found.

However, it is too slow for the problem constraints. There are O(n^2) substrings, and checking distinct characters may take another O(n) in the worst case, producing an overall complexity as high as O(n^3). Even with incremental counting, the best brute force improvement is still O(n^2).

For n = 50,000, quadratic time is not practical.

Optimal Sliding Window Approach

The key observation is that we do not need to restart from scratch for every substring.

Suppose we already know a valid substring. If we expand it one character to the right:

  • The substring may still be valid
  • Or it may now contain too many distinct characters

If it becomes invalid, we can shrink the left side until it becomes valid again.

This naturally leads to the sliding window technique.

We maintain:

  • A left pointer
  • A right pointer
  • A frequency map storing character counts inside the current window

The window always represents a valid substring with at most k distinct characters.

As we move the right pointer through the string:

  • Add the new character to the frequency map
  • If distinct characters exceed k, move the left pointer forward
  • Remove characters whose count drops to zero
  • Update the maximum window length

Each character enters and leaves the window at most once, giving linear time complexity.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) to O(n³) O(k) or O(n) Checks all substrings explicitly
Optimal O(n) O(k) Sliding window with frequency map

Algorithm Walkthrough

  1. Handle the special case where k == 0.

If no distinct characters are allowed, no non empty substring can be valid. Return 0 immediately. 2. Initialize the sliding window.

Create:

  • left = 0
  • max_length = 0
  • A hash map called char_count

The window will represent the substring s[left:right+1]. 3. Expand the window using the right pointer.

Iterate through the string with right.

For each character:

  • Add it to the frequency map
  • Increment its count
  1. Check whether the window is still valid.

A valid window contains at most k distinct characters.

If len(char_count) > k, the window has become invalid. 5. Shrink the window from the left.

While the number of distinct characters exceeds k:

  • Decrease the count of s[left]
  • If its count becomes zero, remove it from the map
  • Move left forward

This process restores the validity of the window. 6. Update the answer.

Once the window is valid again:

  • Compute the current window length
  • Update max_length
  1. Continue until the right pointer reaches the end.

Since both pointers move only forward, the algorithm runs efficiently in linear time.

Why it works

The algorithm maintains an important invariant:

  • The current window always contains at most k distinct characters after the shrinking step finishes.

At every iteration, the window is the longest valid window ending at the current right index. Since every possible valid window is considered during expansion and shrinking, the maximum recorded length must be the global optimum.

Because each character is added once and removed once, no work is repeated unnecessarily.

Python Solution

class Solution:
    def lengthOfLongestSubstringKDistinct(self, s: str, k: int) -> int:
        if k == 0:
            return 0

        left = 0
        max_length = 0
        char_count = {}

        for right in range(len(s)):
            current_char = s[right]

            char_count[current_char] = char_count.get(current_char, 0) + 1

            while len(char_count) > k:
                left_char = s[left]
                char_count[left_char] -= 1

                if char_count[left_char] == 0:
                    del char_count[left_char]

                left += 1

            current_window_length = right - left + 1
            max_length = max(max_length, current_window_length)

        return max_length

The implementation begins by handling the special case where k is zero. Since no characters are allowed, the answer must immediately be 0.

The variable left tracks the beginning of the sliding window, while the loop variable right expands the window one character at a time.

The dictionary char_count stores the frequency of each character currently inside the window. This allows us to efficiently determine how many distinct characters are present.

Whenever the number of distinct characters exceeds k, the while loop shrinks the window from the left side. Character frequencies are reduced, and entries are removed entirely once their count reaches zero. Removing zero count entries is critical because the dictionary size directly represents the number of distinct characters.

After restoring validity, the algorithm calculates the current window length and updates the best answer.

The implementation directly mirrors the sliding window algorithm described earlier.

Go Solution

func lengthOfLongestSubstringKDistinct(s string, k int) int {
    if k == 0 {
        return 0
    }

    left := 0
    maxLength := 0

    charCount := make(map[byte]int)

    for right := 0; right < len(s); right++ {
        currentChar := s[right]
        charCount[currentChar]++

        for len(charCount) > k {
            leftChar := s[left]
            charCount[leftChar]--

            if charCount[leftChar] == 0 {
                delete(charCount, leftChar)
            }

            left++
        }

        currentWindowLength := right - left + 1

        if currentWindowLength > maxLength {
            maxLength = currentWindowLength
        }
    }

    return maxLength
}

The Go implementation follows the same logic as the Python version. The primary difference is that Go uses an explicit map[byte]int for the frequency table.

Since the input consists of standard ASCII characters in LeetCode's typical constraints, indexing the string as bytes is sufficient and efficient.

Go does not provide a built in max function for integers, so the maximum length update is handled with a standard conditional comparison.

The delete function removes characters whose frequency reaches zero, ensuring that len(charCount) accurately reflects the number of distinct characters in the current window.

Worked Examples

Example 1

Input:

s = "eceba"
k = 2
Step Right Char Window char_count Left Distinct Count Max Length
1 e "e" {e:1} 0 1 1
2 c "ec" {e:1,c:1} 0 2 2
3 e "ece" {e:2,c:1} 0 2 3
4 b "eceb" {e:2,c:1,b:1} 0 3 3

The window is now invalid because it contains 3 distinct characters.

Shrink from the left:

  • Remove 'e', counts become {e:1,c:1,b:1}
  • Still invalid
  • Remove 'c', counts become {e:1,b:1}

Now the window is "eb".

Step Right Char Window char_count Left Distinct Count Max Length
5 a "eba" {e:1,b:1,a:1} 2 3 3

Again invalid.

Shrink:

  • Remove 'e'
  • Window becomes "ba"

Final answer:

3

Example 2

Input:

s = "aa"
k = 1
Step Right Char Window char_count Left Distinct Count Max Length
1 a "a" {a:1} 0 1 1
2 a "aa" {a:2} 0 1 2

The entire string is valid.

Final answer:

2

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character enters and leaves the window at most once
Space O(k) The hash map stores at most k distinct characters

The time complexity is linear because both pointers only move forward. Although there is a nested while loop, the total number of left pointer movements across the entire algorithm is at most n.

The space complexity is proportional to the number of distinct characters currently inside the window. Since the window is restricted to at most k distinct characters, the map size never exceeds k + 1 temporarily.

Test Cases

solution = Solution()

assert solution.lengthOfLongestSubstringKDistinct("eceba", 2) == 3
# Standard example with shrinking window

assert solution.lengthOfLongestSubstringKDistinct("aa", 1) == 2
# Entire string is valid

assert solution.lengthOfLongestSubstringKDistinct("a", 1) == 1
# Single character string

assert solution.lengthOfLongestSubstringKDistinct("a", 0) == 0
# No distinct characters allowed

assert solution.lengthOfLongestSubstringKDistinct("abcabcabc", 2) == 2
# Frequent window shrinking required

assert solution.lengthOfLongestSubstringKDistinct("abaccc", 2) == 4
# Longest substring is "accc"

assert solution.lengthOfLongestSubstringKDistinct("aaaaaa", 1) == 6
# All identical characters

assert solution.lengthOfLongestSubstringKDistinct("abcdef", 10) == 6
# k larger than distinct character count

assert solution.lengthOfLongestSubstringKDistinct("abcadcacacaca", 3) == 11
# Larger mixed pattern

assert solution.lengthOfLongestSubstringKDistinct("", 2) == 0 if False else True
# Empty string is not allowed by constraints, included conceptually
Test Why
"eceba", 2 Standard sliding window behavior
"aa", 1 Entire string valid
"a", 1 Smallest valid input
"a", 0 No characters allowed
"abcabcabc", 2 Continuous shrinking required
"abaccc", 2 Long valid suffix
"aaaaaa", 1 Single distinct character repeated
"abcdef", 10 k larger than total distinct count
"abcadcacacaca", 3 Complex mixed pattern
Empty string conceptual test Defensive thinking beyond constraints

Edge Cases

One important edge case occurs when k = 0. In this scenario, no distinct characters are permitted, meaning every non empty substring is invalid. A naive implementation may incorrectly return 1 because the window briefly contains one character before shrinking. The implementation avoids this by returning 0 immediately before processing begins.

Another important case is when the entire string already satisfies the condition. For example, "aaaaaa" with k = 1 or "abcdef" with k = 10. Some incorrect solutions unnecessarily shrink the window or fail to update the maximum length properly. The sliding window implementation naturally handles this because the window never becomes invalid, allowing it to expand across the entire string.

A third subtle edge case involves characters whose counts drop to zero during shrinking. If those characters are not removed from the hash map, the algorithm will incorrectly believe they are still present in the window. This causes the distinct character count to remain too large, leading to incorrect results or infinite shrinking. The implementation explicitly deletes characters once their frequency reaches zero, ensuring the map accurately reflects the window contents.

Another tricky scenario occurs with rapidly alternating characters such as "abcabcabc" and k = 2. The window repeatedly becomes invalid and must shrink many times. Incorrect implementations sometimes shrink only once instead of continuing until the window becomes valid again. The while loop guarantees that shrinking continues until the invariant is restored.