LeetCode 159 - Longest Substring with At Most Two Distinct Characters
The problem asks us to find the length of the longest contiguous substring in a string s that contains at most two distinct characters. A substring is a continuous sequence of characters inside the original string.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Sliding Window
Solution
Problem Understanding
The problem asks us to find the length of the longest contiguous substring in a string s that contains at most two distinct characters.
A substring is a continuous sequence of characters inside the original string. The phrase "at most two distinct characters" means the substring may contain one unique character or two unique characters, but never three or more.
For example, in the string "eceba", the substring "ece" contains only the characters 'e' and 'c', so it is valid. Its length is 3, which is the maximum valid substring length in that input.
The input is a single string s, and the output is an integer representing the maximum length of a valid substring.
The constraints are important:
1 <= s.length <= 10^5- The string contains only English letters.
A string length of up to 100,000 means an inefficient solution will not pass. Any algorithm that repeatedly scans large portions of the string, such as an O(n^2) or O(n^3) solution, becomes too slow for the worst case. This strongly suggests we need a linear time solution.
Several edge cases are important to think about before designing the algorithm:
- A string containing only one character, such as
"a". - A string where all characters are the same, such as
"aaaaa". - A string containing exactly two distinct characters, such as
"ababab". - A string where a third distinct character appears frequently and forces the window to shrink often.
- Very short strings, especially length
1and2.
The problem guarantees the string is non-empty, so we never need to handle a completely empty input.
Approaches
Brute Force Approach
A straightforward solution is to generate every possible substring and check whether it contains at most two distinct characters.
For every starting index i, we can expand to every ending index j, forming the substring s[i:j+1]. For each substring, we count how many unique characters it contains. If the count is less than or equal to 2, we update the maximum length.
This approach is correct because it exhaustively checks every possible substring, so it cannot miss the optimal answer.
However, it is too slow for large inputs. There are O(n^2) substrings, and checking distinct characters may take up to O(n) time in the worst case. Even with optimization, the brute force approach remains quadratic, which is too expensive for n = 10^5.
Optimal Sliding Window Approach
The key observation is that we only care about contiguous substrings with at most two distinct characters. This is a classic situation where a sliding window works well.
Instead of recomputing every substring from scratch, we maintain a window [left, right] that always satisfies the condition of containing at most two distinct characters.
As we move right through the string:
- We add the new character into the window.
- If the window becomes invalid, meaning it now contains more than two distinct characters, we move
leftforward until the window becomes valid again. - At every step, we compute the window length and update the answer.
A hash map is used to track the frequency of characters currently inside the window. This lets us efficiently determine how many distinct characters exist in the current substring.
Because each character enters and leaves the window at most once, the total runtime is linear.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) to O(n) | Checks all substrings explicitly |
| Optimal | O(n) | O(1) | Sliding window with character frequency map |
Algorithm Walkthrough
-
Initialize two pointers,
leftandright, to represent the current sliding window. Both start at0. -
Create a hash map called
char_countto store the frequency of each character currently inside the window. -
Initialize a variable
max_lengthto store the best answer found so far. -
Iterate through the string using
rightas the expanding pointer. -
For each character
s[right], add it to the hash map by increasing its frequency count. -
After adding the character, check whether the window now contains more than two distinct characters. This is determined by checking the size of the hash map.
-
If there are more than two distinct characters, the window is invalid. To restore validity:
-
Decrease the count of
s[left]. -
If its frequency becomes zero, remove it from the hash map entirely.
-
Move
leftone step to the right. -
Continue shrinking until the window again contains at most two distinct characters.
-
Once the window is valid, compute its length using:
right - left + 1
9. Update max_length if the current window is larger than any previously seen valid window.
10. Continue until right reaches the end of the string.
11. Return max_length.
Why it works
The sliding window always maintains the invariant that the current substring contains at most two distinct characters.
Whenever the window becomes invalid, we shrink it just enough to restore validity. Since we examine every possible valid window during expansion and shrinking, we never miss the optimal substring.
Because each index moves forward at most once, the algorithm runs in linear time.
Python Solution
class Solution:
def lengthOfLongestSubstringTwoDistinct(self, s: str) -> int:
left = 0
max_length = 0
char_count = {}
for right in range(len(s)):
current_char = s[right]
char_count[current_char] = char_count.get(current_char, 0) + 1
while len(char_count) > 2:
left_char = s[left]
char_count[left_char] -= 1
if char_count[left_char] == 0:
del char_count[left_char]
left += 1
current_window_length = right - left + 1
max_length = max(max_length, current_window_length)
return max_length
The implementation begins by initializing the sliding window boundaries and the hash map used for frequency counting.
The for loop expands the window one character at a time using the right pointer. Every newly included character is added to char_count.
The while loop ensures the window remains valid. If more than two distinct characters exist, the left side of the window is contracted until only two distinct characters remain.
When shrinking the window, the count of the leftmost character is decreased. If its count becomes zero, that character is completely removed from the map. This keeps the number of keys in the map equal to the number of distinct characters currently inside the window.
After ensuring validity, the current window length is computed and compared with the best result seen so far.
Finally, the maximum valid substring length is returned.
Go Solution
func lengthOfLongestSubstringTwoDistinct(s string) int {
left := 0
maxLength := 0
charCount := make(map[byte]int)
for right := 0; right < len(s); right++ {
currentChar := s[right]
charCount[currentChar]++
for len(charCount) > 2 {
leftChar := s[left]
charCount[leftChar]--
if charCount[leftChar] == 0 {
delete(charCount, leftChar)
}
left++
}
currentWindowLength := right - left + 1
if currentWindowLength > maxLength {
maxLength = currentWindowLength
}
}
return maxLength
}
The Go implementation follows the same sliding window logic as the Python version.
One implementation difference is that Go strings are indexed as bytes. Since the problem guarantees English letters only, using byte is completely safe and efficient.
The frequency map is implemented using map[byte]int.
Go does not provide a built in max function for integers, so the comparison is done manually with an if statement.
No special handling for empty strings is necessary because the constraints guarantee the string length is at least 1.
Worked Examples
Example 1
Input:
s = "eceba"
Step by step execution:
| Right | Character | Window | char_count | Valid? | max_length |
|---|---|---|---|---|---|
| 0 | e | "e" | {e:1} | Yes | 1 |
| 1 | c | "ec" | {e:1, c:1} | Yes | 2 |
| 2 | e | "ece" | {e:2, c:1} | Yes | 3 |
| 3 | b | "eceb" | {e:2, c:1, b:1} | No | 3 |
Now the window is invalid because it contains three distinct characters.
Shrink from the left:
| Action | Window | char_count |
|---|---|---|
| Remove e | "ceb" | {e:1, c:1, b:1} |
| Remove c | "eb" | {e:1, b:1} |
The window becomes valid again.
Continue:
| Right | Character | Window | char_count | Valid? | max_length |
|---|---|---|---|---|---|
| 4 | a | "eba" | {e:1, b:1, a:1} | No | 3 |
Shrink again until valid:
| Action | Window | char_count |
|---|---|---|
| Remove e | "ba" | {b:1, a:1} |
Final answer:
3
Example 2
Input:
s = "ccaabbb"
Step by step execution:
| Right | Character | Window | char_count | max_length |
|---|---|---|---|---|
| 0 | c | "c" | {c:1} | 1 |
| 1 | c | "cc" | {c:2} | 2 |
| 2 | a | "cca" | {c:2, a:1} | 3 |
| 3 | a | "ccaa" | {c:2, a:2} | 4 |
| 4 | b | "ccaab" | {c:2, a:2, b:1} | 4 |
Invalid window, shrink:
| Action | Window | char_count |
|---|---|---|
| Remove c | "caab" | {c:1, a:2, b:1} |
| Remove c | "aab" | {a:2, b:1} |
Now valid again.
Continue:
| Right | Character | Window | char_count | max_length |
|---|---|---|---|---|
| 5 | b | "aabb" | {a:2, b:2} | 4 |
| 6 | b | "aabbb" | {a:2, b:3} | 5 |
Final answer:
5
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is added and removed from the window at most once |
| Space | O(1) | The hash map stores at most three distinct characters |
The runtime is linear because both pointers only move forward. Even though there is a nested while loop, the total number of left pointer movements across the entire algorithm is at most n.
The space complexity is constant because the problem only allows English letters, and the sliding window never stores more than a few distinct characters at once.
Test Cases
solution = Solution()
assert solution.lengthOfLongestSubstringTwoDistinct("eceba") == 3
# Standard example with shrinking window
assert solution.lengthOfLongestSubstringTwoDistinct("ccaabbb") == 5
# Standard example with long valid suffix
assert solution.lengthOfLongestSubstringTwoDistinct("a") == 1
# Single character input
assert solution.lengthOfLongestSubstringTwoDistinct("aa") == 2
# All characters identical
assert solution.lengthOfLongestSubstringTwoDistinct("ab") == 2
# Exactly two distinct characters
assert solution.lengthOfLongestSubstringTwoDistinct("abc") == 2
# Third character immediately invalidates window
assert solution.lengthOfLongestSubstringTwoDistinct("abababab") == 8
# Entire string valid with two alternating characters
assert solution.lengthOfLongestSubstringTwoDistinct("abcabcabc") == 2
# Constant invalidation by third character
assert solution.lengthOfLongestSubstringTwoDistinct("eceecc") == 6
# Entire string valid
assert solution.lengthOfLongestSubstringTwoDistinct("aabbcc") == 4
# Best substring occurs before end
assert solution.lengthOfLongestSubstringTwoDistinct("ababffzzeee") == 5
# Multiple shrinking and expansion phases
| Test | Why |
|---|---|
"eceba" |
Basic example with shrinking window |
"ccaabbb" |
Long valid suffix |
"a" |
Minimum input size |
"aa" |
Single distinct character |
"ab" |
Exactly two distinct characters |
"abc" |
Immediate invalidation |
"abababab" |
Entire string valid |
"abcabcabc" |
Frequent window resets |
"eceecc" |
Entire string valid with repeated counts |
"aabbcc" |
Best answer not at the end |
"ababffzzeee" |
Complex sliding behavior |
Edge Cases
One important edge case is a string containing only one distinct character, such as "aaaaaa". A buggy implementation might incorrectly shrink the window or mishandle frequency removal. In this implementation, the hash map always contains only one key, so the shrinking loop never executes, and the algorithm correctly returns the full string length.
Another important case is when the third distinct character appears immediately, such as "abc". A naive sliding window implementation might shrink only once and still leave three distinct characters inside the window. This solution uses a while loop rather than an if statement, guaranteeing the window continues shrinking until it becomes valid again.
A third edge case is when the optimal substring occurs in the middle of the string rather than at the beginning or end. For example, "aabbcc" has the optimal substring "aabb" or "bbcc", both of length 4. The algorithm checks every valid window during traversal, so it correctly identifies the best substring regardless of its position.
Another subtle case is repeated shrinking and expanding, such as "ababffzzeee". The window transitions through multiple valid and invalid states. Because character frequencies are tracked precisely in the hash map, the algorithm always knows exactly when a character fully leaves the window, preventing stale entries from corrupting the distinct character count.