LeetCode 3714 - Longest Balanced Substring II
The problem gives us a string s that contains only three possible characters: 'a', 'b', and 'c'. We must find the longest substring where every distinct character in that substring appears the same number of times.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Prefix Sum
Solution
Problem Understanding
The problem gives us a string s that contains only three possible characters: 'a', 'b', and 'c'. We must find the longest substring where every distinct character in that substring appears the same number of times.
A key detail is that the condition only applies to distinct characters present in the substring, not necessarily all three characters.
For example:
"abba"is balanced because'a'appears 2 times and'b'appears 2 times."abc"is balanced because'a','b', and'c'each appear once."aaa"is also balanced because the only distinct character is'a', and it appears equally often with itself.
The input is a single string s, and the output is an integer representing the maximum length among all balanced substrings.
The constraints are important:
1 <= s.length <= 10^5- Only
'a','b', and'c'appear.
A length of 10^5 immediately tells us that a naive O(n²) or O(n³) solution will likely be too slow. We need something closer to linear time, or at worst O(n log n).
The fact that there are only three possible characters is the major clue. Even though the string can be very long, the alphabet size is tiny. This strongly suggests that we may be able to encode balance conditions compactly using prefix information.
Some important edge cases include:
- A string with only one repeated character, such as
"aaaa". Every substring is balanced because only one distinct character exists. - Strings where no long balanced substring exists, such as
"aaaaab", where only short windows may satisfy the condition. - Cases where only two characters participate in balance, such as
"ababab". - Cases where all three characters are balanced, such as
"aaabbbccc".
We must also remember that substrings are contiguous, so we cannot rearrange characters.
Approaches
Brute Force Approach
A straightforward approach is to examine every possible substring.
For each substring s[i:j], we count how many times 'a', 'b', and 'c' occur. Then we check whether all non-zero frequencies are equal.
For example, for substring "abbac":
| Character | Count |
|---|---|
| a | 2 |
| b | 2 |
| c | 1 |
This substring is not balanced because the distinct characters do not all appear equally often.
To make this efficient, we could precompute prefix counts for 'a', 'b', and 'c', allowing substring frequency calculation in O(1) time.
However, there are still O(n²) substrings. Since n can be 10^5, this becomes far too slow.
Key Insight for an Optimal Solution
The important observation is that there are only three characters, so balance relationships can be represented through count differences.
Suppose we maintain prefix counts:
countAcountBcountC
A substring is balanced when all distinct characters appear equally often.
For the case where all three characters participate:
$$a = b = c$$
This means:
$$a - b = 0$$
and
$$a - c = 0$$
For a substring between indices i and j, these differences must remain unchanged between the two prefix states.
This suggests using a hash map of prefix states.
However, we must handle all valid character subsets:
{a}{b}{c}{a,b}{a,c}{b,c}{a,b,c}
For each subset, we normalize the counts of participating characters relative to one reference character.
For example:
- subset
{a,b}→ store(a-b) - subset
{a,b,c}→ store(a-b, a-c)
If the same normalized state appears at two indices, then the substring between them is balanced for that subset.
Because there are only 7 subsets, we can check all of them efficiently at every position.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) |
O(n) |
Enumerates all substrings and checks frequencies |
| Optimal | O(n) |
O(n) |
Uses prefix states and hash maps over small character subsets |
Algorithm Walkthrough
Optimal Algorithm
- Create prefix counts for
'a','b', and'c'.
As we scan the string, we maintain cumulative counts:
countAcountBcountC
These represent frequencies up to the current index. 2. Enumerate all valid subsets of characters.
Since only three characters exist, there are exactly seven non-empty subsets:
{a}{b}{c}{a,b}{a,c}{b,c}{a,b,c}
We process all of them independently. 3. For each subset, define a normalized prefix state.
For a subset with multiple characters, choose one character as the base and record count differences.
Examples:
{a,b}→ state =(countA - countB){a,c}→ state =(countA - countC){a,b,c}→ state =(countA - countB, countA - countC)
Equal states indicate balanced growth between prefix positions. 4. Track the earliest occurrence of each state.
We store:
state → earliest index
Initially, the empty prefix exists at index -1.
5. Ensure only the intended characters appear.
When considering subset {a,b}, we must ensure 'c' does not appear in the substring.
We include excluded character counts inside the hash key so they remain unchanged.
For example:
{a,b}key:
(countA - countB, countC)
Since countC must not change, the substring contains no 'c'.
6. Update the answer.
If the same state reappears at index i, then the substring between the earliest occurrence and i is balanced.
Update:
answer = max(answer, i - earliestIndex)
- Continue until the entire string is processed.
Why it works
The algorithm works because prefix subtraction converts substring frequency conditions into equality of prefix states.
For a balanced substring, all participating characters must increase by exactly the same amount. This means their pairwise differences remain unchanged. Excluded characters must not change at all, which is enforced by storing their prefix counts in the state.
Therefore, whenever the same normalized state appears twice, the substring between those positions is guaranteed to be balanced.
Python Solution
class Solution:
def longestBalanced(self, s: str) -> int:
subset_maps = [
({0}, {}),
({1}, {}),
({2}, {}),
({0, 1}, {}),
({0, 2}, {}),
({1, 2}, {}),
({0, 1, 2}, {})
]
for _, seen in subset_maps:
seen[(0,)] = -1
counts = [0, 0, 0]
best = 0
for index, ch in enumerate(s):
counts[ord(ch) - ord('a')] += 1
for subset, seen in subset_maps:
subset_list = sorted(subset)
base = subset_list[0]
key = []
# Differences among participating chars
for char_idx in subset_list[1:]:
key.append(counts[base] - counts[char_idx])
# Counts of excluded chars must stay fixed
for char_idx in range(3):
if char_idx not in subset:
key.append(counts[char_idx])
key = tuple(key)
if key in seen:
best = max(best, index - seen[key])
else:
seen[key] = index
return best
Implementation Explanation
The solution maintains prefix counts for the three characters using the counts array.
Each subset of characters gets its own hash map. The hash map stores normalized prefix states and the earliest index where each state appeared.
At every character:
- We update the relevant prefix count.
- For each subset, we construct a state key.
- The key stores:
- Differences between participating character counts.
- Exact prefix counts for excluded characters.
- If this state has appeared before, the substring between the two indices is balanced.
- Otherwise, we store the current index as the earliest occurrence.
Because the number of subsets is constant (7), each iteration performs only constant work.
Go Solution
func longestBalanced(s string) int {
type subsetInfo struct {
subset map[int]bool
seen map[string]int
}
subsets := []map[int]bool{
{0: true},
{1: true},
{2: true},
{0: true, 1: true},
{0: true, 2: true},
{1: true, 2: true},
{0: true, 1: true, 2: true},
}
infos := make([]subsetInfo, len(subsets))
for i, subset := range subsets {
infos[i] = subsetInfo{
subset: subset,
seen: map[string]int{"": -1},
}
}
counts := [3]int{}
best := 0
for i := 0; i < len(s); i++ {
counts[s[i]-'a']++
for _, info := range infos {
key := ""
var members []int
for j := 0; j < 3; j++ {
if info.subset[j] {
members = append(members, j)
}
}
base := members[0]
for j := 1; j < len(members); j++ {
diff := counts[base] - counts[members[j]]
key += "|" + string(rune(diff+100000))
}
for j := 0; j < 3; j++ {
if !info.subset[j] {
key += "|" + string(rune(counts[j]+100000))
}
}
if prev, exists := info.seen[key]; exists {
length := i - prev
if length > best {
best = length
}
} else {
info.seen[key] = i
}
}
}
return best
}
Go-Specific Implementation Differences
Go does not support tuples as hash map keys in the same flexible way Python does, so we serialize the state into a string key.
The prefix counts are stored in a fixed-size array [3]int, which is efficient and avoids dynamic allocations.
Unlike Python dictionaries, Go maps return both the stored value and a boolean indicating whether the key exists, which is used to distinguish between unseen states and valid stored indices.
Integer overflow is not a concern because the maximum count is 10^5, which easily fits inside Go's int.
Worked Examples
Example 1
Input:
s = "abbac"
We track prefix counts:
| Index | Char | a | b | c | Best |
|---|---|---|---|---|---|
| 0 | a | 1 | 0 | 0 | 1 |
| 1 | b | 1 | 1 | 0 | 2 |
| 2 | b | 1 | 2 | 0 | 2 |
| 3 | a | 2 | 2 | 0 | 4 |
| 4 | c | 2 | 2 | 1 | 4 |
At index 3, subset {a,b} produces a repeated state, identifying substring "abba" of length 4.
Final answer:
4
Example 2
Input:
s = "aabcc"
| Index | Char | a | b | c | Best |
|---|---|---|---|---|---|
| 0 | a | 1 | 0 | 0 | 1 |
| 1 | a | 2 | 0 | 0 | 2 |
| 2 | b | 2 | 1 | 0 | 2 |
| 3 | c | 2 | 1 | 1 | 3 |
| 4 | c | 2 | 1 | 2 | 3 |
At index 3, subset {a,b,c} detects "abc".
Final answer:
3
Example 3
Input:
s = "aba"
| Index | Char | a | b | c | Best |
|---|---|---|---|---|---|
| 0 | a | 1 | 0 | 0 | 1 |
| 1 | b | 1 | 1 | 0 | 2 |
| 2 | a | 2 | 1 | 0 | 2 |
Balanced substrings include "ab" and "ba".
Final answer:
2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) |
Constant work for each of the 7 subsets at every index |
| Space | O(n) |
Hash maps may store up to O(n) prefix states |
Although multiple hash maps are used, there are only seven subsets, which is constant. Therefore, the runtime scales linearly with the input size.
Similarly, each subset map may grow proportionally to the number of indices, giving linear memory usage.
Test Cases
solution = Solution()
assert solution.longestBalanced("abbac") == 4 # example 1
assert solution.longestBalanced("aabcc") == 3 # example 2
assert solution.longestBalanced("aba") == 2 # example 3
assert solution.longestBalanced("a") == 1 # single character
assert solution.longestBalanced("aaaa") == 4 # single distinct character
assert solution.longestBalanced("abc") == 3 # all balanced once
assert solution.longestBalanced("aaabbbccc") == 9 # full string balanced
assert solution.longestBalanced("ababab") == 6 # two-character balance
assert solution.longestBalanced("abcc") == 3 # substring inside string
assert solution.longestBalanced("aaaaab") == 2 # only short balance
assert solution.longestBalanced("abcabc") == 6 # repeated balanced structure
assert solution.longestBalanced("aabbccabc") == 9 # full string valid
assert solution.longestBalanced("cccccc") == 6 # single repeated char
assert solution.longestBalanced("bac") == 3 # permutation of abc
Test Summary
| Test | Why |
|---|---|
"abbac" |
Validates example 1 |
"aabcc" |
Validates example 2 |
"aba" |
Validates example 3 |
"a" |
Minimum input size |
"aaaa" |
Single distinct character |
"abc" |
All three balanced |
"aaabbbccc" |
Entire string balanced |
"ababab" |
Two-character balance |
"abcc" |
Best substring not full string |
"aaaaab" |
Small valid substring only |
"abcabc" |
Repeated balanced structure |
"aabbccabc" |
Large balanced substring |
"cccccc" |
One repeated character |
"bac" |
Different ordering |
Edge Cases
Single Distinct Character
A substring containing only one unique character is always balanced because all distinct characters appear equally often.
For example:
"aaaa"
A buggy implementation might incorrectly require multiple characters to exist. This solution handles it correctly because single-character subsets ({a}, {b}, {c}) are explicitly considered.
Excluded Characters Appearing Accidentally
Consider:
"abca"
The substring "abc" is balanced for {a,b,c}, but "abca" is not balanced for {a,b} because 'c' appears.
A naive difference-only approach may accidentally accept such cases. This implementation prevents that bug by storing excluded character counts inside the state key, ensuring excluded characters do not change.
Entire String is Balanced
Consider:
"aaabbbccc"
The optimal answer is the full string length.
A common mistake is failing to initialize prefix states correctly, which prevents substrings beginning at index 0 from being detected. This implementation initializes every subset map with the empty prefix at index -1, allowing full-prefix matches to work correctly.