LeetCode 3714 - Longest Balanced Substring II

The problem gives us a string s that contains only three possible characters: 'a', 'b', and 'c'. We must find the longest substring where every distinct character in that substring appears the same number of times.

LeetCode Problem 3714

Difficulty: 🟡 Medium
Topics: Hash Table, String, Prefix Sum

Solution

Problem Understanding

The problem gives us a string s that contains only three possible characters: 'a', 'b', and 'c'. We must find the longest substring where every distinct character in that substring appears the same number of times.

A key detail is that the condition only applies to distinct characters present in the substring, not necessarily all three characters.

For example:

  • "abba" is balanced because 'a' appears 2 times and 'b' appears 2 times.
  • "abc" is balanced because 'a', 'b', and 'c' each appear once.
  • "aaa" is also balanced because the only distinct character is 'a', and it appears equally often with itself.

The input is a single string s, and the output is an integer representing the maximum length among all balanced substrings.

The constraints are important:

  • 1 <= s.length <= 10^5
  • Only 'a', 'b', and 'c' appear.

A length of 10^5 immediately tells us that a naive O(n²) or O(n³) solution will likely be too slow. We need something closer to linear time, or at worst O(n log n).

The fact that there are only three possible characters is the major clue. Even though the string can be very long, the alphabet size is tiny. This strongly suggests that we may be able to encode balance conditions compactly using prefix information.

Some important edge cases include:

  • A string with only one repeated character, such as "aaaa". Every substring is balanced because only one distinct character exists.
  • Strings where no long balanced substring exists, such as "aaaaab", where only short windows may satisfy the condition.
  • Cases where only two characters participate in balance, such as "ababab".
  • Cases where all three characters are balanced, such as "aaabbbccc".

We must also remember that substrings are contiguous, so we cannot rearrange characters.

Approaches

Brute Force Approach

A straightforward approach is to examine every possible substring.

For each substring s[i:j], we count how many times 'a', 'b', and 'c' occur. Then we check whether all non-zero frequencies are equal.

For example, for substring "abbac":

Character Count
a 2
b 2
c 1

This substring is not balanced because the distinct characters do not all appear equally often.

To make this efficient, we could precompute prefix counts for 'a', 'b', and 'c', allowing substring frequency calculation in O(1) time.

However, there are still O(n²) substrings. Since n can be 10^5, this becomes far too slow.

Key Insight for an Optimal Solution

The important observation is that there are only three characters, so balance relationships can be represented through count differences.

Suppose we maintain prefix counts:

  • countA
  • countB
  • countC

A substring is balanced when all distinct characters appear equally often.

For the case where all three characters participate:

$$a = b = c$$

This means:

$$a - b = 0$$

and

$$a - c = 0$$

For a substring between indices i and j, these differences must remain unchanged between the two prefix states.

This suggests using a hash map of prefix states.

However, we must handle all valid character subsets:

  • {a}
  • {b}
  • {c}
  • {a,b}
  • {a,c}
  • {b,c}
  • {a,b,c}

For each subset, we normalize the counts of participating characters relative to one reference character.

For example:

  • subset {a,b} → store (a-b)
  • subset {a,b,c} → store (a-b, a-c)

If the same normalized state appears at two indices, then the substring between them is balanced for that subset.

Because there are only 7 subsets, we can check all of them efficiently at every position.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Enumerates all substrings and checks frequencies
Optimal O(n) O(n) Uses prefix states and hash maps over small character subsets

Algorithm Walkthrough

Optimal Algorithm

  1. Create prefix counts for 'a', 'b', and 'c'.

As we scan the string, we maintain cumulative counts:

  • countA
  • countB
  • countC

These represent frequencies up to the current index. 2. Enumerate all valid subsets of characters.

Since only three characters exist, there are exactly seven non-empty subsets:

  • {a}
  • {b}
  • {c}
  • {a,b}
  • {a,c}
  • {b,c}
  • {a,b,c}

We process all of them independently. 3. For each subset, define a normalized prefix state.

For a subset with multiple characters, choose one character as the base and record count differences.

Examples:

  • {a,b} → state = (countA - countB)
  • {a,c} → state = (countA - countC)
  • {a,b,c} → state = (countA - countB, countA - countC)

Equal states indicate balanced growth between prefix positions. 4. Track the earliest occurrence of each state.

We store:

state → earliest index

Initially, the empty prefix exists at index -1. 5. Ensure only the intended characters appear.

When considering subset {a,b}, we must ensure 'c' does not appear in the substring.

We include excluded character counts inside the hash key so they remain unchanged.

For example:

  • {a,b} key:

(countA - countB, countC)

Since countC must not change, the substring contains no 'c'. 6. Update the answer.

If the same state reappears at index i, then the substring between the earliest occurrence and i is balanced.

Update:

answer = max(answer, i - earliestIndex)
  1. Continue until the entire string is processed.

Why it works

The algorithm works because prefix subtraction converts substring frequency conditions into equality of prefix states.

For a balanced substring, all participating characters must increase by exactly the same amount. This means their pairwise differences remain unchanged. Excluded characters must not change at all, which is enforced by storing their prefix counts in the state.

Therefore, whenever the same normalized state appears twice, the substring between those positions is guaranteed to be balanced.

Python Solution

class Solution:
    def longestBalanced(self, s: str) -> int:
        subset_maps = [
            ({0}, {}),
            ({1}, {}),
            ({2}, {}),
            ({0, 1}, {}),
            ({0, 2}, {}),
            ({1, 2}, {}),
            ({0, 1, 2}, {})
        ]

        for _, seen in subset_maps:
            seen[(0,)] = -1

        counts = [0, 0, 0]
        best = 0

        for index, ch in enumerate(s):
            counts[ord(ch) - ord('a')] += 1

            for subset, seen in subset_maps:
                subset_list = sorted(subset)

                base = subset_list[0]
                key = []

                # Differences among participating chars
                for char_idx in subset_list[1:]:
                    key.append(counts[base] - counts[char_idx])

                # Counts of excluded chars must stay fixed
                for char_idx in range(3):
                    if char_idx not in subset:
                        key.append(counts[char_idx])

                key = tuple(key)

                if key in seen:
                    best = max(best, index - seen[key])
                else:
                    seen[key] = index

        return best

Implementation Explanation

The solution maintains prefix counts for the three characters using the counts array.

Each subset of characters gets its own hash map. The hash map stores normalized prefix states and the earliest index where each state appeared.

At every character:

  1. We update the relevant prefix count.
  2. For each subset, we construct a state key.
  3. The key stores:
  • Differences between participating character counts.
  • Exact prefix counts for excluded characters.
  1. If this state has appeared before, the substring between the two indices is balanced.
  2. Otherwise, we store the current index as the earliest occurrence.

Because the number of subsets is constant (7), each iteration performs only constant work.

Go Solution

func longestBalanced(s string) int {
	type subsetInfo struct {
		subset map[int]bool
		seen   map[string]int
	}

	subsets := []map[int]bool{
		{0: true},
		{1: true},
		{2: true},
		{0: true, 1: true},
		{0: true, 2: true},
		{1: true, 2: true},
		{0: true, 1: true, 2: true},
	}

	infos := make([]subsetInfo, len(subsets))

	for i, subset := range subsets {
		infos[i] = subsetInfo{
			subset: subset,
			seen:   map[string]int{"": -1},
		}
	}

	counts := [3]int{}
	best := 0

	for i := 0; i < len(s); i++ {
		counts[s[i]-'a']++

		for _, info := range infos {
			key := ""

			var members []int
			for j := 0; j < 3; j++ {
				if info.subset[j] {
					members = append(members, j)
				}
			}

			base := members[0]

			for j := 1; j < len(members); j++ {
				diff := counts[base] - counts[members[j]]
				key += "|" + string(rune(diff+100000))
			}

			for j := 0; j < 3; j++ {
				if !info.subset[j] {
					key += "|" + string(rune(counts[j]+100000))
				}
			}

			if prev, exists := info.seen[key]; exists {
				length := i - prev
				if length > best {
					best = length
				}
			} else {
				info.seen[key] = i
			}
		}
	}

	return best
}

Go-Specific Implementation Differences

Go does not support tuples as hash map keys in the same flexible way Python does, so we serialize the state into a string key.

The prefix counts are stored in a fixed-size array [3]int, which is efficient and avoids dynamic allocations.

Unlike Python dictionaries, Go maps return both the stored value and a boolean indicating whether the key exists, which is used to distinguish between unseen states and valid stored indices.

Integer overflow is not a concern because the maximum count is 10^5, which easily fits inside Go's int.

Worked Examples

Example 1

Input:

s = "abbac"

We track prefix counts:

Index Char a b c Best
0 a 1 0 0 1
1 b 1 1 0 2
2 b 1 2 0 2
3 a 2 2 0 4
4 c 2 2 1 4

At index 3, subset {a,b} produces a repeated state, identifying substring "abba" of length 4.

Final answer:

4

Example 2

Input:

s = "aabcc"
Index Char a b c Best
0 a 1 0 0 1
1 a 2 0 0 2
2 b 2 1 0 2
3 c 2 1 1 3
4 c 2 1 2 3

At index 3, subset {a,b,c} detects "abc".

Final answer:

3

Example 3

Input:

s = "aba"
Index Char a b c Best
0 a 1 0 0 1
1 b 1 1 0 2
2 a 2 1 0 2

Balanced substrings include "ab" and "ba".

Final answer:

2

Complexity Analysis

Measure Complexity Explanation
Time O(n) Constant work for each of the 7 subsets at every index
Space O(n) Hash maps may store up to O(n) prefix states

Although multiple hash maps are used, there are only seven subsets, which is constant. Therefore, the runtime scales linearly with the input size.

Similarly, each subset map may grow proportionally to the number of indices, giving linear memory usage.

Test Cases

solution = Solution()

assert solution.longestBalanced("abbac") == 4  # example 1
assert solution.longestBalanced("aabcc") == 3  # example 2
assert solution.longestBalanced("aba") == 2  # example 3

assert solution.longestBalanced("a") == 1  # single character
assert solution.longestBalanced("aaaa") == 4  # single distinct character
assert solution.longestBalanced("abc") == 3  # all balanced once
assert solution.longestBalanced("aaabbbccc") == 9  # full string balanced
assert solution.longestBalanced("ababab") == 6  # two-character balance
assert solution.longestBalanced("abcc") == 3  # substring inside string
assert solution.longestBalanced("aaaaab") == 2  # only short balance
assert solution.longestBalanced("abcabc") == 6  # repeated balanced structure
assert solution.longestBalanced("aabbccabc") == 9  # full string valid
assert solution.longestBalanced("cccccc") == 6  # single repeated char
assert solution.longestBalanced("bac") == 3  # permutation of abc

Test Summary

Test Why
"abbac" Validates example 1
"aabcc" Validates example 2
"aba" Validates example 3
"a" Minimum input size
"aaaa" Single distinct character
"abc" All three balanced
"aaabbbccc" Entire string balanced
"ababab" Two-character balance
"abcc" Best substring not full string
"aaaaab" Small valid substring only
"abcabc" Repeated balanced structure
"aabbccabc" Large balanced substring
"cccccc" One repeated character
"bac" Different ordering

Edge Cases

Single Distinct Character

A substring containing only one unique character is always balanced because all distinct characters appear equally often.

For example:

"aaaa"

A buggy implementation might incorrectly require multiple characters to exist. This solution handles it correctly because single-character subsets ({a}, {b}, {c}) are explicitly considered.

Excluded Characters Appearing Accidentally

Consider:

"abca"

The substring "abc" is balanced for {a,b,c}, but "abca" is not balanced for {a,b} because 'c' appears.

A naive difference-only approach may accidentally accept such cases. This implementation prevents that bug by storing excluded character counts inside the state key, ensuring excluded characters do not change.

Entire String is Balanced

Consider:

"aaabbbccc"

The optimal answer is the full string length.

A common mistake is failing to initialize prefix states correctly, which prevents substrings beginning at index 0 from being detected. This implementation initializes every subset map with the empty prefix at index -1, allowing full-prefix matches to work correctly.