LeetCode 1525 - Number of Good Ways to Split a String

The problem asks us to count how many ways we can split a string into two non-empty parts such that both parts contain the same number of distinct characters. A split occurs between two adjacent characters.

LeetCode Problem 1525

Difficulty: 🟔 Medium
Topics: Hash Table, String, Dynamic Programming, Bit Manipulation

Solution

Problem Understanding

The problem asks us to count how many ways we can split a string into two non-empty parts such that both parts contain the same number of distinct characters.

A split occurs between two adjacent characters. For a string of length n, there are exactly n - 1 possible split positions because both sides must remain non-empty.

For each split:

  • The left substring consists of characters from the beginning up to the split point.
  • The right substring consists of the remaining characters.
  • A split is considered good if the number of unique letters on the left side equals the number of unique letters on the right side.

For example, consider the string "aacaba":

  • Splitting after "aac" gives:

  • Left: "aac" → distinct letters are {a, c} → count = 2

  • Right: "aba" → distinct letters are {a, b} → count = 2

  • Since both sides have the same number of distinct characters, this is a good split.

The constraints are important:

  • 1 <= s.length <= 10^5
  • The string contains only lowercase English letters.

A length of 10^5 immediately tells us that quadratic solutions will likely time out. Any approach that repeatedly scans substrings or recomputes distinct character counts from scratch is too slow. We need a linear or near-linear solution.

There are several edge cases worth considering:

  • Very short strings such as "a" cannot produce any valid split because both sides must be non-empty.
  • Strings where all characters are identical, such as "aaaa", produce many valid splits because both sides always contain exactly one distinct character.
  • Strings where every character is unique, such as "abcd", require careful handling because the distinct counts change rapidly.
  • Splits near the beginning or end must still preserve non-empty substrings.

The problem guarantees lowercase English letters only, which means we can use compact frequency arrays of size 26 instead of larger hash structures if desired.

Approaches

Brute Force Approach

The most direct solution is to try every possible split position.

For each split:

  1. Build the left substring.
  2. Build the right substring.
  3. Count the number of distinct characters in each substring.
  4. Compare the counts.
  5. Increment the answer if they are equal.

To count distinct characters, we could use a set.

For example:

left_distinct = len(set(s[:i]))
right_distinct = len(set(s[i:]))

This works correctly because sets automatically remove duplicates, giving us the number of unique characters.

However, this approach is inefficient. There are O(n) split positions, and each split may require scanning up to O(n) characters to build sets. The total complexity becomes O(n^2).

With n = 10^5, a quadratic solution is far too slow.

Key Insight

The important observation is that the distinct character count changes incrementally as we move the split point from left to right.

Instead of recomputing distinct counts from scratch for every split, we can maintain:

  • The distinct character count on the left side
  • The distinct character count on the right side

We process the string once while dynamically updating these counts.

The strategy is:

  • Initially, all characters belong to the right side.
  • As we move left to right, characters are transferred from the right side to the left side.
  • We track character frequencies on both sides.
  • Whenever the distinct counts become equal, we have found a good split.

This transforms the problem into a linear-time solution.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Recomputes distinct counts for every split
Optimal O(n) O(1) Maintains rolling distinct counts using frequency arrays

Algorithm Walkthrough

  1. Create a frequency array for the right side.

Initially, the entire string belongs to the right substring. We count the frequency of every character in the string.

We also compute the number of distinct characters currently present on the right side. 2. Create an empty frequency array for the left side.

At the beginning, the left substring contains no characters, so all frequencies are zero and the distinct count is zero. 3. Iterate through the string from left to right, stopping before the last character.

We stop before the final character because both substrings must remain non-empty. 4. Move the current character from right to left.

For each character:

  • Increment its count in the left frequency array.
  • Decrement its count in the right frequency array.
  1. Update the distinct counts carefully.

For the left side:

  • If the character count becomes 1, this character has appeared for the first time on the left.
  • Increase the left distinct count.

For the right side:

  • If the character count becomes 0, the character no longer exists on the right.
  • Decrease the right distinct count.
  1. Compare distinct counts.

If the number of distinct characters on both sides is equal, increment the answer. 7. Continue until all valid split positions are processed.

Why it works

At every iteration, the algorithm maintains an invariant:

  • The left frequency array exactly represents the characters in the left substring.
  • The right frequency array exactly represents the characters in the right substring.
  • The distinct counters always equal the number of unique characters in their respective substrings.

Because each split position is processed exactly once and the data structures are updated correctly, every good split is counted exactly once.

Python Solution

class Solution:
    def numSplits(self, s: str) -> int:
        right_freq = [0] * 26
        
        for ch in s:
            right_freq[ord(ch) - ord('a')] += 1
        
        right_distinct = 0
        
        for count in right_freq:
            if count > 0:
                right_distinct += 1
        
        left_freq = [0] * 26
        left_distinct = 0
        good_splits = 0
        
        for i in range(len(s) - 1):
            index = ord(s[i]) - ord('a')
            
            left_freq[index] += 1
            if left_freq[index] == 1:
                left_distinct += 1
            
            right_freq[index] -= 1
            if right_freq[index] == 0:
                right_distinct -= 1
            
            if left_distinct == right_distinct:
                good_splits += 1
        
        return good_splits

The implementation begins by counting the frequency of every character in the entire string. Since the whole string initially belongs to the right substring, these counts populate the right_freq array.

Next, we compute right_distinct, which tracks how many unique characters currently exist on the right side.

The left side starts empty, so left_freq contains all zeros and left_distinct starts at zero.

The loop processes each possible split position. At every step, one character moves from right to left. The algorithm updates both frequency arrays and adjusts the distinct counters only when necessary.

The key optimization is that distinct counts are updated incrementally instead of recomputed from scratch.

Finally, whenever the left and right distinct counts match, the split is counted as good.

Go Solution

func numSplits(s string) int {
    rightFreq := make([]int, 26)

    for _, ch := range s {
        rightFreq[ch-'a']++
    }

    rightDistinct := 0

    for _, count := range rightFreq {
        if count > 0 {
            rightDistinct++
        }
    }

    leftFreq := make([]int, 26)
    leftDistinct := 0
    goodSplits := 0

    for i := 0; i < len(s)-1; i++ {
        index := s[i] - 'a'

        leftFreq[index]++
        if leftFreq[index] == 1 {
            leftDistinct++
        }

        rightFreq[index]--
        if rightFreq[index] == 0 {
            rightDistinct--
        }

        if leftDistinct == rightDistinct {
            goodSplits++
        }
    }

    return goodSplits
}

The Go implementation closely mirrors the Python solution.

Instead of Python lists, Go uses integer slices of length 26 to store character frequencies.

Character indexing is performed using byte arithmetic:

index := s[i] - 'a'

Since the input contains only lowercase English letters, byte indexing is safe and efficient.

Go does not require special handling for integer overflow here because all counts remain within the range of the string length.

Worked Examples

Example 1

Input:

s = "aacaba"

Initial right frequencies:

Character Count
a 4
b 1
c 1

Initial state:

  • left_distinct = 0
  • right_distinct = 3

Now process each split.

Split Position Left Right Left Distinct Right Distinct Good Split
1 "a" "acaba" 1 3 No
2 "aa" "caba" 1 3 No
3 "aac" "aba" 2 2 Yes
4 "aaca" "ba" 2 2 Yes
5 "aacab" "a" 3 1 No

Final answer:

2

Example 2

Input:

s = "abcd"

Initial state:

  • right_distinct = 4
  • left_distinct = 0

Process splits:

Split Position Left Right Left Distinct Right Distinct Good Split
1 "a" "bcd" 1 3 No
2 "ab" "cd" 2 2 Yes
3 "abc" "d" 3 1 No

Final answer:

1

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed once
Space O(1) Frequency arrays have fixed size 26

The algorithm performs a single pass over the string after initializing the frequency counts. Every operation inside the loop is constant time, so the total running time is linear.

The space complexity is constant because the frequency arrays always contain exactly 26 entries, regardless of input size.

Test Cases

solution = Solution()

assert solution.numSplits("aacaba") == 2  # Provided example with multiple good splits
assert solution.numSplits("abcd") == 1  # All distinct characters
assert solution.numSplits("aaaa") == 3  # Every split is valid
assert solution.numSplits("a") == 0  # No possible split
assert solution.numSplits("ab") == 1  # Smallest valid good split
assert solution.numSplits("aaabbb") == 1  # Only middle split works
assert solution.numSplits("abcabc") == 1  # Repeated pattern
assert solution.numSplits("zzzzzz") == 5  # Single repeated character
assert solution.numSplits("abccba") == 1  # Symmetric structure
assert solution.numSplits("abcdefghijklmnopqrstuvwxyz") == 1  # Maximum distinct letters
Test Why
"aacaba" Verifies standard example with multiple valid splits
"abcd" Tests all unique characters
"aaaa" Tests repeated single character
"a" Tests minimum string length
"ab" Tests smallest possible good split
"aaabbb" Tests transition between character groups
"abcabc" Tests repeated patterns
"zzzzzz" Tests many valid splits
"abccba" Tests symmetric distribution
"abcdefghijklmnopqrstuvwxyz" Tests maximum distinct character variety

Edge Cases

One important edge case is a string of length one, such as "a". Since a valid split requires two non-empty substrings, there are no legal split positions. A careless implementation might accidentally process an invalid split or produce an index error. The implementation avoids this by iterating only up to len(s) - 1.

Another important case is when all characters are identical, such as "aaaaaa". In this scenario, both sides always contain exactly one distinct character regardless of where the split occurs. Some implementations incorrectly overcount or undercount because they mishandle frequency updates when counts remain positive. This solution correctly updates distinct counts only when frequencies cross the thresholds 0 -> 1 or 1 -> 0.

A third tricky case occurs when all characters are distinct, such as "abcdef". The number of distinct characters on the left increases at every step, while the number on the right decreases at every step. Off-by-one mistakes are common here because the split position must leave both sides non-empty. The implementation handles this correctly by stopping before the final character.

Another subtle edge case involves repeated transitions, such as "abcabc". Characters repeatedly appear on both sides of the split. A naive implementation might incorrectly decrement distinct counts too early. By using exact frequency tracking, the algorithm only reduces the distinct count when a character completely disappears from one side.