LeetCode 3760 - Maximum Substrings With Distinct Start
The problem asks us to split a string into as many non-empty substrings as possible, with one important restriction: the first character of every substring must be different from the first character of every other substring.
Difficulty: 🟡 Medium
Topics: Hash Table, String
Solution
LeetCode 3760 - Maximum Substrings With Distinct Start
Problem Understanding
The problem asks us to split a string into as many non-empty substrings as possible, with one important restriction: the first character of every substring must be different from the first character of every other substring.
In other words, if we divide the string into several contiguous pieces, and look only at the first character of each piece, those starting characters must all be distinct.
The input is a string s consisting only of lowercase English letters. We must return the maximum number of substrings that can be formed while satisfying the distinct-start condition.
A useful way to think about the problem is that every substring begins at some position in the string. The starting character of that substring is simply the character at that position. Since no two substrings may start with the same character, each letter can be used as a substring start at most once.
The constraints are important:
1 <= s.length <= 100000- Only lowercase English letters appear
The string can be very large, so any solution that tries many different partitions will be too slow. However, the alphabet contains only 26 lowercase letters, which suggests there may be a very simple observation involving distinct characters.
Several edge cases are worth considering immediately:
- A string containing only one character, such as
"a", can only produce one substring. - A string where all characters are identical, such as
"aaaa", can only have one substring because every substring would start with'a'. - A string where every character is different, such as
"abcd", can be split between every pair of characters, producing four substrings. - A string containing repeated characters, such as
"abab", requires choosing substring boundaries carefully.
Approaches
Brute Force Approach
A brute force solution would consider every possible way to place cuts between characters.
For a string of length n, there are n - 1 possible cut positions. Each position may either contain a cut or not contain a cut, producing 2^(n-1) possible partitions.
For every partition, we could determine the starting character of each substring and check whether all starting characters are distinct. Among all valid partitions, we would keep the maximum number of substrings.
This approach is correct because it explicitly examines every possible partition and therefore cannot miss the optimal answer.
Unfortunately, it is far too slow. The number of partitions grows exponentially, making it completely infeasible even for moderate values of n, let alone 100000.
Key Insight
The only thing that matters about a substring is its starting character.
Suppose a character appears somewhere in the string. We can create a substring that starts at its first occurrence that we choose as a boundary. Since every substring must start with a unique letter, the total number of substrings can never exceed the number of distinct letters present in the string.
Now consider whether we can always achieve that upper bound.
Yes.
For every distinct character, we can use its first occurrence as the start of a substring. Whenever we encounter a character for the first time, we start a new substring there. Subsequent occurrences of the same character cannot contribute an additional substring start because that letter has already been used.
Therefore, the maximum number of substrings is exactly the number of distinct characters appearing in the string.
The problem reduces to counting distinct letters.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n · 2^(n-1)) | O(n) | Enumerates all possible partitions and validates each one |
| Optimal | O(n) | O(1) | Count distinct lowercase letters in the string |
Algorithm Walkthrough
- Create a set to store the distinct characters encountered in the string.
- Traverse the string from left to right.
- For each character, insert it into the set. A set automatically keeps only one copy of each character.
- After processing the entire string, the size of the set equals the number of distinct letters appearing in the string.
- Return the size of the set.
The reason this works is that every substring must begin with a unique letter. Therefore, the number of substrings cannot exceed the number of distinct letters. Conversely, every distinct letter can contribute exactly one substring start, so this bound is achievable.
Why it works
Let D be the number of distinct characters in the string.
Any valid partition creates substrings whose starting characters are all different. Since only D different letters exist in the string, no valid partition can contain more than D substrings.
On the other hand, whenever a letter appears for the first time, we can begin a new substring there. This produces exactly one substring start for each distinct letter, yielding D substrings.
Since D is both an upper bound and an achievable value, the maximum number of substrings equals the number of distinct characters in the string.
Python Solution
class Solution:
def maxDistinct(self, s: str) -> int:
return len(set(s))
The implementation directly follows the observation developed above.
The expression set(s) creates a set containing every distinct character appearing in the string. Duplicate occurrences are automatically discarded.
The length of this set is exactly the number of distinct letters. Since we proved that the maximum number of valid substrings equals the number of distinct letters, returning len(set(s)) gives the correct answer.
The solution performs a single pass over the string internally while building the set and requires only a constant amount of additional memory because the alphabet size is fixed at 26.
Go Solution
func maxDistinct(s string) int {
seen := make(map[byte]bool)
for i := 0; i < len(s); i++ {
seen[s[i]] = true
}
return len(seen)
}
The Go implementation uses a hash map to record which characters have appeared.
Because the input consists only of lowercase English letters, each character can be represented as a byte. As we iterate through the string, we mark each character as seen. The final size of the map equals the number of distinct characters.
Unlike Python's built-in set, Go uses a map to emulate set behavior. No special handling for empty strings is necessary because the constraints guarantee that the string length is at least one.
Worked Examples
Example 1
Input: s = "abab"
Distinct-character tracking:
| Position | Character | Set After Processing |
|---|---|---|
| 0 | a | {a} |
| 1 | b | {a, b} |
| 2 | a | {a, b} |
| 3 | b | {a, b} |
Final set size = 2
Answer = 2
One valid partition is:
"a""bab"
The starting characters are 'a' and 'b'.
Example 2
Input: s = "abcd"
| Position | Character | Set After Processing |
|---|---|---|
| 0 | a | {a} |
| 1 | b | {a, b} |
| 2 | c | {a, b, c} |
| 3 | d | {a, b, c, d} |
Final set size = 4
Answer = 4
A valid partition is:
"a""b""c""d"
Each substring starts with a unique letter.
Example 3
Input: s = "aaaa"
| Position | Character | Set After Processing |
|---|---|---|
| 0 | a | {a} |
| 1 | a | {a} |
| 2 | a | {a} |
| 3 | a | {a} |
Final set size = 1
Answer = 1
Only one distinct starting letter exists, so only one substring can be formed.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is processed once |
| Space | O(1) | At most 26 lowercase letters are stored |
The algorithm scans the string exactly once. Although a set is used, its size is bounded by the lowercase English alphabet, which contains only 26 characters. Therefore the auxiliary space is constant with respect to the input size.
Test Cases
sol = Solution()
assert sol.maxDistinct("abab") == 2 # provided example
assert sol.maxDistinct("abcd") == 4 # all unique characters
assert sol.maxDistinct("aaaa") == 1 # all same character
assert sol.maxDistinct("a") == 1 # minimum length string
assert sol.maxDistinct("ab") == 2 # two distinct letters
assert sol.maxDistinct("aa") == 1 # repeated letter only
assert sol.maxDistinct("abcabc") == 3 # repeated pattern
assert sol.maxDistinct("zzzzzz") == 1 # single distinct character
assert sol.maxDistinct("abcdefghijklmnopqrstuvwxyz") == 26 # all letters
assert sol.maxDistinct("aabbccdd") == 4 # multiple duplicates
assert sol.maxDistinct("abacadae") == 5 # mixed repetitions
assert sol.maxDistinct("qwertyqwerty") == 6 # repeated unique block
# large repeated input
assert sol.maxDistinct("a" * 100000) == 1
# large input containing all letters repeatedly
assert sol.maxDistinct("abcdefghijklmnopqrstuvwxyz" * 3000) == 26
Test Summary
| Test | Why |
|---|---|
"abab" |
Validates the first example |
"abcd" |
Every character is distinct |
"aaaa" |
Every character is identical |
"a" |
Minimum allowed length |
"ab" |
Small distinct-character case |
"aa" |
Small duplicate-character case |
"abcabc" |
Repeated pattern |
"zzzzzz" |
Single distinct character repeated many times |
"abcdefghijklmnopqrstuvwxyz" |
Maximum possible distinct count |
"aabbccdd" |
Groups of repeated letters |
"abacadae" |
Distinct count appears gradually |
"qwertyqwerty" |
Repeated block of unique letters |
"a" * 100000 |
Maximum length with one distinct letter |
"abcdefghijklmnopqrstuvwxyz" * 3000 |
Large input stressing performance |
Edge Cases
All Characters Are Identical
Consider a string such as "aaaaaa". A common mistake is to think that multiple substrings can be created because the string is long. However, every substring would start with 'a', violating the distinct-start requirement. The implementation correctly returns 1 because the set contains only one character.
Every Character Is Unique
For a string such as "abcdef", every character can serve as the start of its own substring. The implementation returns 6 because all six characters are distinct. This is the maximum possible answer for that string.
Large Input With Many Repetitions
A string such as "abcdefghijklmnopqrstuvwxyz" * 3000 contains 78,000 characters but only 26 distinct letters. A naive partitioning algorithm could waste enormous amounts of time exploring cuts. The set-based solution remains efficient because it only tracks which letters have appeared and never performs any partition simulation.
Minimum-Length Input
When the string length is exactly one, such as "x", there is only one possible substring, namely the entire string itself. The set contains one character, so the algorithm correctly returns 1.