Difficulty: 🟡 Medium
Topics: Hash Table, String

Solution

LeetCode 3760 - Maximum Substrings With Distinct Start

Problem Understanding

The problem asks us to split a string into as many non-empty substrings as possible, with one important restriction: the first character of every substring must be different from the first character of every other substring.

In other words, if we divide the string into several contiguous pieces, and look only at the first character of each piece, those starting characters must all be distinct.

The input is a string s consisting only of lowercase English letters. We must return the maximum number of substrings that can be formed while satisfying the distinct-start condition.

A useful way to think about the problem is that every substring begins at some position in the string. The starting character of that substring is simply the character at that position. Since no two substrings may start with the same character, each letter can be used as a substring start at most once.

The constraints are important:

1 <= s.length <= 100000
Only lowercase English letters appear

The string can be very large, so any solution that tries many different partitions will be too slow. However, the alphabet contains only 26 lowercase letters, which suggests there may be a very simple observation involving distinct characters.

Several edge cases are worth considering immediately:

A string containing only one character, such as "a", can only produce one substring.
A string where all characters are identical, such as "aaaa", can only have one substring because every substring would start with 'a'.
A string where every character is different, such as "abcd", can be split between every pair of characters, producing four substrings.
A string containing repeated characters, such as "abab", requires choosing substring boundaries carefully.

Approaches

Brute Force Approach

A brute force solution would consider every possible way to place cuts between characters.

For a string of length n, there are n - 1 possible cut positions. Each position may either contain a cut or not contain a cut, producing 2^(n-1) possible partitions.

For every partition, we could determine the starting character of each substring and check whether all starting characters are distinct. Among all valid partitions, we would keep the maximum number of substrings.

This approach is correct because it explicitly examines every possible partition and therefore cannot miss the optimal answer.

Unfortunately, it is far too slow. The number of partitions grows exponentially, making it completely infeasible even for moderate values of n, let alone 100000.

Key Insight

The only thing that matters about a substring is its starting character.

Suppose a character appears somewhere in the string. We can create a substring that starts at its first occurrence that we choose as a boundary. Since every substring must start with a unique letter, the total number of substrings can never exceed the number of distinct letters present in the string.

Now consider whether we can always achieve that upper bound.

Yes.

For every distinct character, we can use its first occurrence as the start of a substring. Whenever we encounter a character for the first time, we start a new substring there. Subsequent occurrences of the same character cannot contribute an additional substring start because that letter has already been used.

Therefore, the maximum number of substrings is exactly the number of distinct characters appearing in the string.

The problem reduces to counting distinct letters.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n · 2^(n-1))	O(n)	Enumerates all possible partitions and validates each one
Optimal	O(n)	O(1)	Count distinct lowercase letters in the string

Algorithm Walkthrough

Create a set to store the distinct characters encountered in the string.
Traverse the string from left to right.
For each character, insert it into the set. A set automatically keeps only one copy of each character.
After processing the entire string, the size of the set equals the number of distinct letters appearing in the string.
Return the size of the set.

The reason this works is that every substring must begin with a unique letter. Therefore, the number of substrings cannot exceed the number of distinct letters. Conversely, every distinct letter can contribute exactly one substring start, so this bound is achievable.

Why it works

Let D be the number of distinct characters in the string.

Any valid partition creates substrings whose starting characters are all different. Since only D different letters exist in the string, no valid partition can contain more than D substrings.

On the other hand, whenever a letter appears for the first time, we can begin a new substring there. This produces exactly one substring start for each distinct letter, yielding D substrings.

Since D is both an upper bound and an achievable value, the maximum number of substrings equals the number of distinct characters in the string.

Python Solution

class Solution:
    def maxDistinct(self, s: str) -> int:
        return len(set(s))

The implementation directly follows the observation developed above.

The expression set(s) creates a set containing every distinct character appearing in the string. Duplicate occurrences are automatically discarded.

The length of this set is exactly the number of distinct letters. Since we proved that the maximum number of valid substrings equals the number of distinct letters, returning len(set(s)) gives the correct answer.

The solution performs a single pass over the string internally while building the set and requires only a constant amount of additional memory because the alphabet size is fixed at 26.

Go Solution

func maxDistinct(s string) int {
	seen := make(map[byte]bool)

	for i := 0; i < len(s); i++ {
		seen[s[i]] = true
	}

	return len(seen)
}

The Go implementation uses a hash map to record which characters have appeared.

Because the input consists only of lowercase English letters, each character can be represented as a byte. As we iterate through the string, we mark each character as seen. The final size of the map equals the number of distinct characters.

Unlike Python's built-in set, Go uses a map to emulate set behavior. No special handling for empty strings is necessary because the constraints guarantee that the string length is at least one.

Worked Examples

Example 1

Input: s = "abab"

Distinct-character tracking:

Position	Character	Set After Processing
0	a	{a}
1	b	{a, b}
2	a	{a, b}
3	b	{a, b}

Final set size = 2

Answer = 2

One valid partition is:

"a"
"bab"

The starting characters are 'a' and 'b'.

Example 2

Input: s = "abcd"

Position	Character	Set After Processing
0	a	{a}
1	b	{a, b}
2	c	{a, b, c}
3	d	{a, b, c, d}

Final set size = 4

Answer = 4

A valid partition is:

"a"
"b"
"c"
"d"

Each substring starts with a unique letter.

Example 3

Input: s = "aaaa"

Position	Character	Set After Processing
0	a	{a}
1	a	{a}
2	a	{a}
3	a	{a}

Final set size = 1

Answer = 1

Only one distinct starting letter exists, so only one substring can be formed.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each character is processed once
Space	O(1)	At most 26 lowercase letters are stored

The algorithm scans the string exactly once. Although a set is used, its size is bounded by the lowercase English alphabet, which contains only 26 characters. Therefore the auxiliary space is constant with respect to the input size.

Test Cases

sol = Solution()

assert sol.maxDistinct("abab") == 2      # provided example
assert sol.maxDistinct("abcd") == 4      # all unique characters
assert sol.maxDistinct("aaaa") == 1      # all same character

assert sol.maxDistinct("a") == 1         # minimum length string
assert sol.maxDistinct("ab") == 2        # two distinct letters
assert sol.maxDistinct("aa") == 1        # repeated letter only

assert sol.maxDistinct("abcabc") == 3    # repeated pattern
assert sol.maxDistinct("zzzzzz") == 1    # single distinct character
assert sol.maxDistinct("abcdefghijklmnopqrstuvwxyz") == 26  # all letters

assert sol.maxDistinct("aabbccdd") == 4  # multiple duplicates
assert sol.maxDistinct("abacadae") == 5  # mixed repetitions
assert sol.maxDistinct("qwertyqwerty") == 6  # repeated unique block

# large repeated input
assert sol.maxDistinct("a" * 100000) == 1

# large input containing all letters repeatedly
assert sol.maxDistinct("abcdefghijklmnopqrstuvwxyz" * 3000) == 26

Test Summary

Test	Why
`"abab"`	Validates the first example
`"abcd"`	Every character is distinct
`"aaaa"`	Every character is identical
`"a"`	Minimum allowed length
`"ab"`	Small distinct-character case
`"aa"`	Small duplicate-character case
`"abcabc"`	Repeated pattern
`"zzzzzz"`	Single distinct character repeated many times
`"abcdefghijklmnopqrstuvwxyz"`	Maximum possible distinct count
`"aabbccdd"`	Groups of repeated letters
`"abacadae"`	Distinct count appears gradually
`"qwertyqwerty"`	Repeated block of unique letters
`"a" * 100000`	Maximum length with one distinct letter
`"abcdefghijklmnopqrstuvwxyz" * 3000`	Large input stressing performance

Edge Cases

All Characters Are Identical

Consider a string such as "aaaaaa". A common mistake is to think that multiple substrings can be created because the string is long. However, every substring would start with 'a', violating the distinct-start requirement. The implementation correctly returns 1 because the set contains only one character.

Every Character Is Unique

For a string such as "abcdef", every character can serve as the start of its own substring. The implementation returns 6 because all six characters are distinct. This is the maximum possible answer for that string.

Large Input With Many Repetitions

A string such as "abcdefghijklmnopqrstuvwxyz" * 3000 contains 78,000 characters but only 26 distinct letters. A naive partitioning algorithm could waste enormous amounts of time exploring cuts. The set-based solution remains efficient because it only tracks which letters have appeared and never performs any partition simulation.

Minimum-Length Input

When the string length is exactly one, such as "x", there is only one possible substring, namely the entire string itself. The set contains one character, so the algorithm correctly returns 1.