Difficulty: 🔴 Hard
Topics: Hash Table, String, Greedy, Sorting

Solution

LeetCode 1520 - Maximum Number of Non-Overlapping Substrings

Problem Understanding

This problem asks us to select as many non-overlapping substrings as possible from a given lowercase string s, while satisfying a strict containment rule.

A substring is considered valid only if, whenever it contains a character c, it also contains every occurrence of c that exists anywhere in the original string. In other words, if a substring includes even one occurrence of a letter, then that substring must completely cover the full range of that letter in the original string.

The goal is not simply to find valid substrings. We must maximize the number of non-overlapping valid substrings. If multiple solutions contain the same number of substrings, we must choose the one whose total combined length is smallest.

The input is a single string s containing only lowercase English letters. The string length can be as large as 10^5, which immediately tells us that exponential or quadratic brute force enumeration of all substrings will not work efficiently.

The key difficulty comes from the interaction between characters. If we include one character inside a substring, we may be forced to expand the substring to include all occurrences of that character. During expansion, we may encounter new characters whose occurrences extend even further, causing additional expansion. This chaining effect makes the problem more subtle than ordinary interval selection.

Several edge cases are important:

A string where all characters are unique, every single character becomes its own valid substring.
A string where all characters are identical, only one substring is possible.
Nested character ranges can create situations where a smaller substring is preferable because it allows more total substrings.
Overlapping valid intervals require careful greedy selection to maximize count while minimizing total length.

Approaches

Brute Force Approach

A brute force solution would generate every possible substring and test whether it is valid.

To validate a substring, we would check every character inside it and verify that all occurrences of that character in the original string also lie inside the substring. After collecting all valid substrings, we could attempt every combination of non-overlapping substrings to find the optimal answer.

This approach is correct because it exhaustively explores all possibilities. However, it is computationally infeasible.

A string of length n contains O(n^2) substrings. Validating each substring costs up to O(n), and searching all non-overlapping combinations becomes exponential. With n = 10^5, this approach is impossible.

Optimal Greedy Interval Approach

The important insight is that every valid substring corresponds to a minimal valid interval for some starting character.

For each character, we first compute:

The first occurrence of the character
The last occurrence of the character

Then, starting from the first occurrence of a character, we attempt to construct the smallest valid interval that contains all required characters.

While scanning the interval:

If we encounter a character whose first occurrence lies before the current interval start, the interval is invalid.
Otherwise, we expand the interval end to include the last occurrence of every character we encounter.

This process creates candidate intervals that are minimal valid substrings.

Once we have all valid intervals, the problem becomes a classic interval scheduling problem. To maximize the number of non-overlapping intervals while minimizing total length, we greedily select intervals by increasing ending position.

Because there are only 26 lowercase English letters, the number of candidate intervals is extremely small, making the algorithm efficient even for large strings.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n³) or worse	O(n²)	Enumerates and validates all substrings
Optimal	O(n)	O(1)	Builds minimal valid intervals and greedily selects

Algorithm Walkthrough

Compute the first and last occurrence of every character in the string.

We create two arrays of size 26:

first[c] stores the first index where character c appears.
last[c] stores the last index where character c appears.

These ranges define the minimum span required for each character. 2. Iterate through the string and only attempt interval construction at first occurrences.

If index i is not the first occurrence of s[i], then starting a substring there cannot produce a minimal valid interval. 3. Construct the minimal valid interval.

Initially:

start = i
end = last[s[i]]

Then scan from start to end. 4. Expand the interval when necessary.

While scanning characters inside the interval:

If a character has a first occurrence before start, the interval is invalid because we cannot include all occurrences of that character without extending leftward.
Otherwise, update end to the maximum of the current end and the last occurrence of the current character.

This guarantees the interval contains all occurrences of every included character. 5. Store every valid interval.

Each successful interval represents the smallest valid substring rooted at a character's first occurrence. 6. Greedily select non-overlapping intervals.

Sort or process intervals by ending position.

Maintain:

previous_end, the end index of the last selected interval.

Select an interval if its start is greater than previous_end.

Choosing the earliest finishing interval leaves maximum room for future intervals. 7. Extract substrings from the selected intervals.

Convert interval boundaries back into actual substrings using slicing.

Why it works

Every valid substring must fully contain the ranges of all characters it includes. The interval expansion process guarantees this property by repeatedly extending the interval to include required character ranges.

The constructed interval is minimal because we only expand when necessary. Among all valid intervals, greedily choosing the interval with the earliest ending position maximizes the number of non-overlapping selections. This is the standard proof used in interval scheduling problems.

When multiple solutions contain the same number of substrings, selecting minimal intervals naturally minimizes total length.

Python Solution

from typing import List

class Solution:
    def maxNumOfSubstrings(self, s: str) -> List[str]:
        n = len(s)

        first = [n] * 26
        last = [-1] * 26

        # Record first and last occurrence of each character
        for i, ch in enumerate(s):
            idx = ord(ch) - ord('a')
            first[idx] = min(first[idx], i)
            last[idx] = i

        def build_interval(start: int) -> tuple[int, int]:
            end = last[ord(s[start]) - ord('a')]
            i = start

            while i <= end:
                idx = ord(s[i]) - ord('a')

                # Invalid interval
                if first[idx] < start:
                    return -1, -1

                end = max(end, last[idx])
                i += 1

            return start, end

        intervals = []

        # Generate all minimal valid intervals
        for i in range(n):
            idx = ord(s[i]) - ord('a')

            if i == first[idx]:
                interval = build_interval(i)

                if interval[0] != -1:
                    intervals.append(interval)

        # Greedy interval scheduling by end position
        intervals.sort(key=lambda x: x[1])

        result = []
        previous_end = -1

        for start, end in intervals:
            if start > previous_end:
                result.append(s[start:end + 1])
                previous_end = end

        return result

The implementation begins by preprocessing the string to determine the first and last occurrence of every character. This allows constant-time access to character ranges during interval construction.

The build_interval helper function attempts to construct the minimal valid substring starting from a given index. The interval expands dynamically whenever a character inside the interval has occurrences extending further right.

If the scan encounters a character whose first occurrence lies before the interval start, the interval becomes invalid because it cannot contain all occurrences of that character without extending leftward.

After generating all valid intervals, the solution sorts them by ending index and applies the classic greedy interval scheduling strategy. Selecting the interval that finishes earliest maximizes the number of non-overlapping substrings.

Finally, the selected intervals are converted into actual substrings and returned.

Go Solution

package main

import "sort"

func maxNumOfSubstrings(s string) []string {
	n := len(s)

	first := make([]int, 26)
	last := make([]int, 26)

	for i := 0; i < 26; i++ {
		first[i] = n
		last[i] = -1
	}

	// Record first and last occurrence
	for i := 0; i < n; i++ {
		idx := int(s[i] - 'a')

		if i < first[idx] {
			first[idx] = i
		}

		last[idx] = i
	}

	buildInterval := func(start int) (int, int) {
		end := last[int(s[start]-'a')]

		for i := start; i <= end; i++ {
			idx := int(s[i] - 'a')

			// Invalid interval
			if first[idx] < start {
				return -1, -1
			}

			if last[idx] > end {
				end = last[idx]
			}
		}

		return start, end
	}

	intervals := make([][2]int, 0)

	// Generate valid intervals
	for i := 0; i < n; i++ {
		idx := int(s[i] - 'a')

		if i == first[idx] {
			start, end := buildInterval(i)

			if start != -1 {
				intervals = append(intervals, [2]int{start, end})
			}
		}
	}

	// Sort by ending position
	sort.Slice(intervals, func(i, j int) bool {
		return intervals[i][1] < intervals[j][1]
	})

	result := make([]string, 0)
	previousEnd := -1

	for _, interval := range intervals {
		start := interval[0]
		end := interval[1]

		if start > previousEnd {
			result = append(result, s[start:end+1])
			previousEnd = end
		}
	}

	return result
}

The Go implementation follows the same algorithmic structure as the Python solution. Arrays are used instead of dictionaries because there are only 26 lowercase letters.

The interval list is stored as a slice of fixed-size integer arrays. Go requires explicit sorting using sort.Slice.

String slicing in Go uses byte indices, which is safe here because the input contains only lowercase English letters. No special Unicode handling is needed.

Worked Examples

Example 1

Input:

s = "adefaddaccc"

Step 1: Compute first and last occurrences

Character	First	Last
a	0	7
d	1	6
e	2	2
f	3	3
c	8	10

Step 2: Build intervals

Start at index 0, character 'a'

Initial interval:

[0, 7]

Scanning:

Index	Character	Action
0	a	end remains 7
1	d	end remains 7
2	e	end remains 7
3	f	end remains 7
4	a	end remains 7
5	d	end remains 7
6	d	end remains 7
7	a	end remains 7

Valid interval:

"adefadda"

Start at index 2, character 'e'

Interval:

[2, 2]

Valid substring:

"e"

Start at index 3, character 'f'

Interval:

[3, 3]

Valid substring:

"f"

Start at index 8, character 'c'

Interval:

[8, 10]

Valid substring:

"ccc"

Step 3: Greedy selection

Sorted intervals by end:

Interval	Substring
[2,2]	e
[3,3]	f
[0,7]	adefadda
[8,10]	ccc

Selection:

Choose "e"
Choose "f"
Skip "adefadda" because it overlaps
Choose "ccc"

Final answer:

["e", "f", "ccc"]

Example 2

Input:

s = "abbaccd"

Valid intervals

Interval	Substring
[0,3]	abba
[1,2]	bb
[4,5]	cc
[6,6]	d

Greedy selection

Sorted by end:

Interval	Substring
[1,2]	bb
[0,3]	abba
[4,5]	cc
[6,6]	d

Selected:

["bb", "cc", "d"]

This gives 3 substrings with minimum total length.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each character is processed a constant number of times
Space	O(1)	Only fixed-size arrays for 26 letters are used

The preprocessing pass over the string takes linear time. Interval construction also remains linear overall because each scan is bounded by the size of the string and there are at most 26 candidate intervals.

The greedy selection phase is effectively constant time because there are at most 26 intervals to sort.

Space complexity is constant because all auxiliary storage depends only on the alphabet size, not on the input length.

Test Cases

solution = Solution()

# Provided examples
assert sorted(solution.maxNumOfSubstrings("adefaddaccc")) == sorted(["e", "f", "ccc"])  # example 1
assert sorted(solution.maxNumOfSubstrings("abbaccd")) == sorted(["d", "bb", "cc"])  # example 2

# Single character
assert solution.maxNumOfSubstrings("a") == ["a"]  # smallest possible input

# All unique characters
assert sorted(solution.maxNumOfSubstrings("abc")) == sorted(["a", "b", "c"])  # every char independent

# All identical characters
assert solution.maxNumOfSubstrings("aaaa") == ["aaaa"]  # only one valid substring

# Nested intervals
assert solution.maxNumOfSubstrings("ababa") == ["ababa"]  # overlapping dependencies

# Multiple isolated groups
assert sorted(solution.maxNumOfSubstrings("abcddcbaefg")) == sorted(["abcddcba", "e", "f", "g"])  # mixed ranges

# Separate repeated groups
assert sorted(solution.maxNumOfSubstrings("aabbcc")) == sorted(["aa", "bb", "cc"])  # disjoint repeated ranges

# Complex overlap
assert solution.maxNumOfSubstrings("cabcccbaa") == ["cabcccbaa"]  # entire string required

# Long independent suffix
assert sorted(solution.maxNumOfSubstrings("zzzyx")) == sorted(["zzz", "y", "x"])  # repeated plus unique chars

Test	Why
`"adefaddaccc"`	Verifies the main example and greedy selection
`"abbaccd"`	Confirms minimum total length tie-breaking
`"a"`	Tests smallest valid input
`"abc"`	Ensures isolated characters become single-letter substrings
`"aaaa"`	Tests fully overlapping occurrences
`"ababa"`	Validates nested dependencies
`"abcddcbaefg"`	Checks independent interval groups
`"aabbcc"`	Verifies disjoint repeated ranges
`"cabcccbaa"`	Tests case where entire string is required
`"zzzyx"`	Mixes repeated and unique characters

Edge Cases

One important edge case occurs when every character in the string is unique. In this situation, each character's first and last occurrence are the same index. Every single character becomes a valid substring of length one. A buggy implementation might unnecessarily merge intervals, but the interval expansion logic correctly keeps them separate.

Another critical case is when all characters overlap completely, such as "aaaa" or "ababa". Here, selecting smaller substrings would violate the rule requiring all occurrences of each character to be included. The implementation handles this by expanding intervals until all dependent character ranges are fully covered.

A more subtle edge case involves nested intervals, where one valid interval completely contains another. For example, in "abbaccd", the interval "abba" contains "bb". Choosing the smaller interval is better because it allows more total substrings and smaller total length. The greedy strategy based on earliest finishing interval correctly prioritizes the shorter interval.

A final important case occurs when interval expansion discovers a character whose first occurrence lies before the current interval start. This means the candidate interval can never become valid without extending leftward. The implementation detects this immediately and discards the interval, preventing invalid substrings from entering the result set.