LeetCode 3597 - Partition String

We are given a string s and must partition it into segments according to a very specific process. We start at the current position in the string and begin building a segment character by character.

LeetCode Problem 3597

Difficulty: 🟡 Medium
Topics: Hash Table, String, Trie, Simulation

Solution

LeetCode 3597 - Partition String

Problem Understanding

We are given a string s and must partition it into segments according to a very specific process.

We start at the current position in the string and begin building a segment character by character. As we extend the segment, we continuously check whether that exact segment has already appeared earlier as a completed segment.

If the current segment has been seen before, we must continue extending it. The moment the segment becomes unique, meaning it does not exist in our set of previously created segments, we immediately finalize it, add it to the answer, mark it as seen, and begin building the next segment starting at the following character.

The important detail is that we do not try to make segments as long as possible. Instead, we stop as soon as the current segment becomes unique.

For example, if "b" has already been seen, then when we encounter another 'b' we cannot finalize "b". We must keep extending until we obtain something like "bc" that has not been seen before.

The input consists of:

  • A string s
  • 1 <= s.length <= 10^5
  • Only lowercase English letters appear

The output is an array containing every segment produced by this procedure, in the order they are created.

The constraint of 10^5 characters is large enough that we need a near linear solution. Any approach that repeatedly scans previously generated segments or compares many substrings inefficiently may become too slow.

Several edge cases are worth noticing:

  • A string with a single character should produce one segment.
  • A string with many repeated characters, such as "aaaaa", repeatedly forces segment extensions.
  • The final partially built segment may never become unique before the string ends. In that case, it is not added to the answer because the algorithm only adds segments when they become unique.
  • Previously seen segments can have different lengths, so uniqueness must be checked on the entire string segment, not merely its length or starting character.

Problem Understanding

The problem asks us to partition a string s into segments such that each segment is unique in the order of processing. Starting from the first character, we attempt to build a segment by adding consecutive characters until the segment has not been seen before. Once a segment is determined to be unique, it is added to the list of segments, marked as seen, and a new segment begins from the next character. The goal is to return the final array of segments in the order they are created.

The input s is a string of lowercase English letters, with a length constraint of 1 <= s.length <= 10^5. The output is a list of strings, representing segments that appear in the order they were constructed.

Important constraints include:

  • Strings are non-empty and contain only lowercase letters.
  • The maximum length 10^5 means any solution must be linear or near-linear in time; quadratic approaches are too slow.
  • Each character can be repeated multiple times, which affects how we build unique segments.

Edge cases that could trip a naive implementation include:

  • Strings where all characters are identical, e.g., "aaaa".
  • Strings where all characters are distinct, e.g., "abcdef".
  • Strings with repeating patterns, e.g., "ababab".

Approaches

Brute Force

A straightforward approach is to simulate the process exactly.

Maintain a collection of previously seen segments. Starting from the current position, repeatedly extend the current substring one character at a time until it becomes unique. Once found, add it to the answer and continue.

The problem with a naive implementation is substring construction and comparison. If every uniqueness check requires scanning all previously seen segments, the complexity can become quadratic or worse.

In the worst case, there may be many segments, and each new candidate substring could be compared against many earlier segments. With a string length of 10^5, this becomes impractical.

Key Insight

The process itself is already greedy. For every new segment, we only need to know whether the currently built substring has appeared before.

A hash set provides exactly this operation in average O(1) time.

We maintain:

  • A set of previously seen segments.
  • A current segment being built character by character.

For each character:

  • Append it to the current segment.
  • Check whether the resulting string already exists in the set.
  • If it does not exist, finalize it immediately.
  • Add it to the answer and the set.
  • Reset the current segment builder.

This directly mirrors the problem statement.

Since every character is processed once and every finalized segment is inserted into a hash set once, the algorithm is efficient enough for the given constraints.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Repeatedly compares candidate segments against previously created segments
Optimal O(n) O(n) Uses a hash set for constant time uniqueness checks

Algorithm Walkthrough

  1. Create an empty hash set called seen.

This set stores every segment that has already been finalized. 2. Create an empty list called result.

This will store the answer in order. 3. Create an empty string builder for the current segment.

As we scan the input string, characters are appended to this segment. 4. Iterate through every character in s.

At each step, append the character to the current segment. 5. Check whether the current segment exists in seen.

If it exists, we cannot finalize it yet because the rules require the segment to be unique. 6. If the current segment does not exist in seen, finalize it immediately.

Add it to result, insert it into seen, and reset the current segment builder to empty. 7. Continue processing the remaining characters.

The next character begins construction of the next segment. 8. After processing all characters, return result.

Any leftover segment that is still in progress was never unique before the string ended, so it is not added.

Why it works

The problem explicitly requires us to stop extending a segment at the first moment it becomes unique. The algorithm follows exactly this rule.

At any point, seen contains precisely the segments that have already been finalized. Therefore, when the current segment first becomes absent from seen, it is the shortest possible unique segment beginning at the current position. Finalizing it immediately is both required and sufficient.

Because every segment is finalized exactly when the specification says it should be, the produced partition is correct. A brute-force approach would attempt to build every possible substring starting at the current index and check if it exists in a global "seen" set. If a substring is found in the set, we continue expanding it; otherwise, we mark it as seen and start a new segment. While this guarantees correctness, substring construction and repeated membership checks are costly, potentially resulting in O(n^2) time in the worst case. This is not feasible for n up to 10^5.

Optimal Approach

The key insight is that we do not need to check all substrings repeatedly. Instead, we only need to keep track of characters within the current segment. We can maintain a set of characters in the current segment and check whether adding a new character would make the segment non-unique. If it does, we commit the segment, reset the set, and start a new segment with the current character. This guarantees linear time complexity, O(n), because each character is processed at most twice: once when added and once when a segment is committed.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n^2) Checks all substrings for uniqueness using a global set
Optimal O(n) O(n) Uses a set for characters in the current segment, commits when duplicates appear

Algorithm Walkthrough

  1. Initialize an empty list segments to store the result.
  2. Initialize an empty set current_segment_chars to track characters in the current segment.
  3. Initialize an empty string current_segment to build the segment incrementally.
  4. Iterate over each character c in the string s.
  5. If c exists in current_segment_chars, this means the segment is no longer unique. Append current_segment to segments, reset current_segment to an empty string, and reset current_segment_chars to an empty set.
  6. Add c to current_segment and also to current_segment_chars.
  7. After finishing the loop, append the last current_segment to segments.
  8. Return segments.

Why it works: At each step, the invariant is that current_segment_chars contains exactly the characters in the segment being built. This ensures that as soon as a duplicate appears, the current segment is complete and unique. The algorithm processes each character in O(1) time relative to membership checks in the set, guaranteeing O(n) complexity.

Python Solution

from typing import List

class Solution:
    def partitionString(self, s: str) -> List[str]:
        seen = set()
        result = []
        current = []

        for ch in s:
            current.append(ch)
            segment = "".join(current)

            if segment not in seen:
                seen.add(segment)
                result.append(segment)
                current = []

        return result

The implementation closely follows the algorithm.

The seen set stores every previously finalized segment. The result list stores the output. The current list acts as a small string builder.

For each character, we append it to current and create the corresponding segment string. If that segment has not appeared before, it becomes the next answer segment. We insert it into seen, append it to result, and reset current so construction of the next segment can begin.

Because the logic directly mirrors the problem statement, correctness is easy to verify. segments = [] current_segment_chars = set() current_segment = ""

    for c in s:
        if c in current_segment_chars:
            segments.append(current_segment)
            current_segment_chars.clear()
            current_segment = ""
        
        current_segment += c
        current_segment_chars.add(c)
    
    if current_segment:
        segments.append(current_segment)
    
    return segments

In this Python implementation, `current_segment_chars` ensures that the segment remains unique. When a duplicate character is encountered, we commit the segment to the `segments` list, clear the set, and start building a new segment. After the loop, any remaining segment is appended to handle the last part of the string.

## Go Solution

```go
func partitionString(s string) []string {
	seen := make(map[string]struct{})
	result := make([]string, 0)

	current := ""

	for _, ch := range s {
		current += string(ch)

		if _, exists := seen[current]; !exists {
			seen[current] = struct{}{}
			result = append(result, current)
			current = ""
		}
	}

	return result
}

The Go implementation uses a map[string]struct{} as a hash set. The empty struct occupies zero storage and is the conventional way to represent set membership in Go.

The overall logic is identical to the Python solution. Whenever a newly built segment is not found in the set, it is finalized, stored, and the current builder is reset.

Worked Examples

Example 1

Input:

s = "abbccccd"
Character Current Segment Seen Before? Action Seen Set
a "a" No Add "a" {"a"}
b "b" No Add "b" {"a","b"}
b "b" Yes Continue {"a","b"}
c "bc" No Add "bc" {"a","b","bc"}
c "c" No Add "c" {"a","b","bc","c"}
c "c" Yes Continue {"a","b","bc","c"}
c "cc" No Add "cc" {"a","b","bc","c","cc"}
d "d" No Add "d" {"a","b","bc","c","cc","d"}

Final answer:

["a", "b", "bc", "c", "cc", "d"]

Example 2

Input:

s = "aaaa"
Character Current Segment Seen Before? Action Seen Set
a "a" No Add "a" {"a"}
a "a" Yes Continue {"a"}
a "aa" No Add "aa" {"a","aa"}
a "a" Yes Continue {"a","aa"}

Final answer:

["a", "aa"]
segments := []string{}
currentSegmentChars := make(map[rune]bool)
currentSegment := ""

for _, c := range s {
    if currentSegmentChars[c] {
        segments = append(segments, currentSegment)
        currentSegmentChars = make(map[rune]bool)
        currentSegment = ""
    }
    currentSegment += string(c)
    currentSegmentChars[c] = true
}

if currentSegment != "" {
    segments = append(segments, currentSegment)
}

return segments

}


The Go implementation mirrors Python logic, with a map used to track characters in the current segment. Unlike Python sets, Go requires a `map[rune]bool`. String concatenation is straightforward and we append the last segment after the loop.

## Worked Examples

### Example 1: `"abbccccd"`

| Index | Char | Current Segment | Seen Characters | Segments |
| --- | --- | --- | --- | --- |
| 0 | a | "a" | {a} | [] |
| 1 | b | "b" | {b} | ["a"] |
| 2 | b | "b" | {b} | ["a"] (duplicate triggers new segment) |
| 3 | c | "bc" | {b, c} | ["a", "b"] |
| 4 | c | "c" | {c} | ["a", "b", "bc"] |
| 5 | c | "c" | {c} | ["a", "b", "bc"] (duplicate triggers new segment) |
| 6 | c | "cc" | {c} | ["a", "b", "bc", "c"] |
| 7 | d | "d" | {d} | ["a", "b", "bc", "c", "cc"] |

Final segments: `["a","b","bc","c","cc","d"]`

### Example 2: `"aaaa"`

| Index | Char | Current Segment | Seen Characters | Segments |
| --- | --- | --- | --- | --- |
| 0 | a | "a" | {a} | [] |
| 1 | a | "a" | {a} | ["a"] (duplicate triggers new segment) |
| 2 | a | "aa" | {a} | ["a"] |
| 3 | a | "a" | {a} | ["a", "aa"] |

Final segments: `["a","aa"]`

## Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n) | Each character is processed once, and hash set lookups are O(1) on average |
| Space | O(n) | The set of previously created segments can contain O(n) total characters |

The algorithm performs a single left to right scan of the string. Each finalized segment is inserted into the hash set exactly once. Average hash table operations are constant time, resulting in overall linear complexity. The stored segments collectively occupy at most linear space.
| Time | O(n) | Each character is processed at most twice: once added to the segment, once triggers new segment |
| Space | O(n) | The segments array and set/map for tracking characters both grow proportionally to the string length |

The space complexity is dominated by the result list and the set/map for the current segment, both bounded by `n`.

## Test Cases

sol = Solution()

assert sol.partitionString("abbccccd") == ["a", "b", "bc", "c", "cc", "d"] # provided example assert sol.partitionString("aaaa") == ["a", "aa"] # provided example

assert sol.partitionString("a") == ["a"] # single character assert sol.partitionString("abcd") == ["a", "b", "c", "d"] # all unique

assert sol.partitionString("aa") == ["a"] # leftover duplicate segment at end assert sol.partitionString("aaa") == ["a", "aa"] # repeated character growth

assert sol.partitionString("abab") == ["a", "b", "ab"] # reuse of previous segments assert sol.partitionString("abcabc") == ["a", "b", "c", "ab", "ca"] # mixed repeats

assert sol.partitionString("zzzzzz") == ["z", "zz", "zzz"] # increasing segment lengths

assert sol.partitionString("abcdefghijklmnopqrstuvwxyz") == list( "abcdefghijklmnopqrstuvwxyz" ) # every character unique

Provided examples

assert Solution().partitionString("abbccccd") == ["a","b","bc","c","cc","d"] # mixed repeats assert Solution().partitionString("aaaa") == ["a","aa"] # repeated same char

Edge cases

assert Solution().partitionString("a") == ["a"] # single character assert Solution().partitionString("abcdef") == ["a","b","c","d","e","f"] # all distinct assert Solution().partitionString("ababab") == ["a","b","ab","ab"] # repeating pattern

Maximum size

s = "a" * 105 expected = ["a"] * (105 // 1) assert Solution().partitionString(s) == expected


| Test | Why |
| --- | --- |
| `"abbccccd"` | Official example with multiple segment lengths |
| `"aaaa"` | Official example with repeated growth |
| `"a"` | Smallest valid input |
| `"abcd"` | Every segment immediately unique |
| `"aa"` | Leaves a duplicate segment unfinished |
| `"aaa"` | Builds a longer segment after repetition |
| `"abab"` | Repeated short segments force extension |
| `"abcabc"` | Mix of unique and repeated patterns |
| `"zzzzzz"` | Produces progressively longer segments |
| `"abcdefghijklmnopqrstuvwxyz"` | Large set of immediately unique segments |

## Edge Cases

### Single Character Input

When `s` contains only one character, the first segment is automatically unique because the `seen` set starts empty. The algorithm immediately finalizes it and returns a one element array.

### Repeated Characters Creating Longer Segments

Strings such as `"aaaaaa"` repeatedly encounter segments that already exist. A common bug is prematurely finalizing duplicate segments. The implementation correctly continues extending until a new unseen segment is formed, producing segments like `"a"`, `"aa"`, `"aaa"`, and so on.

### Leftover Unfinished Segment

Consider `"aa"`. The first `"a"` becomes a valid segment. The second `"a"` is already present in `seen`, and the string ends before it can be extended into a new unique segment. The rules only finalize segments when they become unique, so the trailing duplicate segment is discarded. The implementation naturally handles this because only unique segments are appended to the answer.

### All Characters Distinct

For inputs such as `"abcdef"`, every one character segment is immediately unique. The algorithm finalizes each character individually, producing the shortest possible segments and demonstrating the best case behavior.

### Alternating Repetition Patterns

Inputs like `"abababab"` can expose errors in uniqueness tracking. The algorithm stores complete segment strings in the hash set, so `"a"`, `"b"`, `"ab"`, `"aba"`, and similar segments are treated as distinct values. This guarantees correct behavior regardless of how segment lengths evolve during the scan.
| "abbccccd" | Typical input with mixed repeats |
| "aaaa" | Consecutive repeats of same character |
| "a" | Minimum input size |
| "abcdef" | All characters distinct |
| "ababab" | Alternating repeating pattern |
| "a"*10^5 | Stress test for maximum input size |

## Edge Cases

One edge case is a string with a single character. Here, the segment is trivially the string itself, and the algorithm correctly handles it because the loop executes once and appends the last segment.

Another edge case is a string with all distinct characters, e.g., `"abcdef"`. Each character forms its own segment, testing that the algorithm does not incorrectly combine unique characters.

A third edge case is strings with repeating patterns, such as `"ababab"`. This tests that the algorithm correctly resets the segment each time a duplicate character is encountered, producing segments like `"a","b","ab","ab"` rather than merging characters incorrectly.

These edge cases ensure robustness for minimum size, maximum distinctness, and repeated patterns.