LeetCode 3597 - Partition String
We are given a string s and must partition it into segments according to a very specific process. We start at the current position in the string and begin building a segment character by character.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Trie, Simulation
Solution
LeetCode 3597 - Partition String
Problem Understanding
We are given a string s and must partition it into segments according to a very specific process.
We start at the current position in the string and begin building a segment character by character. As we extend the segment, we continuously check whether that exact segment has already appeared earlier as a completed segment.
If the current segment has been seen before, we must continue extending it. The moment the segment becomes unique, meaning it does not exist in our set of previously created segments, we immediately finalize it, add it to the answer, mark it as seen, and begin building the next segment starting at the following character.
The important detail is that we do not try to make segments as long as possible. Instead, we stop as soon as the current segment becomes unique.
For example, if "b" has already been seen, then when we encounter another 'b' we cannot finalize "b". We must keep extending until we obtain something like "bc" that has not been seen before.
The input consists of:
- A string
s 1 <= s.length <= 10^5- Only lowercase English letters appear
The output is an array containing every segment produced by this procedure, in the order they are created.
The constraint of 10^5 characters is large enough that we need a near linear solution. Any approach that repeatedly scans previously generated segments or compares many substrings inefficiently may become too slow.
Several edge cases are worth noticing:
- A string with a single character should produce one segment.
- A string with many repeated characters, such as
"aaaaa", repeatedly forces segment extensions. - The final partially built segment may never become unique before the string ends. In that case, it is not added to the answer because the algorithm only adds segments when they become unique.
- Previously seen segments can have different lengths, so uniqueness must be checked on the entire string segment, not merely its length or starting character.
Problem Understanding
The problem asks us to partition a string s into segments such that each segment is unique in the order of processing. Starting from the first character, we attempt to build a segment by adding consecutive characters until the segment has not been seen before. Once a segment is determined to be unique, it is added to the list of segments, marked as seen, and a new segment begins from the next character. The goal is to return the final array of segments in the order they are created.
The input s is a string of lowercase English letters, with a length constraint of 1 <= s.length <= 10^5. The output is a list of strings, representing segments that appear in the order they were constructed.
Important constraints include:
- Strings are non-empty and contain only lowercase letters.
- The maximum length
10^5means any solution must be linear or near-linear in time; quadratic approaches are too slow. - Each character can be repeated multiple times, which affects how we build unique segments.
Edge cases that could trip a naive implementation include:
- Strings where all characters are identical, e.g.,
"aaaa". - Strings where all characters are distinct, e.g.,
"abcdef". - Strings with repeating patterns, e.g.,
"ababab".
Approaches
Brute Force
A straightforward approach is to simulate the process exactly.
Maintain a collection of previously seen segments. Starting from the current position, repeatedly extend the current substring one character at a time until it becomes unique. Once found, add it to the answer and continue.
The problem with a naive implementation is substring construction and comparison. If every uniqueness check requires scanning all previously seen segments, the complexity can become quadratic or worse.
In the worst case, there may be many segments, and each new candidate substring could be compared against many earlier segments. With a string length of 10^5, this becomes impractical.
Key Insight
The process itself is already greedy. For every new segment, we only need to know whether the currently built substring has appeared before.
A hash set provides exactly this operation in average O(1) time.
We maintain:
- A set of previously seen segments.
- A current segment being built character by character.
For each character:
- Append it to the current segment.
- Check whether the resulting string already exists in the set.
- If it does not exist, finalize it immediately.
- Add it to the answer and the set.
- Reset the current segment builder.
This directly mirrors the problem statement.
Since every character is processed once and every finalized segment is inserted into a hash set once, the algorithm is efficient enough for the given constraints.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Repeatedly compares candidate segments against previously created segments |
| Optimal | O(n) | O(n) | Uses a hash set for constant time uniqueness checks |
Algorithm Walkthrough
- Create an empty hash set called
seen.
This set stores every segment that has already been finalized.
2. Create an empty list called result.
This will store the answer in order. 3. Create an empty string builder for the current segment.
As we scan the input string, characters are appended to this segment.
4. Iterate through every character in s.
At each step, append the character to the current segment.
5. Check whether the current segment exists in seen.
If it exists, we cannot finalize it yet because the rules require the segment to be unique.
6. If the current segment does not exist in seen, finalize it immediately.
Add it to result, insert it into seen, and reset the current segment builder to empty.
7. Continue processing the remaining characters.
The next character begins construction of the next segment.
8. After processing all characters, return result.
Any leftover segment that is still in progress was never unique before the string ended, so it is not added.
Why it works
The problem explicitly requires us to stop extending a segment at the first moment it becomes unique. The algorithm follows exactly this rule.
At any point, seen contains precisely the segments that have already been finalized. Therefore, when the current segment first becomes absent from seen, it is the shortest possible unique segment beginning at the current position. Finalizing it immediately is both required and sufficient.
Because every segment is finalized exactly when the specification says it should be, the produced partition is correct.
A brute-force approach would attempt to build every possible substring starting at the current index and check if it exists in a global "seen" set. If a substring is found in the set, we continue expanding it; otherwise, we mark it as seen and start a new segment. While this guarantees correctness, substring construction and repeated membership checks are costly, potentially resulting in O(n^2) time in the worst case. This is not feasible for n up to 10^5.
Optimal Approach
The key insight is that we do not need to check all substrings repeatedly. Instead, we only need to keep track of characters within the current segment. We can maintain a set of characters in the current segment and check whether adding a new character would make the segment non-unique. If it does, we commit the segment, reset the set, and start a new segment with the current character. This guarantees linear time complexity, O(n), because each character is processed at most twice: once when added and once when a segment is committed.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2) | O(n^2) | Checks all substrings for uniqueness using a global set |
| Optimal | O(n) | O(n) | Uses a set for characters in the current segment, commits when duplicates appear |
Algorithm Walkthrough
- Initialize an empty list
segmentsto store the result. - Initialize an empty set
current_segment_charsto track characters in the current segment. - Initialize an empty string
current_segmentto build the segment incrementally. - Iterate over each character
cin the strings. - If
cexists incurrent_segment_chars, this means the segment is no longer unique. Appendcurrent_segmenttosegments, resetcurrent_segmentto an empty string, and resetcurrent_segment_charsto an empty set. - Add
ctocurrent_segmentand also tocurrent_segment_chars. - After finishing the loop, append the last
current_segmenttosegments. - Return
segments.
Why it works: At each step, the invariant is that current_segment_chars contains exactly the characters in the segment being built. This ensures that as soon as a duplicate appears, the current segment is complete and unique. The algorithm processes each character in O(1) time relative to membership checks in the set, guaranteeing O(n) complexity.
Python Solution
from typing import List
class Solution:
def partitionString(self, s: str) -> List[str]:
seen = set()
result = []
current = []
for ch in s:
current.append(ch)
segment = "".join(current)
if segment not in seen:
seen.add(segment)
result.append(segment)
current = []
return result
The implementation closely follows the algorithm.
The seen set stores every previously finalized segment. The result list stores the output. The current list acts as a small string builder.
For each character, we append it to current and create the corresponding segment string. If that segment has not appeared before, it becomes the next answer segment. We insert it into seen, append it to result, and reset current so construction of the next segment can begin.
Because the logic directly mirrors the problem statement, correctness is easy to verify. segments = [] current_segment_chars = set() current_segment = ""
for c in s:
if c in current_segment_chars:
segments.append(current_segment)
current_segment_chars.clear()
current_segment = ""
current_segment += c
current_segment_chars.add(c)
if current_segment:
segments.append(current_segment)
return segments
In this Python implementation, `current_segment_chars` ensures that the segment remains unique. When a duplicate character is encountered, we commit the segment to the `segments` list, clear the set, and start building a new segment. After the loop, any remaining segment is appended to handle the last part of the string.
## Go Solution
```go
func partitionString(s string) []string {
seen := make(map[string]struct{})
result := make([]string, 0)
current := ""
for _, ch := range s {
current += string(ch)
if _, exists := seen[current]; !exists {
seen[current] = struct{}{}
result = append(result, current)
current = ""
}
}
return result
}
The Go implementation uses a map[string]struct{} as a hash set. The empty struct occupies zero storage and is the conventional way to represent set membership in Go.
The overall logic is identical to the Python solution. Whenever a newly built segment is not found in the set, it is finalized, stored, and the current builder is reset.
Worked Examples
Example 1
Input:
s = "abbccccd"
| Character | Current Segment | Seen Before? | Action | Seen Set |
|---|---|---|---|---|
| a | "a" | No | Add "a" | {"a"} |
| b | "b" | No | Add "b" | {"a","b"} |
| b | "b" | Yes | Continue | {"a","b"} |
| c | "bc" | No | Add "bc" | {"a","b","bc"} |
| c | "c" | No | Add "c" | {"a","b","bc","c"} |
| c | "c" | Yes | Continue | {"a","b","bc","c"} |
| c | "cc" | No | Add "cc" | {"a","b","bc","c","cc"} |
| d | "d" | No | Add "d" | {"a","b","bc","c","cc","d"} |
Final answer:
["a", "b", "bc", "c", "cc", "d"]
Example 2
Input:
s = "aaaa"
| Character | Current Segment | Seen Before? | Action | Seen Set |
|---|---|---|---|---|
| a | "a" | No | Add "a" | {"a"} |
| a | "a" | Yes | Continue | {"a"} |
| a | "aa" | No | Add "aa" | {"a","aa"} |
| a | "a" | Yes | Continue | {"a","aa"} |
Final answer:
["a", "aa"]
segments := []string{}
currentSegmentChars := make(map[rune]bool)
currentSegment := ""
for _, c := range s {
if currentSegmentChars[c] {
segments = append(segments, currentSegment)
currentSegmentChars = make(map[rune]bool)
currentSegment = ""
}
currentSegment += string(c)
currentSegmentChars[c] = true
}
if currentSegment != "" {
segments = append(segments, currentSegment)
}
return segments
}
The Go implementation mirrors Python logic, with a map used to track characters in the current segment. Unlike Python sets, Go requires a `map[rune]bool`. String concatenation is straightforward and we append the last segment after the loop.
## Worked Examples
### Example 1: `"abbccccd"`
| Index | Char | Current Segment | Seen Characters | Segments |
| --- | --- | --- | --- | --- |
| 0 | a | "a" | {a} | [] |
| 1 | b | "b" | {b} | ["a"] |
| 2 | b | "b" | {b} | ["a"] (duplicate triggers new segment) |
| 3 | c | "bc" | {b, c} | ["a", "b"] |
| 4 | c | "c" | {c} | ["a", "b", "bc"] |
| 5 | c | "c" | {c} | ["a", "b", "bc"] (duplicate triggers new segment) |
| 6 | c | "cc" | {c} | ["a", "b", "bc", "c"] |
| 7 | d | "d" | {d} | ["a", "b", "bc", "c", "cc"] |
Final segments: `["a","b","bc","c","cc","d"]`
### Example 2: `"aaaa"`
| Index | Char | Current Segment | Seen Characters | Segments |
| --- | --- | --- | --- | --- |
| 0 | a | "a" | {a} | [] |
| 1 | a | "a" | {a} | ["a"] (duplicate triggers new segment) |
| 2 | a | "aa" | {a} | ["a"] |
| 3 | a | "a" | {a} | ["a", "aa"] |
Final segments: `["a","aa"]`
## Complexity Analysis
| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n) | Each character is processed once, and hash set lookups are O(1) on average |
| Space | O(n) | The set of previously created segments can contain O(n) total characters |
The algorithm performs a single left to right scan of the string. Each finalized segment is inserted into the hash set exactly once. Average hash table operations are constant time, resulting in overall linear complexity. The stored segments collectively occupy at most linear space.
| Time | O(n) | Each character is processed at most twice: once added to the segment, once triggers new segment |
| Space | O(n) | The segments array and set/map for tracking characters both grow proportionally to the string length |
The space complexity is dominated by the result list and the set/map for the current segment, both bounded by `n`.
## Test Cases
sol = Solution()
assert sol.partitionString("abbccccd") == ["a", "b", "bc", "c", "cc", "d"] # provided example assert sol.partitionString("aaaa") == ["a", "aa"] # provided example
assert sol.partitionString("a") == ["a"] # single character assert sol.partitionString("abcd") == ["a", "b", "c", "d"] # all unique
assert sol.partitionString("aa") == ["a"] # leftover duplicate segment at end assert sol.partitionString("aaa") == ["a", "aa"] # repeated character growth
assert sol.partitionString("abab") == ["a", "b", "ab"] # reuse of previous segments assert sol.partitionString("abcabc") == ["a", "b", "c", "ab", "ca"] # mixed repeats
assert sol.partitionString("zzzzzz") == ["z", "zz", "zzz"] # increasing segment lengths
assert sol.partitionString("abcdefghijklmnopqrstuvwxyz") == list( "abcdefghijklmnopqrstuvwxyz" ) # every character unique
Provided examples
assert Solution().partitionString("abbccccd") == ["a","b","bc","c","cc","d"] # mixed repeats assert Solution().partitionString("aaaa") == ["a","aa"] # repeated same char
Edge cases
assert Solution().partitionString("a") == ["a"] # single character assert Solution().partitionString("abcdef") == ["a","b","c","d","e","f"] # all distinct assert Solution().partitionString("ababab") == ["a","b","ab","ab"] # repeating pattern
Maximum size
s = "a" * 105 expected = ["a"] * (105 // 1) assert Solution().partitionString(s) == expected
| Test | Why |
| --- | --- |
| `"abbccccd"` | Official example with multiple segment lengths |
| `"aaaa"` | Official example with repeated growth |
| `"a"` | Smallest valid input |
| `"abcd"` | Every segment immediately unique |
| `"aa"` | Leaves a duplicate segment unfinished |
| `"aaa"` | Builds a longer segment after repetition |
| `"abab"` | Repeated short segments force extension |
| `"abcabc"` | Mix of unique and repeated patterns |
| `"zzzzzz"` | Produces progressively longer segments |
| `"abcdefghijklmnopqrstuvwxyz"` | Large set of immediately unique segments |
## Edge Cases
### Single Character Input
When `s` contains only one character, the first segment is automatically unique because the `seen` set starts empty. The algorithm immediately finalizes it and returns a one element array.
### Repeated Characters Creating Longer Segments
Strings such as `"aaaaaa"` repeatedly encounter segments that already exist. A common bug is prematurely finalizing duplicate segments. The implementation correctly continues extending until a new unseen segment is formed, producing segments like `"a"`, `"aa"`, `"aaa"`, and so on.
### Leftover Unfinished Segment
Consider `"aa"`. The first `"a"` becomes a valid segment. The second `"a"` is already present in `seen`, and the string ends before it can be extended into a new unique segment. The rules only finalize segments when they become unique, so the trailing duplicate segment is discarded. The implementation naturally handles this because only unique segments are appended to the answer.
### All Characters Distinct
For inputs such as `"abcdef"`, every one character segment is immediately unique. The algorithm finalizes each character individually, producing the shortest possible segments and demonstrating the best case behavior.
### Alternating Repetition Patterns
Inputs like `"abababab"` can expose errors in uniqueness tracking. The algorithm stores complete segment strings in the hash set, so `"a"`, `"b"`, `"ab"`, `"aba"`, and similar segments are treated as distinct values. This guarantees correct behavior regardless of how segment lengths evolve during the scan.
| "abbccccd" | Typical input with mixed repeats |
| "aaaa" | Consecutive repeats of same character |
| "a" | Minimum input size |
| "abcdef" | All characters distinct |
| "ababab" | Alternating repeating pattern |
| "a"*10^5 | Stress test for maximum input size |
## Edge Cases
One edge case is a string with a single character. Here, the segment is trivially the string itself, and the algorithm correctly handles it because the loop executes once and appends the last segment.
Another edge case is a string with all distinct characters, e.g., `"abcdef"`. Each character forms its own segment, testing that the algorithm does not incorrectly combine unique characters.
A third edge case is strings with repeating patterns, such as `"ababab"`. This tests that the algorithm correctly resets the segment each time a duplicate character is encountered, producing segments like `"a","b","ab","ab"` rather than merging characters incorrectly.
These edge cases ensure robustness for minimum size, maximum distinctness, and repeated patterns.