LeetCode 3104 - Find Longest Self-Contained Substring
The problem asks us to find the longest self-contained substring in a given string s. A substring is self-contained if it does not share any characters with the rest of the string.
Difficulty: 🔴 Hard
Topics: Hash Table, String, Sorting
Solution
Problem Understanding
The problem asks us to find the longest self-contained substring in a given string s. A substring is self-contained if it does not share any characters with the rest of the string. More formally, for a substring t of s, every character in t must appear only inside t and not elsewhere in s. Additionally, the substring cannot be the entire string itself. If no such substring exists, we return -1.
The input is a string s with length up to 50,000, consisting of only lowercase English letters. The output is a single integer: the length of the longest self-contained substring.
Important edge cases include strings where every character repeats multiple times, strings with all unique characters, and strings where the self-contained substring is at the edges.
Examples to clarify:
"abba"→"bb"is valid because the'b'characters only exist in the substring. Output:2."abab"→ no substring is self-contained because every character occurs outside any candidate substring. Output:-1."abacd"→"abac"is self-contained because'd'is outside and doesn't occur inside. Output:4.
Approaches
Brute Force Approach
A brute force approach would generate all possible substrings of s except the whole string. For each substring, we would check if every character inside the substring exists only inside it and nowhere else in s. This requires iterating through all substrings, and for each substring, checking character occurrences in the rest of the string. This results in O(n^3) time complexity (O(n^2) substrings and O(n) check for each), which is infeasible for n = 50,000.
Optimal Approach
The key observation is that we only need the first and last occurrence of each character. If we know the leftmost and rightmost positions for each character, a substring is self-contained if, for every character in the substring, the substring completely contains all occurrences of that character.
Therefore, we can:
- Record the first and last index of each character.
- Consider substrings defined by any contiguous block between first and last occurrences of characters.
- Merge overlapping intervals of character occurrences.
- The largest interval that is strictly smaller than the full string gives the longest self-contained substring.
This reduces the problem to interval merging and scanning, achieving O(n) time complexity.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^3) | O(n) | Generate all substrings, check character uniqueness |
| Optimal | O(n) | O(1) | Track first and last occurrences, merge intervals |
Algorithm Walkthrough
- Initialize two arrays
first_occurrenceandlast_occurrenceof size 26 (for lowercase letters) with initial valuesinfand-inf. - Iterate over the string
sto fillfirst_occurrence[c]andlast_occurrence[c]for each characterc. - Extract intervals
(first_occurrence[c], last_occurrence[c])for all characters that occur ins. - Sort intervals by starting index.
- Merge overlapping intervals to find continuous blocks where each character's occurrences are fully contained.
- Track the length of each merged interval. Keep the maximum length that is less than the total length of
s. - If no valid interval exists, return
-1; otherwise, return the maximum length.
Why it works: By merging intervals of character occurrences, we ensure every character in a candidate substring appears only within that substring, fulfilling the self-contained property. Sorting and merging guarantee we handle overlapping characters correctly.
Python Solution
class Solution:
def maxSubstringLength(self, s: str) -> int:
n = len(s)
first_occurrence = [float('inf')] * 26
last_occurrence = [-1] * 26
for i, c in enumerate(s):
idx = ord(c) - ord('a')
first_occurrence[idx] = min(first_occurrence[idx], i)
last_occurrence[idx] = i
intervals = []
for i in range(26):
if last_occurrence[i] != -1:
intervals.append((first_occurrence[i], last_occurrence[i]))
intervals.sort()
merged_start, merged_end = intervals[0]
max_len = -1
for start, end in intervals[1:]:
if start <= merged_end:
merged_end = max(merged_end, end)
else:
if merged_end - merged_start + 1 < n:
max_len = max(max_len, merged_end - merged_start + 1)
merged_start, merged_end = start, end
if merged_end - merged_start + 1 < n:
max_len = max(max_len, merged_end - merged_start + 1)
return max_len if max_len > 0 else -1
Explanation: We track the first and last occurrence of each letter. Then we convert them into intervals and merge them. Only intervals smaller than n are considered, ensuring we do not select the entire string. The maximum valid interval length is returned.
Go Solution
func maxSubstringLength(s string) int {
n := len(s)
first := make([]int, 26)
last := make([]int, 26)
for i := 0; i < 26; i++ {
first[i] = 1 << 30
last[i] = -1
}
for i := 0; i < n; i++ {
idx := s[i] - 'a'
if i < first[idx] {
first[idx] = i
}
last[idx] = i
}
intervals := make([][2]int, 0)
for i := 0; i < 26; i++ {
if last[i] != -1 {
intervals = append(intervals, [2]int{first[i], last[i]})
}
}
// sort intervals by start
sort.Slice(intervals, func(i, j int) bool {
return intervals[i][0] < intervals[j][0]
})
mergedStart, mergedEnd := intervals[0][0], intervals[0][1]
maxLen := -1
for _, inter := range intervals[1:] {
start, end := inter[0], inter[1]
if start <= mergedEnd {
if end > mergedEnd {
mergedEnd = end
}
} else {
if mergedEnd-mergedStart+1 < n {
if mergedEnd-mergedStart+1 > maxLen {
maxLen = mergedEnd - mergedStart + 1
}
}
mergedStart, mergedEnd = start, end
}
}
if mergedEnd-mergedStart+1 < n && mergedEnd-mergedStart+1 > maxLen {
maxLen = mergedEnd - mergedStart + 1
}
return maxLen
}
Go-specific notes: We use slices and arrays for intervals and take care to initialize the first occurrence array with a large number (1 << 30) to emulate infinity. Sorting uses sort.Slice and indexing is zero-based.
Worked Examples
Example "abba"
| Step | Interval | Merged Interval | Max Length |
|---|---|---|---|
| 'a' | (0,0) | (0,0) | -1 |
| 'b' | (1,2) | (1,2) | 2 |
| Merge complete | (0,0),(1,2) | 2 |
Output: 2
Example "abab"
Intervals: 'a'(0,2), 'b'(1,3) → merge to (0,3). Length = 4 → full string, invalid → Output: -1.
Example "abacd"
Intervals: 'a'(0,2), 'b'(1,1), 'c'(3,3), 'd'(4,4) → merged blocks: (0,3),(4,4) → max valid length = 4 → Output: 4.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Single pass to record first/last, sort 26 intervals (O(1)), merge intervals O(1) |
| Space | O(1) | Arrays of size 26 and interval list of at most 26 elements |
Even though we sort intervals, there are only 26 letters, so sorting is constant time.
Test Cases
# Provided examples
assert Solution().maxSubstringLength("abba") == 2 # bb
assert Solution().maxSubstringLength("abab") == -1
assert Solution().maxSubstringLength("abacd") == 4 # abac
# Edge cases
assert Solution().maxSubstringLength("aa") == 1 # a
assert Solution().maxSubstringLength("a") == -1 # too short
assert Solution().maxSubstringLength("abcdef") == 5 # entire except last letter
assert Solution().maxSubstringLength("aabbcc") == 2 # pick any repeated block