LeetCode 3104 - Find Longest Self-Contained Substring

The problem asks us to find the longest self-contained substring in a given string s. A substring is self-contained if it does not share any characters with the rest of the string.

LeetCode Problem 3104

Difficulty: 🔴 Hard
Topics: Hash Table, String, Sorting

Solution

Problem Understanding

The problem asks us to find the longest self-contained substring in a given string s. A substring is self-contained if it does not share any characters with the rest of the string. More formally, for a substring t of s, every character in t must appear only inside t and not elsewhere in s. Additionally, the substring cannot be the entire string itself. If no such substring exists, we return -1.

The input is a string s with length up to 50,000, consisting of only lowercase English letters. The output is a single integer: the length of the longest self-contained substring.

Important edge cases include strings where every character repeats multiple times, strings with all unique characters, and strings where the self-contained substring is at the edges.

Examples to clarify:

  1. "abba""bb" is valid because the 'b' characters only exist in the substring. Output: 2.
  2. "abab" → no substring is self-contained because every character occurs outside any candidate substring. Output: -1.
  3. "abacd""abac" is self-contained because 'd' is outside and doesn't occur inside. Output: 4.

Approaches

Brute Force Approach

A brute force approach would generate all possible substrings of s except the whole string. For each substring, we would check if every character inside the substring exists only inside it and nowhere else in s. This requires iterating through all substrings, and for each substring, checking character occurrences in the rest of the string. This results in O(n^3) time complexity (O(n^2) substrings and O(n) check for each), which is infeasible for n = 50,000.

Optimal Approach

The key observation is that we only need the first and last occurrence of each character. If we know the leftmost and rightmost positions for each character, a substring is self-contained if, for every character in the substring, the substring completely contains all occurrences of that character.

Therefore, we can:

  1. Record the first and last index of each character.
  2. Consider substrings defined by any contiguous block between first and last occurrences of characters.
  3. Merge overlapping intervals of character occurrences.
  4. The largest interval that is strictly smaller than the full string gives the longest self-contained substring.

This reduces the problem to interval merging and scanning, achieving O(n) time complexity.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(n) Generate all substrings, check character uniqueness
Optimal O(n) O(1) Track first and last occurrences, merge intervals

Algorithm Walkthrough

  1. Initialize two arrays first_occurrence and last_occurrence of size 26 (for lowercase letters) with initial values inf and -inf.
  2. Iterate over the string s to fill first_occurrence[c] and last_occurrence[c] for each character c.
  3. Extract intervals (first_occurrence[c], last_occurrence[c]) for all characters that occur in s.
  4. Sort intervals by starting index.
  5. Merge overlapping intervals to find continuous blocks where each character's occurrences are fully contained.
  6. Track the length of each merged interval. Keep the maximum length that is less than the total length of s.
  7. If no valid interval exists, return -1; otherwise, return the maximum length.

Why it works: By merging intervals of character occurrences, we ensure every character in a candidate substring appears only within that substring, fulfilling the self-contained property. Sorting and merging guarantee we handle overlapping characters correctly.

Python Solution

class Solution:
    def maxSubstringLength(self, s: str) -> int:
        n = len(s)
        first_occurrence = [float('inf')] * 26
        last_occurrence = [-1] * 26
        
        for i, c in enumerate(s):
            idx = ord(c) - ord('a')
            first_occurrence[idx] = min(first_occurrence[idx], i)
            last_occurrence[idx] = i
        
        intervals = []
        for i in range(26):
            if last_occurrence[i] != -1:
                intervals.append((first_occurrence[i], last_occurrence[i]))
        
        intervals.sort()
        merged_start, merged_end = intervals[0]
        max_len = -1
        
        for start, end in intervals[1:]:
            if start <= merged_end:
                merged_end = max(merged_end, end)
            else:
                if merged_end - merged_start + 1 < n:
                    max_len = max(max_len, merged_end - merged_start + 1)
                merged_start, merged_end = start, end
        
        if merged_end - merged_start + 1 < n:
            max_len = max(max_len, merged_end - merged_start + 1)
        
        return max_len if max_len > 0 else -1

Explanation: We track the first and last occurrence of each letter. Then we convert them into intervals and merge them. Only intervals smaller than n are considered, ensuring we do not select the entire string. The maximum valid interval length is returned.

Go Solution

func maxSubstringLength(s string) int {
    n := len(s)
    first := make([]int, 26)
    last := make([]int, 26)
    for i := 0; i < 26; i++ {
        first[i] = 1 << 30
        last[i] = -1
    }
    
    for i := 0; i < n; i++ {
        idx := s[i] - 'a'
        if i < first[idx] {
            first[idx] = i
        }
        last[idx] = i
    }
    
    intervals := make([][2]int, 0)
    for i := 0; i < 26; i++ {
        if last[i] != -1 {
            intervals = append(intervals, [2]int{first[i], last[i]})
        }
    }
    
    // sort intervals by start
    sort.Slice(intervals, func(i, j int) bool {
        return intervals[i][0] < intervals[j][0]
    })
    
    mergedStart, mergedEnd := intervals[0][0], intervals[0][1]
    maxLen := -1
    
    for _, inter := range intervals[1:] {
        start, end := inter[0], inter[1]
        if start <= mergedEnd {
            if end > mergedEnd {
                mergedEnd = end
            }
        } else {
            if mergedEnd-mergedStart+1 < n {
                if mergedEnd-mergedStart+1 > maxLen {
                    maxLen = mergedEnd - mergedStart + 1
                }
            }
            mergedStart, mergedEnd = start, end
        }
    }
    
    if mergedEnd-mergedStart+1 < n && mergedEnd-mergedStart+1 > maxLen {
        maxLen = mergedEnd - mergedStart + 1
    }
    
    return maxLen
}

Go-specific notes: We use slices and arrays for intervals and take care to initialize the first occurrence array with a large number (1 << 30) to emulate infinity. Sorting uses sort.Slice and indexing is zero-based.

Worked Examples

Example "abba"

Step Interval Merged Interval Max Length
'a' (0,0) (0,0) -1
'b' (1,2) (1,2) 2
Merge complete (0,0),(1,2) 2

Output: 2

Example "abab"

Intervals: 'a'(0,2), 'b'(1,3) → merge to (0,3). Length = 4 → full string, invalid → Output: -1.

Example "abacd"

Intervals: 'a'(0,2), 'b'(1,1), 'c'(3,3), 'd'(4,4) → merged blocks: (0,3),(4,4) → max valid length = 4 → Output: 4.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Single pass to record first/last, sort 26 intervals (O(1)), merge intervals O(1)
Space O(1) Arrays of size 26 and interval list of at most 26 elements

Even though we sort intervals, there are only 26 letters, so sorting is constant time.

Test Cases

# Provided examples
assert Solution().maxSubstringLength("abba") == 2  # bb
assert Solution().maxSubstringLength("abab") == -1
assert Solution().maxSubstringLength("abacd") == 4  # abac

# Edge cases
assert Solution().maxSubstringLength("aa") == 1  # a
assert Solution().maxSubstringLength("a") == -1  # too short
assert Solution().maxSubstringLength("abcdef") == 5  # entire except last letter
assert Solution().maxSubstringLength("aabbcc") == 2  # pick any repeated block