LeetCode 1624 - Largest Substring Between Two Equal Characters

The problem asks us to find the length of the longest substring that lies between two identical characters in a given string s. The substring does not include the two identical characters themselves. If no character occurs at least twice, we must return -1.

LeetCode Problem 1624

Difficulty: 🟢 Easy
Topics: Hash Table, String

Solution

Problem Understanding

The problem asks us to find the length of the longest substring that lies between two identical characters in a given string s. The substring does not include the two identical characters themselves. If no character occurs at least twice, we must return -1.

In other words, we are looking for pairs of identical characters in the string and measuring the number of characters strictly between them. We then return the maximum length found. The string only contains lowercase English letters, and its length ranges from 1 to 300, so we are dealing with a small-scale input suitable for simple hash-based approaches.

Important edge cases include strings with all unique characters, strings where the longest substring is empty (like "aa"), and strings with multiple repeating characters.

Approaches

Brute Force

A brute-force approach would check every possible pair of characters in the string to see if they are equal and compute the substring length between them. This involves nested loops: the outer loop selects the first character and the inner loop checks all subsequent characters for a match. Whenever a match is found, we calculate the substring length and keep track of the maximum.

While this guarantees correctness, it is inefficient. With a string of length n, this approach requires checking O(n^2) pairs, making it unnecessary to perform nested loops given the constraints allow a better solution.

Optimal Approach

The key observation is that for any character, only the first and last occurrence matter for calculating the maximum substring length between identical characters. If a character appears multiple times, the longest substring between occurrences is determined by the first and last index of that character.

Using a hash map (or dictionary) to store the first occurrence index of each character, we can iterate through the string once. When we encounter a character seen before, we compute the substring length from its first occurrence to the current index, subtracting one to exclude the characters themselves. We maintain a running maximum for the result.

This approach leverages the property that all characters are lowercase English letters, giving at most 26 unique characters, and the maximum substring is found by comparing indices rather than enumerating all pairs.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(1) Check all character pairs
Optimal O(n) O(26) ≈ O(1) Use hash map to track first occurrences

Algorithm Walkthrough

  1. Initialize an empty dictionary first_occurrence to store the first index each character appears at. Initialize max_length to -1 to handle cases with no repeated characters.
  2. Iterate through the string using index i and character c.
  3. If c is not in first_occurrence, store i as its first occurrence.
  4. If c is already in first_occurrence, compute the length of the substring between the first occurrence and i as i - first_occurrence[c] - 1.
  5. Update max_length if this length is greater than the current maximum.
  6. After iterating through the string, return max_length.

Why it works: The algorithm guarantees that we consider the longest possible substring for each character, because the first occurrence to the current index covers the maximum gap between identical characters. Tracking first occurrences ensures we do not recompute unnecessary pairs, reducing the complexity from quadratic to linear.

Python Solution

class Solution:
    def maxLengthBetweenEqualCharacters(self, s: str) -> int:
        first_occurrence = {}
        max_length = -1

        for i, c in enumerate(s):
            if c not in first_occurrence:
                first_occurrence[c] = i
            else:
                current_length = i - first_occurrence[c] - 1
                if current_length > max_length:
                    max_length = current_length

        return max_length

The implementation follows the algorithm exactly. The dictionary stores first indices of characters. For each subsequent occurrence of a character, the substring length is calculated and the maximum updated. Finally, we return max_length, which is -1 if no repeats were found.

Go Solution

func maxLengthBetweenEqualCharacters(s string) int {
    firstOccurrence := make(map[rune]int)
    maxLength := -1

    for i, c := range s {
        if _, exists := firstOccurrence[c]; !exists {
            firstOccurrence[c] = i
        } else {
            currentLength := i - firstOccurrence[c] - 1
            if currentLength > maxLength {
                maxLength = currentLength
            }
        }
    }

    return maxLength
}

In Go, we use a map[rune]int for first occurrences. The logic mirrors Python closely. Runes are used to handle characters, though all inputs are lowercase letters. The map provides O(1) access, and the rest of the code is straightforward arithmetic and comparison.

Worked Examples

Example 1: s = "aa"

Step i c first_occurrence current_length max_length
1 0 'a' {'a':0} - -1
2 1 'a' {'a':0} 1-0-1=0 0

Return 0.

Example 2: s = "abca"

Step i c first_occurrence current_length max_length
1 0 'a' {'a':0} - -1
2 1 'b' {'a':0, 'b':1} - -1
3 2 'c' {'a':0, 'b':1, 'c':2} - -1
4 3 'a' {'a':0, 'b':1, 'c':2} 3-0-1=2 2

Return 2.

Example 3: s = "cbzxy"

All characters unique, max_length remains -1. Return -1.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Iterate through string once, dictionary operations O(1)
Space O(1) Maximum 26 lowercase letters stored in the dictionary

The linear time complexity is due to single-pass iteration. The space is effectively constant because the number of unique lowercase letters is bounded.

Test Cases

# Provided examples
assert Solution().maxLengthBetweenEqualCharacters("aa") == 0  # empty substring
assert Solution().maxLengthBetweenEqualCharacters("abca") == 2  # "bc"
assert Solution().maxLengthBetweenEqualCharacters("cbzxy") == -1  # no repeats

# Boundary and edge cases
assert Solution().maxLengthBetweenEqualCharacters("a") == -1  # single character
assert Solution().maxLengthBetweenEqualCharacters("abcdeff") == 0  # repeat at end, empty substring
assert Solution().maxLengthBetweenEqualCharacters("abacabadabacaba") == 13  # longest 'a' at start/end
assert Solution().maxLengthBetweenEqualCharacters("zzzz") == 2  # longest between first and last 'z'
assert Solution().maxLengthBetweenEqualCharacters("abcdedcba") == 7  # 'a' at start and end
Test Why
"aa" Minimal repeat, empty substring
"abca" Typical substring between repeats
"cbzxy" No repeats, returns -1
"a" Single character edge case
"abcdeff" Repeat at end, empty substring
"abacabadabacaba" Longest substring between first and last occurrence of 'a'
"zzzz" Multiple repeats, check first/last handling
"abcdedcba" Substring spanning entire string

Edge Cases

The first edge case is a string with only one character. The correct output is -1 because there are no repeated characters. This could cause a naive approach to attempt indexing beyond bounds.

The second edge case is two identical characters next to each other, like "aa". The substring length is zero. The implementation correctly computes i - first_index - 1 = 0 without error.

The third edge case is multiple repeats of the same character, where the longest substring is determined by the first and last occurrence, not intermediate ones. For example, in "abacabadabacaba", the character 'a' repeats multiple times. Using a dictionary of first occurrences ensures we compute last_index - first_index - 1 for the maximum gap.

This approach handles all edge cases efficiently and correctly.