LeetCode 2981 - Find Longest Special Substring That Occurs Thrice I

The problem asks us to find the longest special substring that appears at least three times in the given string s. A sub

LeetCode Problem 2981

Difficulty: 🟡 Medium
Topics: Hash Table, String, Binary Search, Sliding Window, Counting

Solution

Problem Understanding

The problem asks us to find the longest special substring that appears at least three times in the given string s.

A substring is considered special if it consists entirely of the same character. This means strings like "aaa", "bb", and "z" are valid special substrings, while "ab" or "aba" are not.

The important detail is that the substring must occur at least thrice, and occurrences are allowed to overlap. This is crucial because overlapping matches dramatically change how counting works. For example, in "aaaa":

  • "a" appears 4 times
  • "aa" appears 3 times (s[0:2], s[1:3], s[2:4])
  • "aaa" appears 2 times
  • "aaaa" appears 1 time

Since "aa" is the longest substring occurring at least three times, the answer is 2.

We are given:

  • A string s consisting only of lowercase English letters.
  • 3 <= s.length <= 50

The expected output is:

  • The maximum possible length of a special substring appearing at least three times.
  • -1 if no such substring exists.

The constraints are relatively small, with a maximum string length of only 50. This means even moderately expensive solutions are feasible. However, we still want a clean and efficient approach.

Several edge cases can easily trip up a naive implementation:

  • Strings with all unique characters, such as "abcdef", where no substring can appear three times.
  • Strings with large repeated runs, such as "aaaaaa", where overlapping occurrences matter.
  • Strings with multiple repeated character groups, such as "aaabaaa", where occurrences come from separate runs.
  • Cases where only a single-character substring qualifies, such as "abcaba".

The problem guarantees that the string is non-empty and only contains lowercase English letters, which simplifies character handling.

Approaches

Brute Force Approach

The brute force solution is to generate every possible special substring, count how many times it appears, and return the maximum valid length.

Since a special substring contains only one repeated character, we can enumerate every substring and check whether all characters are identical. If it is special, we count its occurrences across the entire string.

For each substring:

  1. Generate the substring.
  2. Verify it contains only one character.
  3. Scan the string to count overlapping occurrences.
  4. If it appears at least three times, update the answer.

This works because it explicitly checks every valid possibility. However, it performs excessive repeated work. We repeatedly scan the same regions of the string and recount many overlapping substrings.

Even though n <= 50 makes brute force acceptable, it is unnecessarily inefficient and conceptually messy.

Key Insight for an Optimal Approach

The key observation is that special substrings are fully determined by two things:

  1. The character used.
  2. The substring length.

For example:

  • "aaa" is fully defined by character 'a' and length 3.
  • "bb" is defined by character 'b' and length 2.

Instead of generating every substring individually, we can process the string in runs of identical characters.

Consider "aaabbbaaa":

  • 'a' run of length 3
  • 'b' run of length 3
  • 'a' run of length 3

From a run of length L, every special substring of length k contributes:

$$L - k + 1$$

occurrences.

For example, from "aaaa":

Length Occurrences
1 4
2 3
3 2
4 1

We can accumulate counts for every (character, length) pair using a hash map. Once all runs are processed, we simply find the largest length whose total occurrence count is at least 3.

This avoids redundant rescanning and naturally handles overlapping occurrences.

Approach Time Complexity Space Complexity Notes
Brute Force O(n³) O(n²) Generate substrings and repeatedly count occurrences
Optimal O(n²) O(n²) Process character runs and aggregate counts efficiently

Algorithm Walkthrough

Optimal Algorithm

  1. Initialize a hash map to count occurrences.

We use a dictionary where the key is (character, length) and the value is the total number of occurrences.

For example:

('a', 2) -> 5
('b', 1) -> 4

This lets us efficiently aggregate counts across multiple runs. 2. Traverse the string and identify consecutive character runs.

Use two pointers or a simple index scan.

For example:

s = "aaabbcc"

produces runs:

('a', 3)
('b', 2)
('c', 2)
  1. For each run, count contributions for every possible special substring length.

Suppose a run has length L.

For every k from 1 to L:

occurrences = L - k + 1

Add this value to:

counts[(character, k)]

Example:

Run "aaaa" (L = 4):

('a',1) += 4
('a',2) += 3
('a',3) += 2
('a',4) += 1
  1. Find the maximum valid substring length.

Iterate through the hash map.

Whenever a count is at least 3, update the answer:

answer = max(answer, length)
  1. Return the result.

If no valid substring exists, return -1.

Why it works

Every special substring belongs to exactly one character and one length. A run of identical characters contains all possible special substrings for that character. By counting contributions from every run, we compute the exact number of overlapping occurrences for every valid special substring. Since we examine every possible (character, length) pair and choose the largest one with frequency at least three, the result is guaranteed to be correct.

Python Solution

from collections import defaultdict

class Solution:
    def maximumLength(self, s: str) -> int:
        substring_counts: dict[tuple[str, int], int] = defaultdict(int)
        n = len(s)

        index = 0

        while index < n:
            start = index

            while index < n and s[index] == s[start]:
                index += 1

            run_length = index - start
            character = s[start]

            for length in range(1, run_length + 1):
                occurrences = run_length - length + 1
                substring_counts[(character, length)] += occurrences

        answer = -1

        for (character, length), count in substring_counts.items():
            if count >= 3:
                answer = max(answer, length)

        return answer

The implementation follows the algorithm exactly.

We first create a defaultdict(int) to accumulate occurrence counts. The dictionary key is a tuple of (character, length) because a special substring is uniquely identified by these two properties.

Next, we scan the string to find consecutive runs of the same character. The variable start marks the beginning of a run, and index advances until the run ends.

Once a run is identified, we compute contributions for every possible substring length. A run of length L contributes L - k + 1 occurrences for substrings of length k.

After processing all runs, we iterate through the collected counts and find the maximum length whose frequency is at least three.

If no such substring exists, the initialized value -1 remains unchanged and is returned.

Go Solution

func maximumLength(s string) int {
	substringCounts := make(map[[2]interface{}]int)

	n := len(s)
	index := 0

	for index < n {
		start := index

		for index < n && s[index] == s[start] {
			index++
		}

		runLength := index - start
		character := s[start]

		for length := 1; length <= runLength; length++ {
			occurrences := runLength - length + 1
			key := [2]interface{}{character, length}
			substringCounts[key] += occurrences
		}
	}

	answer := -1

	for key, count := range substringCounts {
		length := key[1].(int)

		if count >= 3 && length > answer {
			answer = length
		}
	}

	return answer
}

The Go implementation mirrors the Python approach, but there are a few language-specific differences.

Go does not support tuple keys directly like Python dictionaries, so we simulate the (character, length) pair using a composite key. Since the character is a byte and the length is an integer, we use a small composite structure to uniquely identify substrings.

Unlike Python's defaultdict, Go maps return the zero value automatically for missing keys, so incrementing counts works naturally.

Integer overflow is not a concern because the maximum string length is only 50.

Worked Examples

Example 1

s = "aaaa"

The algorithm processes one run:

Run Length
aaaa 4

Generated counts:

Substring Added Occurrences
a 4
aa 3
aaa 2
aaaa 1

Final map:

Key Count
('a', 1) 4
('a', 2) 3
('a', 3) 2
('a', 4) 1

Checking counts:

  • Length 1 qualifies
  • Length 2 qualifies
  • Length 3 does not
  • Length 4 does not

Answer:

2

Example 2

s = "abcdef"

Runs:

Run Length
a 1
b 1
c 1
d 1
e 1
f 1

Each substring appears only once.

No substring reaches frequency 3.

Answer:

-1

Example 3

s = "abcaba"

Runs:

Run Length
a 1
b 1
c 1
a 1
b 1
a 1

Accumulated counts:

Substring Count
a 3
b 2
c 1

Only "a" appears at least three times.

Answer:

1

Complexity Analysis

Measure Complexity Explanation
Time O(n²) Each run contributes up to its length worth of work
Space O(n²) Hash map stores character-length combinations

Although we iterate through runs only once, each run may contribute up to L substring lengths. In the worst case, when the string consists of one repeated character, we process lengths from 1 to n, giving quadratic behavior.

The space complexity is also O(n²) in theory because many (character, length) combinations can be stored, though in practice it is much smaller due to only 26 lowercase letters.

Test Cases

solution = Solution()

assert solution.maximumLength("aaaa") == 2  # overlapping occurrences
assert solution.maximumLength("abcdef") == -1  # no repeated substring
assert solution.maximumLength("abcaba") == 1  # single-character result
assert solution.maximumLength("aaa") == 1  # exact minimum valid count
assert solution.maximumLength("aaaaaa") == 4  # long overlapping run
assert solution.maximumLength("aaabaaa") == 2  # contributions across runs
assert solution.maximumLength("ababab") == 1  # repeated single characters
assert solution.maximumLength("zzzz") == 2  # repeated identical letters
assert solution.maximumLength("abcabcabc") == 1  # repeated scattered chars
assert solution.maximumLength("bbbbbbbb") == 6  # large overlap case
Test Why
"aaaa" Validates overlapping occurrences
"abcdef" Verifies no valid substring exists
"abcaba" Confirms single-character answer
"aaa" Minimum size valid case
"aaaaaa" Tests long repeated runs
"aaabaaa" Ensures counts merge across runs
"ababab" Tests separated repeated characters
"zzzz" Simple repeated-character scenario
"abcabcabc" Validates scattered occurrences
"bbbbbbbb" Stress test for overlapping counting

Edge Cases

Case 1: No character appears three times

Consider:

"abcdef"

A naive implementation might accidentally return 1 because every character appears once. However, the requirement is at least three occurrences. Since no substring satisfies this threshold, the answer must be -1.

The implementation handles this naturally because the answer starts at -1 and only updates when a count reaches at least 3.

Case 2: Overlapping occurrences

Consider:

"aaaa"

A common bug is counting only non-overlapping occurrences. If we only count non-overlapping matches, "aa" would appear twice instead of three times.

Our implementation correctly uses:

L - k + 1

which inherently includes overlapping substrings.

Case 3: Multiple runs contributing to the same substring

Consider:

"aaabaaa"

The substring "aa" appears in both runs of 'a'.

A buggy solution might process runs independently and fail to merge counts globally. Our hash map accumulates counts across all runs:

('a', 2)

receives contributions from both sections, ensuring the correct total frequency is computed.