LeetCode 2981 - Find Longest Special Substring That Occurs Thrice I
The problem asks us to find the longest special substring that appears at least three times in the given string s. A sub
Difficulty: 🟡 Medium
Topics: Hash Table, String, Binary Search, Sliding Window, Counting
Solution
Problem Understanding
The problem asks us to find the longest special substring that appears at least three times in the given string s.
A substring is considered special if it consists entirely of the same character. This means strings like "aaa", "bb", and "z" are valid special substrings, while "ab" or "aba" are not.
The important detail is that the substring must occur at least thrice, and occurrences are allowed to overlap. This is crucial because overlapping matches dramatically change how counting works. For example, in "aaaa":
"a"appears 4 times"aa"appears 3 times (s[0:2],s[1:3],s[2:4])"aaa"appears 2 times"aaaa"appears 1 time
Since "aa" is the longest substring occurring at least three times, the answer is 2.
We are given:
- A string
sconsisting only of lowercase English letters. 3 <= s.length <= 50
The expected output is:
- The maximum possible length of a special substring appearing at least three times.
-1if no such substring exists.
The constraints are relatively small, with a maximum string length of only 50. This means even moderately expensive solutions are feasible. However, we still want a clean and efficient approach.
Several edge cases can easily trip up a naive implementation:
- Strings with all unique characters, such as
"abcdef", where no substring can appear three times. - Strings with large repeated runs, such as
"aaaaaa", where overlapping occurrences matter. - Strings with multiple repeated character groups, such as
"aaabaaa", where occurrences come from separate runs. - Cases where only a single-character substring qualifies, such as
"abcaba".
The problem guarantees that the string is non-empty and only contains lowercase English letters, which simplifies character handling.
Approaches
Brute Force Approach
The brute force solution is to generate every possible special substring, count how many times it appears, and return the maximum valid length.
Since a special substring contains only one repeated character, we can enumerate every substring and check whether all characters are identical. If it is special, we count its occurrences across the entire string.
For each substring:
- Generate the substring.
- Verify it contains only one character.
- Scan the string to count overlapping occurrences.
- If it appears at least three times, update the answer.
This works because it explicitly checks every valid possibility. However, it performs excessive repeated work. We repeatedly scan the same regions of the string and recount many overlapping substrings.
Even though n <= 50 makes brute force acceptable, it is unnecessarily inefficient and conceptually messy.
Key Insight for an Optimal Approach
The key observation is that special substrings are fully determined by two things:
- The character used.
- The substring length.
For example:
"aaa"is fully defined by character'a'and length3."bb"is defined by character'b'and length2.
Instead of generating every substring individually, we can process the string in runs of identical characters.
Consider "aaabbbaaa":
'a'run of length3'b'run of length3'a'run of length3
From a run of length L, every special substring of length k contributes:
$$L - k + 1$$
occurrences.
For example, from "aaaa":
| Length | Occurrences |
|---|---|
| 1 | 4 |
| 2 | 3 |
| 3 | 2 |
| 4 | 1 |
We can accumulate counts for every (character, length) pair using a hash map. Once all runs are processed, we simply find the largest length whose total occurrence count is at least 3.
This avoids redundant rescanning and naturally handles overlapping occurrences.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) | O(n²) | Generate substrings and repeatedly count occurrences |
| Optimal | O(n²) | O(n²) | Process character runs and aggregate counts efficiently |
Algorithm Walkthrough
Optimal Algorithm
- Initialize a hash map to count occurrences.
We use a dictionary where the key is (character, length) and the value is the total number of occurrences.
For example:
('a', 2) -> 5
('b', 1) -> 4
This lets us efficiently aggregate counts across multiple runs. 2. Traverse the string and identify consecutive character runs.
Use two pointers or a simple index scan.
For example:
s = "aaabbcc"
produces runs:
('a', 3)
('b', 2)
('c', 2)
- For each run, count contributions for every possible special substring length.
Suppose a run has length L.
For every k from 1 to L:
occurrences = L - k + 1
Add this value to:
counts[(character, k)]
Example:
Run "aaaa" (L = 4):
('a',1) += 4
('a',2) += 3
('a',3) += 2
('a',4) += 1
- Find the maximum valid substring length.
Iterate through the hash map.
Whenever a count is at least 3, update the answer:
answer = max(answer, length)
- Return the result.
If no valid substring exists, return -1.
Why it works
Every special substring belongs to exactly one character and one length. A run of identical characters contains all possible special substrings for that character. By counting contributions from every run, we compute the exact number of overlapping occurrences for every valid special substring. Since we examine every possible (character, length) pair and choose the largest one with frequency at least three, the result is guaranteed to be correct.
Python Solution
from collections import defaultdict
class Solution:
def maximumLength(self, s: str) -> int:
substring_counts: dict[tuple[str, int], int] = defaultdict(int)
n = len(s)
index = 0
while index < n:
start = index
while index < n and s[index] == s[start]:
index += 1
run_length = index - start
character = s[start]
for length in range(1, run_length + 1):
occurrences = run_length - length + 1
substring_counts[(character, length)] += occurrences
answer = -1
for (character, length), count in substring_counts.items():
if count >= 3:
answer = max(answer, length)
return answer
The implementation follows the algorithm exactly.
We first create a defaultdict(int) to accumulate occurrence counts. The dictionary key is a tuple of (character, length) because a special substring is uniquely identified by these two properties.
Next, we scan the string to find consecutive runs of the same character. The variable start marks the beginning of a run, and index advances until the run ends.
Once a run is identified, we compute contributions for every possible substring length. A run of length L contributes L - k + 1 occurrences for substrings of length k.
After processing all runs, we iterate through the collected counts and find the maximum length whose frequency is at least three.
If no such substring exists, the initialized value -1 remains unchanged and is returned.
Go Solution
func maximumLength(s string) int {
substringCounts := make(map[[2]interface{}]int)
n := len(s)
index := 0
for index < n {
start := index
for index < n && s[index] == s[start] {
index++
}
runLength := index - start
character := s[start]
for length := 1; length <= runLength; length++ {
occurrences := runLength - length + 1
key := [2]interface{}{character, length}
substringCounts[key] += occurrences
}
}
answer := -1
for key, count := range substringCounts {
length := key[1].(int)
if count >= 3 && length > answer {
answer = length
}
}
return answer
}
The Go implementation mirrors the Python approach, but there are a few language-specific differences.
Go does not support tuple keys directly like Python dictionaries, so we simulate the (character, length) pair using a composite key. Since the character is a byte and the length is an integer, we use a small composite structure to uniquely identify substrings.
Unlike Python's defaultdict, Go maps return the zero value automatically for missing keys, so incrementing counts works naturally.
Integer overflow is not a concern because the maximum string length is only 50.
Worked Examples
Example 1
s = "aaaa"
The algorithm processes one run:
| Run | Length |
|---|---|
| aaaa | 4 |
Generated counts:
| Substring | Added Occurrences |
|---|---|
| a | 4 |
| aa | 3 |
| aaa | 2 |
| aaaa | 1 |
Final map:
| Key | Count |
|---|---|
| ('a', 1) | 4 |
| ('a', 2) | 3 |
| ('a', 3) | 2 |
| ('a', 4) | 1 |
Checking counts:
- Length
1qualifies - Length
2qualifies - Length
3does not - Length
4does not
Answer:
2
Example 2
s = "abcdef"
Runs:
| Run | Length |
|---|---|
| a | 1 |
| b | 1 |
| c | 1 |
| d | 1 |
| e | 1 |
| f | 1 |
Each substring appears only once.
No substring reaches frequency 3.
Answer:
-1
Example 3
s = "abcaba"
Runs:
| Run | Length |
|---|---|
| a | 1 |
| b | 1 |
| c | 1 |
| a | 1 |
| b | 1 |
| a | 1 |
Accumulated counts:
| Substring | Count |
|---|---|
| a | 3 |
| b | 2 |
| c | 1 |
Only "a" appears at least three times.
Answer:
1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n²) | Each run contributes up to its length worth of work |
| Space | O(n²) | Hash map stores character-length combinations |
Although we iterate through runs only once, each run may contribute up to L substring lengths. In the worst case, when the string consists of one repeated character, we process lengths from 1 to n, giving quadratic behavior.
The space complexity is also O(n²) in theory because many (character, length) combinations can be stored, though in practice it is much smaller due to only 26 lowercase letters.
Test Cases
solution = Solution()
assert solution.maximumLength("aaaa") == 2 # overlapping occurrences
assert solution.maximumLength("abcdef") == -1 # no repeated substring
assert solution.maximumLength("abcaba") == 1 # single-character result
assert solution.maximumLength("aaa") == 1 # exact minimum valid count
assert solution.maximumLength("aaaaaa") == 4 # long overlapping run
assert solution.maximumLength("aaabaaa") == 2 # contributions across runs
assert solution.maximumLength("ababab") == 1 # repeated single characters
assert solution.maximumLength("zzzz") == 2 # repeated identical letters
assert solution.maximumLength("abcabcabc") == 1 # repeated scattered chars
assert solution.maximumLength("bbbbbbbb") == 6 # large overlap case
| Test | Why |
|---|---|
"aaaa" |
Validates overlapping occurrences |
"abcdef" |
Verifies no valid substring exists |
"abcaba" |
Confirms single-character answer |
"aaa" |
Minimum size valid case |
"aaaaaa" |
Tests long repeated runs |
"aaabaaa" |
Ensures counts merge across runs |
"ababab" |
Tests separated repeated characters |
"zzzz" |
Simple repeated-character scenario |
"abcabcabc" |
Validates scattered occurrences |
"bbbbbbbb" |
Stress test for overlapping counting |
Edge Cases
Case 1: No character appears three times
Consider:
"abcdef"
A naive implementation might accidentally return 1 because every character appears once. However, the requirement is at least three occurrences. Since no substring satisfies this threshold, the answer must be -1.
The implementation handles this naturally because the answer starts at -1 and only updates when a count reaches at least 3.
Case 2: Overlapping occurrences
Consider:
"aaaa"
A common bug is counting only non-overlapping occurrences. If we only count non-overlapping matches, "aa" would appear twice instead of three times.
Our implementation correctly uses:
L - k + 1
which inherently includes overlapping substrings.
Case 3: Multiple runs contributing to the same substring
Consider:
"aaabaaa"
The substring "aa" appears in both runs of 'a'.
A buggy solution might process runs independently and fail to merge counts globally. Our hash map accumulates counts across all runs:
('a', 2)
receives contributions from both sections, ensuring the correct total frequency is computed.