LeetCode 2982 - Find Longest Special Substring That Occurs Thrice II
The problem asks us to find the maximum length of a special substring that appears at least three times in the given string. A substring is considered special if it consists entirely of a single repeated character.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Binary Search, Sliding Window, Counting
Solution
LeetCode 2982 - Find Longest Special Substring That Occurs Thrice II
Problem Understanding
The problem asks us to find the maximum length of a special substring that appears at least three times in the given string.
A substring is considered special if it consists entirely of a single repeated character. Examples include:
"a""bbb""zzzz"
Examples that are not special:
"ab""aba""abc"
The input is a string s consisting only of lowercase English letters. We must determine the largest integer L such that there exists a character c where the substring c * L appears at least three times as a substring of s.
An important detail is that occurrences may overlap. For example, in "aaaa":
"aa"appears at positions(0,1),(1,2), and(2,3)- Therefore
"aa"occurs three times
The constraints are very large:
3 <= s.length <= 5 * 10^5
A string of length 500,000 immediately rules out any solution that explicitly enumerates all substrings. Since there are O(n²) substrings, brute force is infeasible.
A useful observation is that every special substring is completely determined by:
- its character
- its length
Another important observation is that special substrings can only come from contiguous runs of identical characters.
For example:
aaabbccccaa
contains runs:
aaa
bb
cccc
aa
Any special substring must be contained within one of these runs.
Important edge cases include:
- Strings with no repeated characters, such as
"abcdef" - Strings consisting entirely of one character, such as
"aaaaaa" - Multiple runs of the same character
- Situations where occurrences come from several different runs
- Cases where the answer is exactly
1
The problem guarantees that the string contains only lowercase letters, which means there are only 26 possible characters.
Approaches
Brute Force
A brute-force solution would generate every possible special substring and count how many times it appears.
One way would be:
- Generate all substrings.
- Check whether each substring is special.
- Count its occurrences in the string.
- Track the maximum length whose count is at least three.
This approach is correct because it explicitly examines every candidate and verifies the condition.
However, there are O(n²) substrings. Even storing all of them is impossible for n = 5 * 10^5.
Therefore this approach is far too slow.
Key Insight
Instead of looking at individual substrings, look at runs of equal characters.
Suppose a run has length r.
For a candidate length L:
aaaaa (r = 5)
The substring "aaa" (L = 3) appears:
5 - 3 + 1 = 3
times inside this run.
More generally, a run of length r contributes:
r - L + 1
occurrences whenever r >= L.
Since there are only 26 characters, we can group run lengths by character.
Now the question becomes:
Does there exist a character whose total contribution for length L is at least 3?
This condition is monotonic:
- If length
Lworks, - then every smaller length also works.
Therefore we can binary search the answer.
For a fixed length L, we compute:
sum(r - L + 1)
over all runs of the same character with r >= L.
If any character reaches at least three occurrences, then L is feasible.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) to O(n³) | O(n²) | Enumerates substrings and counts occurrences |
| Optimal | O(n log n) | O(n) | Run-length encoding plus binary search |
Algorithm Walkthrough
Step 1
Traverse the string and perform run-length encoding.
For example:
aaabbccccaa
becomes:
a -> [3, 2]
b -> [2]
c -> [4]
We store, for each character, all run lengths belonging to that character.
Step 2
Define a feasibility function can(L).
This function answers:
Is there a special substring of length L that occurs at least three times?
Step 3
For each character independently, examine all of its runs.
If a run has length r >= L, it contributes:
r - L + 1
occurrences.
Add these contributions together.
Step 4
If the accumulated count for some character reaches three, immediately return True.
This means the substring consisting of that character repeated L times appears at least three times.
Step 5
Otherwise continue checking all characters.
If none reaches three occurrences, return False.
Step 6
Binary search on the answer.
Search range:
1 ... n
For a midpoint mid:
- If
can(mid)is true, store it as a candidate answer and search larger lengths. - Otherwise search smaller lengths.
Step 7
After the binary search finishes:
- Return the largest feasible length.
- Return
-1if no feasible length exists.
Why it works
Every special substring is entirely contained within a run of identical characters. A run of length r contains exactly r - L + 1 occurrences of a special substring of length L. Summing these contributions across all runs of the same character gives the exact number of occurrences of that substring. Since feasibility for length L implies feasibility for all smaller lengths, binary search correctly finds the maximum valid length.
Python Solution
class Solution:
def maximumLength(self, s: str) -> int:
n = len(s)
runs = [[] for _ in range(26)]
i = 0
while i < n:
j = i
while j < n and s[j] == s[i]:
j += 1
runs[ord(s[i]) - ord('a')].append(j - i)
i = j
def can(length: int) -> bool:
for char_runs in runs:
count = 0
for run_len in char_runs:
if run_len >= length:
count += run_len - length + 1
if count >= 3:
return True
return False
left, right = 1, n
answer = -1
while left <= right:
mid = (left + right) // 2
if can(mid):
answer = mid
left = mid + 1
else:
right = mid - 1
return answer
The implementation begins by constructing run-length information for each character. Instead of storing every occurrence of every substring, it stores only run lengths, which compresses the information needed for counting.
The helper function can(length) checks whether some character has at least three occurrences of a special substring of the specified length. Each run contributes run_len - length + 1 occurrences whenever the run is long enough.
The binary search exploits the monotonic property of the problem. If a length works, all smaller lengths also work. Therefore the search efficiently finds the largest feasible length.
Go Solution
func maximumLength(s string) int {
n := len(s)
runs := make([][]int, 26)
for i := 0; i < n; {
j := i
for j < n && s[j] == s[i] {
j++
}
runs[s[i]-'a'] = append(runs[s[i]-'a'], j-i)
i = j
}
can := func(length int) bool {
for _, charRuns := range runs {
count := 0
for _, runLen := range charRuns {
if runLen >= length {
count += runLen - length + 1
}
if count >= 3 {
return true
}
}
}
return false
}
left, right := 1, n
answer := -1
for left <= right {
mid := left + (right-left)/2
if can(mid) {
answer = mid
left = mid + 1
} else {
right = mid - 1
}
}
return answer
}
The Go implementation follows the same logic as the Python version. Slices are used to store run lengths for each character. Integer arithmetic is safe because the maximum count cannot exceed the string length, which is only 5 * 10^5.
Worked Examples
Example 1
s = "aaaa"
Run decomposition:
| Character | Runs |
|---|---|
| a | [4] |
Binary search checks:
| Length | Occurrences |
|---|---|
| 2 | 4 - 2 + 1 = 3 |
| 3 | 4 - 3 + 1 = 2 |
Length 2 works.
Length 3 does not.
Answer:
2
Example 2
s = "abcdef"
Run decomposition:
| Character | Runs |
|---|---|
| a | [1] |
| b | [1] |
| c | [1] |
| d | [1] |
| e | [1] |
| f | [1] |
For length 1:
Every character contributes only one occurrence.
No character reaches three occurrences.
Answer:
-1
Example 3
s = "abcaba"
Run decomposition:
| Character | Runs |
|---|---|
| a | [1, 1, 1] |
| b | [1, 1] |
| c | [1] |
Checking length 1:
For character a:
| Run Length | Contribution |
|---|---|
| 1 | 1 |
| 1 | 1 |
| 1 | 1 |
Total:
3
Therefore length 1 is valid.
No larger length exists.
Answer:
1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Binary search performs O(log n) checks, each check scans all run lengths |
| Space | O(n) | Stores all run lengths |
The run-length encoding contains at most n entries in total. Each binary search iteration scans all stored runs once. Since there are O(log n) iterations, the overall complexity is O(n log n).
Test Cases
sol = Solution()
assert sol.maximumLength("aaaa") == 2 # provided example
assert sol.maximumLength("abcdef") == -1 # no repeated character
assert sol.maximumLength("abcaba") == 1 # answer length 1
assert sol.maximumLength("aaa") == 1 # exactly three single-char occurrences
assert sol.maximumLength("aaaaa") == 3 # "aaa" appears three times
assert sol.maximumLength("aaaaaa") == 4 # "aaaa" appears three times
assert sol.maximumLength("aaabaa") == 2 # contributions from multiple runs
assert sol.maximumLength("aabaaaa") == 2 # split runs of same character
assert sol.maximumLength("bbbbbbbb") == 6 # long single run
assert sol.maximumLength("abababab") == 1 # many isolated occurrences
assert sol.maximumLength("zzzyyyzzz") == 2 # multiple runs same character
assert sol.maximumLength("aabbbcccc") == 2 # best answer from c-run
assert sol.maximumLength("abcdefffffgh") == 3 # single long run creates answer
assert sol.maximumLength("aaabaaa") == 2 # occurrences from separate runs
assert sol.maximumLength("xyz") == -1 # minimum distinct case
Test Summary
| Test | Why |
|---|---|
"aaaa" |
Provided example |
"abcdef" |
No valid substring |
"abcaba" |
Answer equals 1 |
"aaa" |
Smallest valid occurrence count |
"aaaaa" |
Single run contributes three occurrences |
"aaaaaa" |
Larger answer from one run |
"aaabaa" |
Multiple runs combine counts |
"aabaaaa" |
Uneven runs of same character |
"bbbbbbbb" |
Long single-character string |
"abababab" |
Only length 1 works |
"zzzyyyzzz" |
Multiple runs of same letter |
"aabbbcccc" |
Longest run determines answer |
"abcdefffffgh" |
One dominant run |
"aaabaaa" |
Contributions from separate runs |
"xyz" |
Small boundary case |
Edge Cases
No Character Appears Three Times
Consider:
"abcdef"
Every character occurs exactly once. Even the length-1 special substrings fail to reach three occurrences. A careless implementation might incorrectly count occurrences across different characters, but our solution tracks each character independently, correctly returning -1.
One Very Long Run
Consider:
"aaaaaaaa"
All occurrences come from a single run. The answer is not the run length itself because the substring must occur at least three times. The formula r - L + 1 correctly computes overlapping occurrences and identifies the largest feasible length.
Contributions Across Multiple Runs
Consider:
"aaabaaa"
The substring "aa" appears both in the first run and the second run. A solution that only looks at the longest run would miss valid occurrences contributed by other runs. Our algorithm sums contributions from every run belonging to the same character, ensuring all occurrences are counted correctly.
Length One as the Answer
Consider:
"abcaba"
No run has length greater than one, yet character 'a' appears three times. The algorithm naturally handles this because runs of length one contribute one occurrence when checking L = 1, producing the correct answer of 1.