LeetCode 2982 - Find Longest Special Substring That Occurs Thrice II

The problem asks us to find the maximum length of a special substring that appears at least three times in the given string. A substring is considered special if it consists entirely of a single repeated character.

LeetCode Problem 2982

Difficulty: 🟡 Medium
Topics: Hash Table, String, Binary Search, Sliding Window, Counting

Solution

LeetCode 2982 - Find Longest Special Substring That Occurs Thrice II

Problem Understanding

The problem asks us to find the maximum length of a special substring that appears at least three times in the given string.

A substring is considered special if it consists entirely of a single repeated character. Examples include:

  • "a"
  • "bbb"
  • "zzzz"

Examples that are not special:

  • "ab"
  • "aba"
  • "abc"

The input is a string s consisting only of lowercase English letters. We must determine the largest integer L such that there exists a character c where the substring c * L appears at least three times as a substring of s.

An important detail is that occurrences may overlap. For example, in "aaaa":

  • "aa" appears at positions (0,1), (1,2), and (2,3)
  • Therefore "aa" occurs three times

The constraints are very large:

  • 3 <= s.length <= 5 * 10^5

A string of length 500,000 immediately rules out any solution that explicitly enumerates all substrings. Since there are O(n²) substrings, brute force is infeasible.

A useful observation is that every special substring is completely determined by:

  • its character
  • its length

Another important observation is that special substrings can only come from contiguous runs of identical characters.

For example:

aaabbccccaa

contains runs:

aaa
bb
cccc
aa

Any special substring must be contained within one of these runs.

Important edge cases include:

  • Strings with no repeated characters, such as "abcdef"
  • Strings consisting entirely of one character, such as "aaaaaa"
  • Multiple runs of the same character
  • Situations where occurrences come from several different runs
  • Cases where the answer is exactly 1

The problem guarantees that the string contains only lowercase letters, which means there are only 26 possible characters.

Approaches

Brute Force

A brute-force solution would generate every possible special substring and count how many times it appears.

One way would be:

  1. Generate all substrings.
  2. Check whether each substring is special.
  3. Count its occurrences in the string.
  4. Track the maximum length whose count is at least three.

This approach is correct because it explicitly examines every candidate and verifies the condition.

However, there are O(n²) substrings. Even storing all of them is impossible for n = 5 * 10^5.

Therefore this approach is far too slow.

Key Insight

Instead of looking at individual substrings, look at runs of equal characters.

Suppose a run has length r.

For a candidate length L:

aaaaa  (r = 5)

The substring "aaa" (L = 3) appears:

5 - 3 + 1 = 3

times inside this run.

More generally, a run of length r contributes:

r - L + 1

occurrences whenever r >= L.

Since there are only 26 characters, we can group run lengths by character.

Now the question becomes:

Does there exist a character whose total contribution for length L is at least 3?

This condition is monotonic:

  • If length L works,
  • then every smaller length also works.

Therefore we can binary search the answer.

For a fixed length L, we compute:

sum(r - L + 1)

over all runs of the same character with r >= L.

If any character reaches at least three occurrences, then L is feasible.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) to O(n³) O(n²) Enumerates substrings and counts occurrences
Optimal O(n log n) O(n) Run-length encoding plus binary search

Algorithm Walkthrough

Step 1

Traverse the string and perform run-length encoding.

For example:

aaabbccccaa

becomes:

a -> [3, 2]
b -> [2]
c -> [4]

We store, for each character, all run lengths belonging to that character.

Step 2

Define a feasibility function can(L).

This function answers:

Is there a special substring of length L that occurs at least three times?

Step 3

For each character independently, examine all of its runs.

If a run has length r >= L, it contributes:

r - L + 1

occurrences.

Add these contributions together.

Step 4

If the accumulated count for some character reaches three, immediately return True.

This means the substring consisting of that character repeated L times appears at least three times.

Step 5

Otherwise continue checking all characters.

If none reaches three occurrences, return False.

Step 6

Binary search on the answer.

Search range:

1 ... n

For a midpoint mid:

  • If can(mid) is true, store it as a candidate answer and search larger lengths.
  • Otherwise search smaller lengths.

Step 7

After the binary search finishes:

  • Return the largest feasible length.
  • Return -1 if no feasible length exists.

Why it works

Every special substring is entirely contained within a run of identical characters. A run of length r contains exactly r - L + 1 occurrences of a special substring of length L. Summing these contributions across all runs of the same character gives the exact number of occurrences of that substring. Since feasibility for length L implies feasibility for all smaller lengths, binary search correctly finds the maximum valid length.

Python Solution

class Solution:
    def maximumLength(self, s: str) -> int:
        n = len(s)

        runs = [[] for _ in range(26)]

        i = 0
        while i < n:
            j = i
            while j < n and s[j] == s[i]:
                j += 1

            runs[ord(s[i]) - ord('a')].append(j - i)
            i = j

        def can(length: int) -> bool:
            for char_runs in runs:
                count = 0

                for run_len in char_runs:
                    if run_len >= length:
                        count += run_len - length + 1

                    if count >= 3:
                        return True

            return False

        left, right = 1, n
        answer = -1

        while left <= right:
            mid = (left + right) // 2

            if can(mid):
                answer = mid
                left = mid + 1
            else:
                right = mid - 1

        return answer

The implementation begins by constructing run-length information for each character. Instead of storing every occurrence of every substring, it stores only run lengths, which compresses the information needed for counting.

The helper function can(length) checks whether some character has at least three occurrences of a special substring of the specified length. Each run contributes run_len - length + 1 occurrences whenever the run is long enough.

The binary search exploits the monotonic property of the problem. If a length works, all smaller lengths also work. Therefore the search efficiently finds the largest feasible length.

Go Solution

func maximumLength(s string) int {
	n := len(s)

	runs := make([][]int, 26)

	for i := 0; i < n; {
		j := i
		for j < n && s[j] == s[i] {
			j++
		}

		runs[s[i]-'a'] = append(runs[s[i]-'a'], j-i)
		i = j
	}

	can := func(length int) bool {
		for _, charRuns := range runs {
			count := 0

			for _, runLen := range charRuns {
				if runLen >= length {
					count += runLen - length + 1
				}

				if count >= 3 {
					return true
				}
			}
		}

		return false
	}

	left, right := 1, n
	answer := -1

	for left <= right {
		mid := left + (right-left)/2

		if can(mid) {
			answer = mid
			left = mid + 1
		} else {
			right = mid - 1
		}
	}

	return answer
}

The Go implementation follows the same logic as the Python version. Slices are used to store run lengths for each character. Integer arithmetic is safe because the maximum count cannot exceed the string length, which is only 5 * 10^5.

Worked Examples

Example 1

s = "aaaa"

Run decomposition:

Character Runs
a [4]

Binary search checks:

Length Occurrences
2 4 - 2 + 1 = 3
3 4 - 3 + 1 = 2

Length 2 works.

Length 3 does not.

Answer:

2

Example 2

s = "abcdef"

Run decomposition:

Character Runs
a [1]
b [1]
c [1]
d [1]
e [1]
f [1]

For length 1:

Every character contributes only one occurrence.

No character reaches three occurrences.

Answer:

-1

Example 3

s = "abcaba"

Run decomposition:

Character Runs
a [1, 1, 1]
b [1, 1]
c [1]

Checking length 1:

For character a:

Run Length Contribution
1 1
1 1
1 1

Total:

3

Therefore length 1 is valid.

No larger length exists.

Answer:

1

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Binary search performs O(log n) checks, each check scans all run lengths
Space O(n) Stores all run lengths

The run-length encoding contains at most n entries in total. Each binary search iteration scans all stored runs once. Since there are O(log n) iterations, the overall complexity is O(n log n).

Test Cases

sol = Solution()

assert sol.maximumLength("aaaa") == 2          # provided example
assert sol.maximumLength("abcdef") == -1       # no repeated character
assert sol.maximumLength("abcaba") == 1        # answer length 1

assert sol.maximumLength("aaa") == 1           # exactly three single-char occurrences
assert sol.maximumLength("aaaaa") == 3         # "aaa" appears three times
assert sol.maximumLength("aaaaaa") == 4        # "aaaa" appears three times

assert sol.maximumLength("aaabaa") == 2        # contributions from multiple runs
assert sol.maximumLength("aabaaaa") == 2       # split runs of same character

assert sol.maximumLength("bbbbbbbb") == 6      # long single run
assert sol.maximumLength("abababab") == 1      # many isolated occurrences

assert sol.maximumLength("zzzyyyzzz") == 2     # multiple runs same character
assert sol.maximumLength("aabbbcccc") == 2     # best answer from c-run

assert sol.maximumLength("abcdefffffgh") == 3  # single long run creates answer
assert sol.maximumLength("aaabaaa") == 2       # occurrences from separate runs

assert sol.maximumLength("xyz") == -1          # minimum distinct case

Test Summary

Test Why
"aaaa" Provided example
"abcdef" No valid substring
"abcaba" Answer equals 1
"aaa" Smallest valid occurrence count
"aaaaa" Single run contributes three occurrences
"aaaaaa" Larger answer from one run
"aaabaa" Multiple runs combine counts
"aabaaaa" Uneven runs of same character
"bbbbbbbb" Long single-character string
"abababab" Only length 1 works
"zzzyyyzzz" Multiple runs of same letter
"aabbbcccc" Longest run determines answer
"abcdefffffgh" One dominant run
"aaabaaa" Contributions from separate runs
"xyz" Small boundary case

Edge Cases

No Character Appears Three Times

Consider:

"abcdef"

Every character occurs exactly once. Even the length-1 special substrings fail to reach three occurrences. A careless implementation might incorrectly count occurrences across different characters, but our solution tracks each character independently, correctly returning -1.

One Very Long Run

Consider:

"aaaaaaaa"

All occurrences come from a single run. The answer is not the run length itself because the substring must occur at least three times. The formula r - L + 1 correctly computes overlapping occurrences and identifies the largest feasible length.

Contributions Across Multiple Runs

Consider:

"aaabaaa"

The substring "aa" appears both in the first run and the second run. A solution that only looks at the longest run would miss valid occurrences contributed by other runs. Our algorithm sums contributions from every run belonging to the same character, ensuring all occurrences are counted correctly.

Length One as the Answer

Consider:

"abcaba"

No run has length greater than one, yet character 'a' appears three times. The algorithm naturally handles this because runs of length one contribute one occurrence when checking L = 1, producing the correct answer of 1.