LeetCode 1062 - Longest Repeating Substring
The problem asks us to find the length of the longest substring that appears at least twice in a given string s. A substring is a contiguous sequence of characters inside a string. The repeated substrings may overlap with each other, which is an important detail.
Difficulty: 🟡 Medium
Topics: String, Binary Search, Dynamic Programming, Rolling Hash, Suffix Array, Hash Function
Solution
Problem Understanding
The problem asks us to find the length of the longest substring that appears at least twice in a given string s.
A substring is a contiguous sequence of characters inside a string. The repeated substrings may overlap with each other, which is an important detail. For example, in the string "aaaa", the substring "aaa" appears twice if overlapping is allowed: once starting at index 0 and once starting at index 1.
We are not asked to return the substring itself, only its maximum length.
For example, if s = "abbaba", the substrings "ab" and "ba" each appear more than once, and their length is 2. No longer substring repeats, so the answer is 2.
The input consists of a single lowercase English string s, where:
1 <= s.length <= 2000- Characters are only lowercase English letters.
The relatively small limit of 2000 suggests that quadratic or mildly cubic solutions may still be acceptable, but extremely expensive brute force methods may struggle. Since the problem involves checking repeated substrings and finding the maximum valid length, this naturally suggests binary search combined with efficient substring matching.
Several edge cases are worth identifying early:
- A string with no repeated characters, such as
"abcd", should return0. - A string with all identical characters, such as
"aaaaa", produces many overlapping repeated substrings. - Very short strings, especially length
1, cannot contain repeated substrings. - Overlapping repetitions must be handled correctly. A naive implementation might incorrectly reject overlapping matches.
Approaches
Brute Force Approach
A straightforward way to solve this problem is to generate every possible substring and count how many times it appears.
We can iterate over all possible substring lengths from largest to smallest. For each length L, we generate every substring of that length and store it in a set. If we encounter the same substring again, then we know a repeating substring exists and can immediately return L.
For example, for "abbaba":
- Length
5: no repeats - Length
4: no repeats - Length
3: no repeats - Length
2:"ab"appears twice, return2
This approach works because we systematically check every possible substring length and guarantee correctness by searching from largest to smallest.
However, substring extraction itself costs O(L), and we examine many substrings for many lengths. In the worst case, this becomes cubic.
Optimal Approach, Binary Search + Rolling Hash
The key observation is that if a repeating substring of length L exists, then a repeating substring of every smaller length must also exist.
For example:
- If a repeating substring of length
5exists, - then repeated prefixes of length
4,3,2, and1must also exist.
This monotonic property makes the problem suitable for binary search on substring length.
Instead of checking every length sequentially, we binary search for the largest valid length.
To efficiently determine whether a repeated substring of length L exists, we use a rolling hash. Rather than comparing substrings character by character, we compute a hash value for each substring in constant time after preprocessing.
We slide a window of size L across the string:
- Compute each substring hash.
- Store hashes in a set.
- If the same hash appears again, a repeated substring exists.
This reduces substring comparison from O(L) to O(1) average time.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) | O(n²) | Checks every substring explicitly |
| Optimal, Binary Search + Rolling Hash | O(n log n) | O(n) | Uses binary search over lengths and hashing for fast duplicate detection |
Algorithm Walkthrough
Optimal Algorithm, Binary Search + Rolling Hash
- Initialize binary search boundaries
Since substring lengths range from 1 to n - 1, we set:
left = 1right = len(s) - 1
We also maintain answer = 0.
2. Binary search on substring length
Compute:
mid = (left + right) // 2
This represents the candidate substring length we want to test.
3. Check whether a repeated substring of length mid exists
We create a helper function that scans all substrings of length mid.
Instead of extracting raw substrings repeatedly, we compute rolling hashes. 4. Precompute powers for rolling hash
We use a polynomial rolling hash:
hash = (previous_hash * base + new_char) % mod
We also precompute:
base^L % mod
so we can efficiently remove characters leaving the sliding window. 5. Slide the window
For every substring of length L:
- Compute its hash
- Check if the hash already exists in a set
- If yes, return
True - Otherwise, insert it into the set
- Update binary search range
If a repeated substring exists for length mid:
- Store
midas a valid answer - Search for a longer substring
left = mid + 1
Otherwise:
right = mid - 1
- Return final answer
After binary search finishes, answer stores the largest valid repeating substring length.
Why it works
The correctness relies on a monotonic property.
If a repeated substring of length L exists, then repeated substrings of every smaller length must also exist. This guarantees that the search space is ordered, making binary search valid.
The rolling hash guarantees efficient duplicate detection because equal substrings produce identical hash values. Since we only care whether duplicates exist, storing hashes in a set lets us check repetition efficiently.
Python Solution
class Solution:
def longestRepeatingSubstring(self, s: str) -> int:
n = len(s)
def has_repeating_substring(length: int) -> bool:
base = 26
mod = 10**9 + 7
current_hash = 0
base_power = pow(base, length, mod)
seen_hashes = set()
for i in range(length):
current_hash = (
current_hash * base
+ (ord(s[i]) - ord('a'))
) % mod
seen_hashes.add(current_hash)
for i in range(length, n):
current_hash = (
current_hash * base
- (ord(s[i - length]) - ord('a')) * base_power
+ (ord(s[i]) - ord('a'))
) % mod
if current_hash in seen_hashes:
return True
seen_hashes.add(current_hash)
return False
left, right = 1, n - 1
answer = 0
while left <= right:
middle = (left + right) // 2
if has_repeating_substring(middle):
answer = middle
left = middle + 1
else:
right = middle - 1
return answer
The implementation starts by defining a helper function called has_repeating_substring(). Its responsibility is to determine whether any repeating substring of a specific length exists.
Inside the helper function, a rolling hash is used. We first compute the hash for the initial window of size length. This serves as the starting substring.
A set stores all previously seen hash values. As the sliding window moves across the string, we update the hash in constant time rather than recomputing it from scratch.
When a hash already exists in the set, we immediately return True, meaning a repeated substring has been found.
The main function performs binary search across substring lengths. If a length works, we attempt larger lengths. If it fails, we search smaller ones. The variable answer always stores the best valid length discovered so far.
Go Solution
func longestRepeatingSubstring(s string) int {
n := len(s)
hasRepeatingSubstring := func(length int) bool {
const base = 26
const mod = 1000000007
currentHash := 0
basePower := 1
for i := 0; i < length; i++ {
basePower = (basePower * base) % mod
}
seenHashes := make(map[int]bool)
for i := 0; i < length; i++ {
currentHash = (
currentHash*base +
int(s[i]-'a'),
) % mod
}
seenHashes[currentHash] = true
for i := length; i < n; i++ {
currentHash = (
currentHash*base -
int(s[i-length]-'a')*basePower +
int(s[i]-'a'),
) % mod
if currentHash < 0 {
currentHash += mod
}
if seenHashes[currentHash] {
return true
}
seenHashes[currentHash] = true
}
return false
}
left, right := 1, n-1
answer := 0
for left <= right {
middle := left + (right-left)/2
if hasRepeatingSubstring(middle) {
answer = middle
left = middle + 1
} else {
right = middle - 1
}
}
return answer
}
The Go implementation follows the same logic as Python but uses a map[int]bool instead of a Python set.
One important Go-specific detail is modulo behavior. In Python, modulo automatically stays positive. In Go, negative modulo values remain negative, so we explicitly add mod when necessary to normalize the hash value.
Another small difference is computing basePower manually using a loop instead of Python's built-in pow() with modulo support.
Worked Examples
Example 1
s = "abcd"
Binary search range:
| left | right | middle |
|---|---|---|
| 1 | 3 | 2 |
Check length 2:
Substrings:
| Window | Hash Seen? |
|---|---|
| "ab" | No |
| "bc" | No |
| "cd" | No |
No repetition found.
Update:
right = 1
Check length 1:
| Window | Seen? |
|---|---|
| "a" | No |
| "b" | No |
| "c" | No |
| "d" | No |
No repeat.
Final answer:
0
Example 2
s = "abbaba"
Binary search:
| left | right | middle |
|---|---|---|
| 1 | 5 | 3 |
Check length 3:
Substrings:
abb
bba
bab
aba
No duplicate.
Update:
right = 2
Check length 1:
a, b, b
Duplicate found.
Update:
answer = 1
left = 2
Check length 2:
| Substring | Seen Before? |
|---|---|
| "ab" | No |
| "bb" | No |
| "ba" | No |
| "ab" | Yes |
Found repetition.
Final answer:
2
Example 3
s = "aabcaabdaab"
Binary search first checks medium lengths.
For length 3:
| Substring | Seen Before? |
|---|---|
| "aab" | No |
| "abc" | No |
| "bca" | No |
| "caa" | No |
| "aab" | Yes |
A duplicate exists.
Longer lengths are checked, but none repeat.
Final answer:
3
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Binary search performs O(log n) checks, each scanning the string once |
| Space | O(n) | Hash set stores substring hashes |
The binary search runs at most O(log n) iterations. During each iteration, the helper function scans the string once using a sliding window and rolling hash, which costs O(n).
Since we store seen hashes in a set, the additional memory usage is linear in the number of substrings examined.
Test Cases
solution = Solution()
assert solution.longestRepeatingSubstring("abcd") == 0 # no repeating substring
assert solution.longestRepeatingSubstring("abbaba") == 2 # repeated "ab" and "ba"
assert solution.longestRepeatingSubstring("aabcaabdaab") == 3 # repeated "aab"
assert solution.longestRepeatingSubstring("a") == 0 # single character
assert solution.longestRepeatingSubstring("aa") == 1 # simplest repetition
assert solution.longestRepeatingSubstring("aaa") == 2 # overlapping repetition
assert solution.longestRepeatingSubstring("aaaa") == 3 # multiple overlaps
assert solution.longestRepeatingSubstring("abcdefg") == 0 # all unique
assert solution.longestRepeatingSubstring("banana") == 3 # repeated "ana"
assert solution.longestRepeatingSubstring("abcabc") == 3 # repeated prefix/suffix
assert solution.longestRepeatingSubstring("ababab") == 4 # overlapping repeated substring
assert solution.longestRepeatingSubstring("zzzzzz") == 5 # repeated identical chars
| Test | Why |
|---|---|
"abcd" |
Validates no repetition |
"abbaba" |
Tests repeated length 2 |
"aabcaabdaab" |
Tests multiple repeated occurrences |
"a" |
Minimum input size |
"aa" |
Smallest repeating case |
"aaa" |
Verifies overlap handling |
"aaaa" |
Heavy overlap scenario |
"abcdefg" |
All unique characters |
"banana" |
Nontrivial repeated substring |
"abcabc" |
Repeated prefix/suffix |
"ababab" |
Overlapping longer repeat |
"zzzzzz" |
Worst-case repetition |
Edge Cases
Single Character String
A string of length 1 cannot possibly contain a repeated substring. This is easy to mishandle if the binary search bounds are incorrect. Our implementation safely initializes:
left = 1
right = n - 1
When n = 1, right = 0, so the binary search never runs and the function correctly returns 0.
Overlapping Repeated Substrings
Overlapping matches are allowed, which can trip up naive implementations.
For example:
"aaaa"
The substring "aaa" appears twice:
- index
0 - index
1
Because the sliding window checks every position independently, overlapping substrings are naturally included and correctly detected.
No Repeating Substring
Some strings contain no repetition at all.
Example:
"abcdef"
In this situation, every binary search check fails, so answer remains 0. Returning the initialized default value ensures correctness.
All Characters Identical
Strings like:
"aaaaaa"
produce many repeated substrings and many overlaps. A brute force solution becomes expensive because it repeatedly compares similar substrings.
The rolling hash approach handles this efficiently since substring comparison becomes constant time through hash matching, keeping the runtime near O(n log n).