LeetCode 1062 - Longest Repeating Substring

The problem asks us to find the length of the longest substring that appears at least twice in a given string s. A substring is a contiguous sequence of characters inside a string. The repeated substrings may overlap with each other, which is an important detail.

LeetCode Problem 1062

Difficulty: 🟡 Medium
Topics: String, Binary Search, Dynamic Programming, Rolling Hash, Suffix Array, Hash Function

Solution

Problem Understanding

The problem asks us to find the length of the longest substring that appears at least twice in a given string s.

A substring is a contiguous sequence of characters inside a string. The repeated substrings may overlap with each other, which is an important detail. For example, in the string "aaaa", the substring "aaa" appears twice if overlapping is allowed: once starting at index 0 and once starting at index 1.

We are not asked to return the substring itself, only its maximum length.

For example, if s = "abbaba", the substrings "ab" and "ba" each appear more than once, and their length is 2. No longer substring repeats, so the answer is 2.

The input consists of a single lowercase English string s, where:

  • 1 <= s.length <= 2000
  • Characters are only lowercase English letters.

The relatively small limit of 2000 suggests that quadratic or mildly cubic solutions may still be acceptable, but extremely expensive brute force methods may struggle. Since the problem involves checking repeated substrings and finding the maximum valid length, this naturally suggests binary search combined with efficient substring matching.

Several edge cases are worth identifying early:

  • A string with no repeated characters, such as "abcd", should return 0.
  • A string with all identical characters, such as "aaaaa", produces many overlapping repeated substrings.
  • Very short strings, especially length 1, cannot contain repeated substrings.
  • Overlapping repetitions must be handled correctly. A naive implementation might incorrectly reject overlapping matches.

Approaches

Brute Force Approach

A straightforward way to solve this problem is to generate every possible substring and count how many times it appears.

We can iterate over all possible substring lengths from largest to smallest. For each length L, we generate every substring of that length and store it in a set. If we encounter the same substring again, then we know a repeating substring exists and can immediately return L.

For example, for "abbaba":

  • Length 5: no repeats
  • Length 4: no repeats
  • Length 3: no repeats
  • Length 2: "ab" appears twice, return 2

This approach works because we systematically check every possible substring length and guarantee correctness by searching from largest to smallest.

However, substring extraction itself costs O(L), and we examine many substrings for many lengths. In the worst case, this becomes cubic.

Optimal Approach, Binary Search + Rolling Hash

The key observation is that if a repeating substring of length L exists, then a repeating substring of every smaller length must also exist.

For example:

  • If a repeating substring of length 5 exists,
  • then repeated prefixes of length 4, 3, 2, and 1 must also exist.

This monotonic property makes the problem suitable for binary search on substring length.

Instead of checking every length sequentially, we binary search for the largest valid length.

To efficiently determine whether a repeated substring of length L exists, we use a rolling hash. Rather than comparing substrings character by character, we compute a hash value for each substring in constant time after preprocessing.

We slide a window of size L across the string:

  • Compute each substring hash.
  • Store hashes in a set.
  • If the same hash appears again, a repeated substring exists.

This reduces substring comparison from O(L) to O(1) average time.

Approach Time Complexity Space Complexity Notes
Brute Force O(n³) O(n²) Checks every substring explicitly
Optimal, Binary Search + Rolling Hash O(n log n) O(n) Uses binary search over lengths and hashing for fast duplicate detection

Algorithm Walkthrough

Optimal Algorithm, Binary Search + Rolling Hash

  1. Initialize binary search boundaries

Since substring lengths range from 1 to n - 1, we set:

  • left = 1
  • right = len(s) - 1

We also maintain answer = 0. 2. Binary search on substring length

Compute:

mid = (left + right) // 2

This represents the candidate substring length we want to test. 3. Check whether a repeated substring of length mid exists

We create a helper function that scans all substrings of length mid.

Instead of extracting raw substrings repeatedly, we compute rolling hashes. 4. Precompute powers for rolling hash

We use a polynomial rolling hash:

hash = (previous_hash * base + new_char) % mod

We also precompute:

base^L % mod

so we can efficiently remove characters leaving the sliding window. 5. Slide the window

For every substring of length L:

  • Compute its hash
  • Check if the hash already exists in a set
  • If yes, return True
  • Otherwise, insert it into the set
  1. Update binary search range

If a repeated substring exists for length mid:

  • Store mid as a valid answer
  • Search for a longer substring
left = mid + 1

Otherwise:

right = mid - 1
  1. Return final answer

After binary search finishes, answer stores the largest valid repeating substring length.

Why it works

The correctness relies on a monotonic property.

If a repeated substring of length L exists, then repeated substrings of every smaller length must also exist. This guarantees that the search space is ordered, making binary search valid.

The rolling hash guarantees efficient duplicate detection because equal substrings produce identical hash values. Since we only care whether duplicates exist, storing hashes in a set lets us check repetition efficiently.

Python Solution

class Solution:
    def longestRepeatingSubstring(self, s: str) -> int:
        n = len(s)

        def has_repeating_substring(length: int) -> bool:
            base = 26
            mod = 10**9 + 7

            current_hash = 0
            base_power = pow(base, length, mod)

            seen_hashes = set()

            for i in range(length):
                current_hash = (
                    current_hash * base
                    + (ord(s[i]) - ord('a'))
                ) % mod

            seen_hashes.add(current_hash)

            for i in range(length, n):
                current_hash = (
                    current_hash * base
                    - (ord(s[i - length]) - ord('a')) * base_power
                    + (ord(s[i]) - ord('a'))
                ) % mod

                if current_hash in seen_hashes:
                    return True

                seen_hashes.add(current_hash)

            return False

        left, right = 1, n - 1
        answer = 0

        while left <= right:
            middle = (left + right) // 2

            if has_repeating_substring(middle):
                answer = middle
                left = middle + 1
            else:
                right = middle - 1

        return answer

The implementation starts by defining a helper function called has_repeating_substring(). Its responsibility is to determine whether any repeating substring of a specific length exists.

Inside the helper function, a rolling hash is used. We first compute the hash for the initial window of size length. This serves as the starting substring.

A set stores all previously seen hash values. As the sliding window moves across the string, we update the hash in constant time rather than recomputing it from scratch.

When a hash already exists in the set, we immediately return True, meaning a repeated substring has been found.

The main function performs binary search across substring lengths. If a length works, we attempt larger lengths. If it fails, we search smaller ones. The variable answer always stores the best valid length discovered so far.

Go Solution

func longestRepeatingSubstring(s string) int {
	n := len(s)

	hasRepeatingSubstring := func(length int) bool {
		const base = 26
		const mod = 1000000007

		currentHash := 0
		basePower := 1

		for i := 0; i < length; i++ {
			basePower = (basePower * base) % mod
		}

		seenHashes := make(map[int]bool)

		for i := 0; i < length; i++ {
			currentHash = (
				currentHash*base +
					int(s[i]-'a'),
			) % mod
		}

		seenHashes[currentHash] = true

		for i := length; i < n; i++ {
			currentHash = (
				currentHash*base -
					int(s[i-length]-'a')*basePower +
					int(s[i]-'a'),
			) % mod

			if currentHash < 0 {
				currentHash += mod
			}

			if seenHashes[currentHash] {
				return true
			}

			seenHashes[currentHash] = true
		}

		return false
	}

	left, right := 1, n-1
	answer := 0

	for left <= right {
		middle := left + (right-left)/2

		if hasRepeatingSubstring(middle) {
			answer = middle
			left = middle + 1
		} else {
			right = middle - 1
		}
	}

	return answer
}

The Go implementation follows the same logic as Python but uses a map[int]bool instead of a Python set.

One important Go-specific detail is modulo behavior. In Python, modulo automatically stays positive. In Go, negative modulo values remain negative, so we explicitly add mod when necessary to normalize the hash value.

Another small difference is computing basePower manually using a loop instead of Python's built-in pow() with modulo support.

Worked Examples

Example 1

s = "abcd"

Binary search range:

left right middle
1 3 2

Check length 2:

Substrings:

Window Hash Seen?
"ab" No
"bc" No
"cd" No

No repetition found.

Update:

right = 1

Check length 1:

Window Seen?
"a" No
"b" No
"c" No
"d" No

No repeat.

Final answer:

0

Example 2

s = "abbaba"

Binary search:

left right middle
1 5 3

Check length 3:

Substrings:

abb
bba
bab
aba

No duplicate.

Update:

right = 2

Check length 1:

a, b, b

Duplicate found.

Update:

answer = 1
left = 2

Check length 2:

Substring Seen Before?
"ab" No
"bb" No
"ba" No
"ab" Yes

Found repetition.

Final answer:

2

Example 3

s = "aabcaabdaab"

Binary search first checks medium lengths.

For length 3:

Substring Seen Before?
"aab" No
"abc" No
"bca" No
"caa" No
"aab" Yes

A duplicate exists.

Longer lengths are checked, but none repeat.

Final answer:

3

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Binary search performs O(log n) checks, each scanning the string once
Space O(n) Hash set stores substring hashes

The binary search runs at most O(log n) iterations. During each iteration, the helper function scans the string once using a sliding window and rolling hash, which costs O(n).

Since we store seen hashes in a set, the additional memory usage is linear in the number of substrings examined.

Test Cases

solution = Solution()

assert solution.longestRepeatingSubstring("abcd") == 0  # no repeating substring
assert solution.longestRepeatingSubstring("abbaba") == 2  # repeated "ab" and "ba"
assert solution.longestRepeatingSubstring("aabcaabdaab") == 3  # repeated "aab"

assert solution.longestRepeatingSubstring("a") == 0  # single character
assert solution.longestRepeatingSubstring("aa") == 1  # simplest repetition
assert solution.longestRepeatingSubstring("aaa") == 2  # overlapping repetition
assert solution.longestRepeatingSubstring("aaaa") == 3  # multiple overlaps

assert solution.longestRepeatingSubstring("abcdefg") == 0  # all unique
assert solution.longestRepeatingSubstring("banana") == 3  # repeated "ana"
assert solution.longestRepeatingSubstring("abcabc") == 3  # repeated prefix/suffix
assert solution.longestRepeatingSubstring("ababab") == 4  # overlapping repeated substring
assert solution.longestRepeatingSubstring("zzzzzz") == 5  # repeated identical chars
Test Why
"abcd" Validates no repetition
"abbaba" Tests repeated length 2
"aabcaabdaab" Tests multiple repeated occurrences
"a" Minimum input size
"aa" Smallest repeating case
"aaa" Verifies overlap handling
"aaaa" Heavy overlap scenario
"abcdefg" All unique characters
"banana" Nontrivial repeated substring
"abcabc" Repeated prefix/suffix
"ababab" Overlapping longer repeat
"zzzzzz" Worst-case repetition

Edge Cases

Single Character String

A string of length 1 cannot possibly contain a repeated substring. This is easy to mishandle if the binary search bounds are incorrect. Our implementation safely initializes:

left = 1
right = n - 1

When n = 1, right = 0, so the binary search never runs and the function correctly returns 0.

Overlapping Repeated Substrings

Overlapping matches are allowed, which can trip up naive implementations.

For example:

"aaaa"

The substring "aaa" appears twice:

  • index 0
  • index 1

Because the sliding window checks every position independently, overlapping substrings are naturally included and correctly detected.

No Repeating Substring

Some strings contain no repetition at all.

Example:

"abcdef"

In this situation, every binary search check fails, so answer remains 0. Returning the initialized default value ensures correctness.

All Characters Identical

Strings like:

"aaaaaa"

produce many repeated substrings and many overlaps. A brute force solution becomes expensive because it repeatedly compares similar substrings.

The rolling hash approach handles this efficiently since substring comparison becomes constant time through hash matching, keeping the runtime near O(n log n).