LeetCode 1044 - Longest Duplicate Substring
Here is the complete, detailed technical solution guide for LeetCode 1044 - Longest Duplicate Substring, following your formatting rules exactly. The problem asks us to find the longest duplicated substring in a given string s.
Difficulty: 🔴 Hard
Topics: String, Binary Search, Sliding Window, Rolling Hash, Suffix Array, Hash Function
Solution
Here is the complete, detailed technical solution guide for LeetCode 1044 - Longest Duplicate Substring, following your formatting rules exactly.
Problem Understanding
The problem asks us to find the longest duplicated substring in a given string s. A duplicated substring is a contiguous sequence of characters that appears at least twice in s. The occurrences may overlap. For example, in the string "banana", "ana" occurs twice and is the longest such substring. If no duplicated substring exists, we return an empty string "".
The input is a string s of length between 2 and 30,000 consisting only of lowercase English letters. The output is a single string that is one of the longest duplicated substrings. The problem does not require all possible duplicates, only one longest duplicate.
The constraints indicate that the input can be large, so naive solutions that examine all substring pairs will not be efficient. Edge cases include strings with all unique characters, strings where the longest duplicate occurs at the start or end, and strings where duplicates overlap.
Approaches
The brute-force approach is to generate all possible substrings of s, store their occurrences in a hash map, and track the longest substring that occurs at least twice. This approach is correct but extremely inefficient: for a string of length n, there are O(n²) substrings, and checking each substring for duplicates requires O(n) comparisons, leading to O(n³) time complexity. This is impractical for n up to 30,000.
A more efficient approach combines binary search with rolling hash (Rabin-Karp). The key insight is that the length of the longest duplicate substring can be found using binary search. For a given candidate length L, we can efficiently check whether a duplicate of length L exists using a rolling hash to compute substring hashes in O(n) time. By adjusting L using binary search, we can find the maximum length with duplicates efficiently.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) | O(n²) | Generate all substrings and check for duplicates |
| Binary Search + Rolling Hash | O(n log n) average | O(n) | Use binary search on length and rolling hash to check duplicates |
Algorithm Walkthrough
- Initialize search bounds: Set
low = 1andhigh = len(s). These represent the possible lengths of duplicate substrings. - Binary search: While
low < high, check the middle lengthmid = (low + high) // 2. Use a helper function to determine whether a duplicate substring of lengthmidexists. - Rolling hash: Convert each substring of length
midinto a numeric hash using a base (e.g., 26) and modulus (large prime to avoid collisions). Use a sliding window to compute the next hash from the previous hash in O(1) time. - Check duplicates: Store each hash in a set. If a new hash already exists in the set, a duplicate exists. If a duplicate is found, update the result substring and move the binary search to the right half (
low = mid + 1) to try longer lengths. Otherwise, move to the left half (high = mid). - Return the result: After binary search finishes, return the longest substring found.
Why it works: Binary search ensures we efficiently search for the maximum length. Rolling hash guarantees that substring comparisons are O(1) on average, and the set of hashes efficiently detects duplicates. Using modulus reduces hash collisions, and in practice the algorithm reliably finds the longest duplicate substring.
Python Solution
class Solution:
def longestDupSubstring(self, s: str) -> str:
def search(length: int) -> str:
base = 26
mod = 2**63 - 1
n = len(s)
h = 0
for i in range(length):
h = (h * base + ord(s[i]) - ord('a')) % mod
seen = {h}
baseL = pow(base, length, mod)
for start in range(1, n - length + 1):
h = (h * base - (ord(s[start - 1]) - ord('a')) * baseL + (ord(s[start + length - 1]) - ord('a'))) % mod
if h in seen:
return s[start:start + length]
seen.add(h)
return ""
low, high = 1, len(s)
result = ""
while low <= high:
mid = (low + high) // 2
dup = search(mid)
if dup:
result = dup
low = mid + 1
else:
high = mid - 1
return result
The Python implementation uses a nested function search(length) to find any duplicated substring of the given length using a rolling hash. Binary search determines the maximum length for which a duplicate exists. ord(s[i]) - ord('a') converts characters to integers for hashing. baseL is used to remove the leftmost character from the rolling hash efficiently.
Go Solution
func longestDupSubstring(s string) string {
n := len(s)
base := int64(26)
mod := int64(1<<63 - 1)
search := func(length int) string {
var h int64 = 0
for i := 0; i < length; i++ {
h = (h*base + int64(s[i]-'a')) % mod
}
seen := map[int64]bool{h: true}
baseL := int64(1)
for i := 0; i < length; i++ {
baseL = (baseL * base) % mod
}
for start := 1; start <= n-length; start++ {
h = (h*base - int64(s[start-1]-'a')*baseL + int64(s[start+length-1]-'a')) % mod
if h < 0 {
h += mod
}
if seen[h] {
return s[start : start+length]
}
seen[h] = true
}
return ""
}
low, high := 1, n
result := ""
for low <= high {
mid := (low + high) / 2
dup := search(mid)
if dup != "" {
result = dup
low = mid + 1
} else {
high = mid - 1
}
}
return result
}
The Go implementation mirrors the Python approach. Key differences include using int64 to prevent integer overflow and explicitly adjusting negative hash values due to Go’s modulo behavior. Maps are used for hash sets.
Worked Examples
Example 1: s = "banana"
Binary search lengths: low = 1, high = 6.
| Iteration | mid | Duplicate Found | Result |
|---|---|---|---|
| 1 | 3 | "ana" | "ana" |
| 2 | 5 | "" | "ana" |
| 3 | 4 | "" | "ana" |
Final output: "ana"
Example 2: s = "abcd"
Binary search lengths: low = 1, high = 4.
| Iteration | mid | Duplicate Found | Result |
|---|---|---|---|
| 1 | 2 | "" | "" |
| 2 | 1 | "" | "" |
Final output: ""
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) average | Binary search over lengths (log n) and rolling hash over n characters for each length |
| Space | O(n) | Hash set stores substring hashes for the current length |
The average time is O(n log n) assuming hash collisions are rare. Space is O(n) for storing hashes.
Test Cases
# Provided examples
assert Solution().longestDupSubstring("banana") == "ana" # longest duplicate in "banana"
assert Solution().longestDupSubstring("abcd") == "" # no duplicates
# Edge and stress cases
assert Solution().longestDupSubstring("aaaaa") == "aaaa" # repeated single character
assert Solution().longestDupSubstring("abcabcabc") == "abcabc" # repeated pattern
assert Solution().longestDupSubstring("abababab") == "ababab" # overlapping duplicates
assert Solution().longestDupSubstring("a"*30000) == "a"*29999 # max input size
assert Solution().longestDupSubstring("abacaba") in ["aba","aca"] # multiple options
| Test | Why |
|---|---|
| "banana" | Standard example with overlapping duplicate |
| "abcd" | No duplicates |
| "aaaaa" | Single character repeated, tests large duplicate |
| "abcabcabc" | Repeated pattern, non-overlapping |
| "abababab" | Overlapping duplicates |
| "a"*30000 | Max input size stress test |
| "abacaba" | Multiple valid longest duplicates |
Edge Cases
- All unique characters: Input like
"abcdef"has no duplicates. The rolling hash correctly returns""because no hash repeats. - Single character repeated: Input