LeetCode 1392 - Longest Happy Prefix
The problem asks us to find the longest substring that is both a prefix and a suffix of the given string s, while also e
Difficulty: 🔴 Hard
Topics: String, Rolling Hash, String Matching, Hash Function
Solution
Problem Understanding
The problem asks us to find the longest substring that is both a prefix and a suffix of the given string s, while also ensuring that the substring is not equal to the entire string itself.
A prefix is a substring that starts at index 0, while a suffix is a substring that ends at the last index of the string. The important restriction is that we cannot use the entire string as both the prefix and suffix. The prefix must be proper, meaning its length must be strictly smaller than the length of the string.
For example, in the string "ababab":
- Prefixes are:
"a","ab","aba","abab","ababa" - Suffixes are:
"b","ab","bab","abab","babab"
The longest matching value is "abab".
The input consists of a single string s containing only lowercase English letters. The length can be as large as 10^5, which immediately tells us that quadratic algorithms will likely be too slow. Any approach that repeatedly compares large substrings character by character could exceed time limits.
The expected output is the longest happy prefix. If no such non-empty prefix exists, we return the empty string "".
Several edge cases are important:
- Strings of length
1can never have a non-empty happy prefix because the only prefix is the entire string itself. - Strings with no repeated border structure, such as
"abcd", should return"". - Strings with repeated patterns, such as
"aaaaa"or"ababab", can have overlapping prefix and suffix matches. - Overlapping is explicitly allowed, which means the prefix and suffix do not need to occupy separate regions of the string.
The large constraint strongly suggests we need a linear time string matching algorithm.
Approaches
Brute Force Approach
The most direct solution is to check every possible prefix from longest to shortest and determine whether it is also a suffix.
Suppose the string length is n. We can iterate over all possible prefix lengths from n - 1 down to 1. For each length k, we compare:
s[:k]s[n-k:]
If they are equal, we return that prefix immediately because we are checking from longest to shortest.
This approach is correct because it exhaustively checks every valid candidate and returns the first valid match with maximum length.
However, the performance is poor. Each substring comparison can take O(k) time, and we may perform this for nearly every possible length. In the worst case, this becomes O(n^2).
With n = 10^5, quadratic complexity is too slow.
Optimal Approach, KMP Prefix Function
The key insight is that this problem is exactly what the KMP string matching preprocessing step was designed to solve.
The KMP algorithm computes a prefix table, also called the LPS array, where:
lps[i]stores the length of the longest proper prefix ofs[:i+1]that is also a suffix of that substring.
After computing the LPS array for the entire string:
lps[n - 1]gives the length of the longest happy prefix for the whole string.
This works because the final LPS value already represents the longest proper prefix that is also a suffix of the entire string.
The major advantage is that the LPS array is computed in linear time using previously computed information, avoiding repeated comparisons from scratch.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Checks every prefix and suffix pair directly |
| Optimal, KMP Prefix Function | O(n) | O(n) | Uses prefix matching information efficiently |
Algorithm Walkthrough
Optimal Algorithm Using KMP Prefix Function
- Create an array
lpsof lengthn, initialized with zeros.
The lps[i] value will represent the length of the longest proper prefix that is also a suffix for the substring s[:i+1].
2. Maintain a variable length.
This variable tracks the current longest prefix-suffix match while building the LPS table.
3. Start iterating from index 1.
The first character cannot have a proper prefix, so lps[0] is always 0.
4. Compare s[i] with s[length].
If they match, we can extend the current prefix-suffix match by one character.
- Increment
length - Set
lps[i] = length - Move to the next index
- Handle mismatches carefully.
If the characters do not match:
- If
length > 0, reducelengthtolps[length - 1] - This uses previously computed information instead of restarting from zero
We continue trying smaller valid prefix lengths until:
- A match is found, or
lengthbecomes0
- If
length == 0and characters still do not match:
- Set
lps[i] = 0 - Move to the next index
- After processing the entire string:
lps[n - 1]contains the length of the longest happy prefix
- Return the substring:
s[:lps[n - 1]]
Why it works
The LPS array maintains the invariant that every computed value represents the longest valid prefix-suffix match for the corresponding substring. When a mismatch occurs, instead of recomputing from scratch, the algorithm reuses previously computed border information stored in the LPS array. This guarantees that every character is processed only a constant number of times, producing a correct linear-time solution.
Python Solution
class Solution:
def longestPrefix(self, s: str) -> str:
n = len(s)
lps = [0] * n
length = 0
i = 1
while i < n:
if s[i] == s[length]:
length += 1
lps[i] = length
i += 1
else:
if length > 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
longest_length = lps[-1]
return s[:longest_length]
The implementation begins by initializing the lps array with zeros. This array stores prefix-suffix match lengths for every prefix of the string.
The variable length tracks the current border size. During iteration, we compare the current character s[i] with the next expected character in the prefix, s[length].
When characters match, we extend the border by one and record the new length in lps[i].
When characters do not match, we do not restart from scratch immediately. Instead, we shrink the candidate border using previously computed LPS values. This is the critical optimization that makes the algorithm linear.
At the end, lps[-1] stores the longest happy prefix length for the entire string, and we return the corresponding substring.
Go Solution
func longestPrefix(s string) string {
n := len(s)
lps := make([]int, n)
length := 0
i := 1
for i < n {
if s[i] == s[length] {
length++
lps[i] = length
i++
} else {
if length > 0 {
length = lps[length-1]
} else {
lps[i] = 0
i++
}
}
}
longestLength := lps[n-1]
return s[:longestLength]
}
The Go implementation follows the same logic as the Python version. Since Go strings are byte-indexable and the problem guarantees lowercase English letters, direct indexing is safe and efficient.
The make([]int, n) call initializes the LPS array. Empty strings are handled naturally because the constraints guarantee n >= 1.
Unlike Python slicing, Go substring operations also create slices over the original string data, making the final substring extraction efficient.
Worked Examples
Example 1
Input:
s = "level"
We build the LPS array step by step.
| i | s[i] | length before | Comparison | length after | lps[i] |
|---|---|---|---|---|---|
| 1 | e | 0 | e vs l, mismatch | 0 | 0 |
| 2 | v | 0 | v vs l, mismatch | 0 | 0 |
| 3 | e | 0 | e vs l, mismatch | 0 | 0 |
| 4 | l | 0 | l vs l, match | 1 | 1 |
Final LPS array:
[0, 0, 0, 0, 1]
The final value is 1, so the answer is:
s[:1] = "l"
Example 2
Input:
s = "ababab"
| i | s[i] | length before | Comparison | length after | lps[i] |
|---|---|---|---|---|---|
| 1 | b | 0 | b vs a, mismatch | 0 | 0 |
| 2 | a | 0 | a vs a, match | 1 | 1 |
| 3 | b | 1 | b vs b, match | 2 | 2 |
| 4 | a | 2 | a vs a, match | 3 | 3 |
| 5 | b | 3 | b vs b, match | 4 | 4 |
Final LPS array:
[0, 0, 1, 2, 3, 4]
The final value is 4, so the answer is:
s[:4] = "abab"
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is processed at most a constant number of times |
| Space | O(n) | The LPS array stores one integer per character |
The algorithm achieves linear performance because the length pointer only moves forward or backward based on previously computed LPS values. No character comparisons are repeated unnecessarily. Even though the inner mismatch logic may appear nested, the total number of operations across the entire algorithm remains linear.
Test Cases
solution = Solution()
assert solution.longestPrefix("level") == "l" # Basic example with single-character answer
assert solution.longestPrefix("ababab") == "abab" # Repeating pattern
assert solution.longestPrefix("leetcodeleet") == "leet" # Larger matching prefix
assert solution.longestPrefix("a") == "" # Single character
assert solution.longestPrefix("aa") == "a" # Small repeated string
assert solution.longestPrefix("aaaaa") == "aaaa" # Fully repeated characters
assert solution.longestPrefix("abcd") == "" # No happy prefix
assert solution.longestPrefix("abcabc") == "abc" # Exact repeated block
assert solution.longestPrefix("abababab") == "ababab" # Long overlapping structure
assert solution.longestPrefix("xyzxyzxyz") == "xyzxyz" # Repeated substring pattern
assert solution.longestPrefix("zzzzzz") == "zzzzz" # All same character
assert solution.longestPrefix("abacabab") == "ab" # Partial overlap
| Test | Why |
|---|---|
"level" |
Verifies standard example |
"ababab" |
Verifies overlapping prefix and suffix |
"leetcodeleet" |
Tests larger repeated substring |
"a" |
Smallest valid input |
"aa" |
Simplest non-empty happy prefix |
"aaaaa" |
Heavy repetition stress case |
"abcd" |
No valid prefix-suffix match |
"abcabc" |
Exact repeated block |
"abababab" |
Multiple overlapping borders |
"xyzxyzxyz" |
Repeated multi-character pattern |
"zzzzzz" |
Uniform characters |
"abacabab" |
Partial prefix fallback behavior |
Edge Cases
Single Character Strings
A string of length 1 cannot have a non-empty happy prefix because the only prefix is the entire string itself, which is not allowed. This case can easily produce off-by-one errors in implementations that forget the "proper prefix" requirement. The algorithm handles this naturally because the LPS value for the first character is always 0.
Strings With No Matching Prefix and Suffix
Inputs like "abcd" contain no repeated structure at all. A naive implementation may accidentally return the entire string or incorrectly match partial substrings. The KMP prefix table correctly leaves all entries as 0, resulting in an empty string answer.
Highly Repetitive Strings
Strings such as "aaaaaa" are important because many prefix lengths are valid simultaneously. Poor implementations may repeatedly compare the same characters and degrade toward quadratic performance. The LPS fallback mechanism prevents redundant work and keeps the runtime linear.
Overlapping Prefix and Suffix
The problem explicitly allows overlapping matches. For example, "ababab" has "abab" as both a prefix and suffix even though the regions overlap inside the string. Some incorrect solutions assume the prefix and suffix must be disjoint. The KMP approach naturally supports overlap because it only compares positions logically, not physically separate ranges.