LeetCode 1392 - Longest Happy Prefix

The problem asks us to find the longest substring that is both a prefix and a suffix of the given string s, while also e

LeetCode Problem 1392

Difficulty: 🔴 Hard
Topics: String, Rolling Hash, String Matching, Hash Function

Solution

Problem Understanding

The problem asks us to find the longest substring that is both a prefix and a suffix of the given string s, while also ensuring that the substring is not equal to the entire string itself.

A prefix is a substring that starts at index 0, while a suffix is a substring that ends at the last index of the string. The important restriction is that we cannot use the entire string as both the prefix and suffix. The prefix must be proper, meaning its length must be strictly smaller than the length of the string.

For example, in the string "ababab":

  • Prefixes are: "a", "ab", "aba", "abab", "ababa"
  • Suffixes are: "b", "ab", "bab", "abab", "babab"

The longest matching value is "abab".

The input consists of a single string s containing only lowercase English letters. The length can be as large as 10^5, which immediately tells us that quadratic algorithms will likely be too slow. Any approach that repeatedly compares large substrings character by character could exceed time limits.

The expected output is the longest happy prefix. If no such non-empty prefix exists, we return the empty string "".

Several edge cases are important:

  • Strings of length 1 can never have a non-empty happy prefix because the only prefix is the entire string itself.
  • Strings with no repeated border structure, such as "abcd", should return "".
  • Strings with repeated patterns, such as "aaaaa" or "ababab", can have overlapping prefix and suffix matches.
  • Overlapping is explicitly allowed, which means the prefix and suffix do not need to occupy separate regions of the string.

The large constraint strongly suggests we need a linear time string matching algorithm.

Approaches

Brute Force Approach

The most direct solution is to check every possible prefix from longest to shortest and determine whether it is also a suffix.

Suppose the string length is n. We can iterate over all possible prefix lengths from n - 1 down to 1. For each length k, we compare:

  • s[:k]
  • s[n-k:]

If they are equal, we return that prefix immediately because we are checking from longest to shortest.

This approach is correct because it exhaustively checks every valid candidate and returns the first valid match with maximum length.

However, the performance is poor. Each substring comparison can take O(k) time, and we may perform this for nearly every possible length. In the worst case, this becomes O(n^2).

With n = 10^5, quadratic complexity is too slow.

Optimal Approach, KMP Prefix Function

The key insight is that this problem is exactly what the KMP string matching preprocessing step was designed to solve.

The KMP algorithm computes a prefix table, also called the LPS array, where:

  • lps[i] stores the length of the longest proper prefix of s[:i+1] that is also a suffix of that substring.

After computing the LPS array for the entire string:

  • lps[n - 1] gives the length of the longest happy prefix for the whole string.

This works because the final LPS value already represents the longest proper prefix that is also a suffix of the entire string.

The major advantage is that the LPS array is computed in linear time using previously computed information, avoiding repeated comparisons from scratch.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Checks every prefix and suffix pair directly
Optimal, KMP Prefix Function O(n) O(n) Uses prefix matching information efficiently

Algorithm Walkthrough

Optimal Algorithm Using KMP Prefix Function

  1. Create an array lps of length n, initialized with zeros.

The lps[i] value will represent the length of the longest proper prefix that is also a suffix for the substring s[:i+1]. 2. Maintain a variable length.

This variable tracks the current longest prefix-suffix match while building the LPS table. 3. Start iterating from index 1.

The first character cannot have a proper prefix, so lps[0] is always 0. 4. Compare s[i] with s[length].

If they match, we can extend the current prefix-suffix match by one character.

  • Increment length
  • Set lps[i] = length
  • Move to the next index
  1. Handle mismatches carefully.

If the characters do not match:

  • If length > 0, reduce length to lps[length - 1]
  • This uses previously computed information instead of restarting from zero

We continue trying smaller valid prefix lengths until:

  • A match is found, or
  • length becomes 0
  1. If length == 0 and characters still do not match:
  • Set lps[i] = 0
  • Move to the next index
  1. After processing the entire string:
  • lps[n - 1] contains the length of the longest happy prefix
  1. Return the substring:
  • s[:lps[n - 1]]

Why it works

The LPS array maintains the invariant that every computed value represents the longest valid prefix-suffix match for the corresponding substring. When a mismatch occurs, instead of recomputing from scratch, the algorithm reuses previously computed border information stored in the LPS array. This guarantees that every character is processed only a constant number of times, producing a correct linear-time solution.

Python Solution

class Solution:
    def longestPrefix(self, s: str) -> str:
        n = len(s)

        lps = [0] * n
        length = 0
        i = 1

        while i < n:
            if s[i] == s[length]:
                length += 1
                lps[i] = length
                i += 1
            else:
                if length > 0:
                    length = lps[length - 1]
                else:
                    lps[i] = 0
                    i += 1

        longest_length = lps[-1]
        return s[:longest_length]

The implementation begins by initializing the lps array with zeros. This array stores prefix-suffix match lengths for every prefix of the string.

The variable length tracks the current border size. During iteration, we compare the current character s[i] with the next expected character in the prefix, s[length].

When characters match, we extend the border by one and record the new length in lps[i].

When characters do not match, we do not restart from scratch immediately. Instead, we shrink the candidate border using previously computed LPS values. This is the critical optimization that makes the algorithm linear.

At the end, lps[-1] stores the longest happy prefix length for the entire string, and we return the corresponding substring.

Go Solution

func longestPrefix(s string) string {
	n := len(s)

	lps := make([]int, n)
	length := 0
	i := 1

	for i < n {
		if s[i] == s[length] {
			length++
			lps[i] = length
			i++
		} else {
			if length > 0 {
				length = lps[length-1]
			} else {
				lps[i] = 0
				i++
			}
		}
	}

	longestLength := lps[n-1]
	return s[:longestLength]
}

The Go implementation follows the same logic as the Python version. Since Go strings are byte-indexable and the problem guarantees lowercase English letters, direct indexing is safe and efficient.

The make([]int, n) call initializes the LPS array. Empty strings are handled naturally because the constraints guarantee n >= 1.

Unlike Python slicing, Go substring operations also create slices over the original string data, making the final substring extraction efficient.

Worked Examples

Example 1

Input:

s = "level"

We build the LPS array step by step.

i s[i] length before Comparison length after lps[i]
1 e 0 e vs l, mismatch 0 0
2 v 0 v vs l, mismatch 0 0
3 e 0 e vs l, mismatch 0 0
4 l 0 l vs l, match 1 1

Final LPS array:

[0, 0, 0, 0, 1]

The final value is 1, so the answer is:

s[:1] = "l"

Example 2

Input:

s = "ababab"
i s[i] length before Comparison length after lps[i]
1 b 0 b vs a, mismatch 0 0
2 a 0 a vs a, match 1 1
3 b 1 b vs b, match 2 2
4 a 2 a vs a, match 3 3
5 b 3 b vs b, match 4 4

Final LPS array:

[0, 0, 1, 2, 3, 4]

The final value is 4, so the answer is:

s[:4] = "abab"

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed at most a constant number of times
Space O(n) The LPS array stores one integer per character

The algorithm achieves linear performance because the length pointer only moves forward or backward based on previously computed LPS values. No character comparisons are repeated unnecessarily. Even though the inner mismatch logic may appear nested, the total number of operations across the entire algorithm remains linear.

Test Cases

solution = Solution()

assert solution.longestPrefix("level") == "l"  # Basic example with single-character answer
assert solution.longestPrefix("ababab") == "abab"  # Repeating pattern
assert solution.longestPrefix("leetcodeleet") == "leet"  # Larger matching prefix
assert solution.longestPrefix("a") == ""  # Single character
assert solution.longestPrefix("aa") == "a"  # Small repeated string
assert solution.longestPrefix("aaaaa") == "aaaa"  # Fully repeated characters
assert solution.longestPrefix("abcd") == ""  # No happy prefix
assert solution.longestPrefix("abcabc") == "abc"  # Exact repeated block
assert solution.longestPrefix("abababab") == "ababab"  # Long overlapping structure
assert solution.longestPrefix("xyzxyzxyz") == "xyzxyz"  # Repeated substring pattern
assert solution.longestPrefix("zzzzzz") == "zzzzz"  # All same character
assert solution.longestPrefix("abacabab") == "ab"  # Partial overlap
Test Why
"level" Verifies standard example
"ababab" Verifies overlapping prefix and suffix
"leetcodeleet" Tests larger repeated substring
"a" Smallest valid input
"aa" Simplest non-empty happy prefix
"aaaaa" Heavy repetition stress case
"abcd" No valid prefix-suffix match
"abcabc" Exact repeated block
"abababab" Multiple overlapping borders
"xyzxyzxyz" Repeated multi-character pattern
"zzzzzz" Uniform characters
"abacabab" Partial prefix fallback behavior

Edge Cases

Single Character Strings

A string of length 1 cannot have a non-empty happy prefix because the only prefix is the entire string itself, which is not allowed. This case can easily produce off-by-one errors in implementations that forget the "proper prefix" requirement. The algorithm handles this naturally because the LPS value for the first character is always 0.

Strings With No Matching Prefix and Suffix

Inputs like "abcd" contain no repeated structure at all. A naive implementation may accidentally return the entire string or incorrectly match partial substrings. The KMP prefix table correctly leaves all entries as 0, resulting in an empty string answer.

Highly Repetitive Strings

Strings such as "aaaaaa" are important because many prefix lengths are valid simultaneously. Poor implementations may repeatedly compare the same characters and degrade toward quadratic performance. The LPS fallback mechanism prevents redundant work and keeps the runtime linear.

Overlapping Prefix and Suffix

The problem explicitly allows overlapping matches. For example, "ababab" has "abab" as both a prefix and suffix even though the regions overlap inside the string. Some incorrect solutions assume the prefix and suffix must be disjoint. The KMP approach naturally supports overlap because it only compares positions logically, not physically separate ranges.