LeetCode 1163 - Last Substring in Lexicographical Order
The problem asks us to find the substring of a given string s that is lexicographically largest among all possible substrings. A substring is any contiguous portion of the string.
Difficulty: 🔴 Hard
Topics: Two Pointers, String
Solution
Problem Understanding
The problem asks us to find the substring of a given string s that is lexicographically largest among all possible substrings.
A substring is any contiguous portion of the string. For a string of length n, there are O(n²) possible substrings because every starting position can pair with many ending positions.
Lexicographical order is the same ordering used in dictionaries. When comparing two strings:
- Compare characters from left to right
- The first differing character determines the larger string
- If one string is a prefix of the other, the longer string is considered larger
For example:
"bab" > "aba"because'b' > 'a'"tcode" > "leetcode"because't' > 'l'"banana" > "ban"because the shorter string is a prefix
The input is a lowercase English string with length up to 4 * 10^5. This constraint is extremely important. A quadratic or cubic algorithm will time out. Even an O(n log n) solution is acceptable, but the problem is specifically designed for a linear-time solution.
The key observation is that every substring can be represented by its starting index. If we know the best starting index, then the answer is simply the suffix beginning there.
Several edge cases are important:
- A string where all characters are identical, such as
"aaaaa" - A string where the best substring starts near the end
- Repeated patterns like
"abababab" - Very large strings where inefficient substring creation becomes expensive
- Multiple candidate suffixes sharing long common prefixes
The problem guarantees that:
- The string is non-empty
- Only lowercase English letters are used
- The answer always exists because every non-empty string has at least one substring
Approaches
Brute Force Approach
The brute-force solution generates every possible substring and selects the lexicographically largest one.
For each starting index i, we generate substrings:
s[i:i+1]s[i:i+2]- ...
s[i:n]
Then we compare every substring against the current maximum.
This works because it explicitly checks all possibilities, guaranteeing correctness. However, it is far too slow for the constraints.
There are O(n²) substrings, and comparing strings can take O(n) time in the worst case. This leads to an overall complexity of O(n³) in the worst scenario.
Even optimizing slightly by considering only suffixes still leads to O(n²) comparisons because suffixes may share long prefixes.
Key Insight for the Optimal Solution
The lexicographically maximum substring must be a suffix of the string.
Why?
If two substrings start at the same position, the longer one is always lexicographically larger because the shorter one is its prefix. Therefore, for every starting index, the best substring is simply the suffix beginning there.
So the problem becomes:
Find the lexicographically largest suffix.
Instead of comparing every suffix independently, we can use a two-pointer strategy similar to the algorithm used in Booth's algorithm or maximal suffix computation.
The main insight is:
- Maintain two candidate starting positions
- Compare characters incrementally
- Eliminate losing candidates in batches
- Never revisit already eliminated regions
This allows the algorithm to run in linear time.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n³) | O(n²) | Generates and compares all substrings |
| Optimal | O(n) | O(1) | Uses two pointers to eliminate inferior suffixes efficiently |
Algorithm Walkthrough
We use three integers:
best, the current best suffix starting indexcandidate, another suffix competing againstbestoffset, the current character offset being compared
The algorithm works by comparing suffixes character by character.
Step 1: Initialize Two Competing Suffixes
Start with:
best = 0candidate = 1offset = 0
This means:
- The suffix starting at index
0is currently considered best - The suffix starting at index
1is competing against it
Step 2: Compare Characters Incrementally
Compare:
s[best + offset]s[candidate + offset]
There are three possibilities.
Step 3: Characters Match
If the characters are equal:
s[best + offset] == s[candidate + offset]
then neither suffix is better yet.
Increase offset by 1 and continue comparing deeper characters.
This efficiently skips matching prefixes.
Step 4: Candidate Suffix Is Better
If:
s[candidate + offset] > s[best + offset]
then the candidate suffix is lexicographically larger.
This means:
- All suffixes starting between
bestandbest + offsetare also worse - We can safely discard them
Update:
best = candidatecandidate = best + 1offset = 0
Step 5: Current Best Remains Better
If:
s[candidate + offset] < s[best + offset]
then the candidate loses.
All suffixes starting between candidate and candidate + offset also lose.
Move the candidate forward:
candidate = candidate + offset + 1
Reset:
offset = 0
Step 6: Avoid Comparing a Suffix Against Itself
If best == candidate, increment candidate.
Step 7: Continue Until Candidate Reaches the End
Once candidate >= n, no more competing suffixes exist.
Return:
s[best:]
Why It Works
The algorithm maintains the invariant that every eliminated suffix is guaranteed to be lexicographically smaller than the current best suffix.
When two suffixes differ at some offset:
- The suffix with the larger character at that offset wins
- Any suffix sharing the losing prefix can also be eliminated
Because each index is skipped at most once, the algorithm processes the string in linear time.
Python Solution
class Solution:
def lastSubstring(self, s: str) -> str:
n = len(s)
best = 0
candidate = 1
offset = 0
while candidate + offset < n:
if s[best + offset] == s[candidate + offset]:
offset += 1
elif s[best + offset] < s[candidate + offset]:
best = best + offset + 1
if best <= candidate:
best = candidate
candidate = best + 1
offset = 0
else:
candidate = candidate + offset + 1
offset = 0
if best == candidate:
candidate += 1
return s[best:]
The implementation follows the exact two-pointer strategy described earlier.
The variable best stores the current best suffix start index. The variable candidate stores another suffix being compared against it. The variable offset tracks how many characters have matched so far.
The loop continues while the candidate suffix still has characters remaining.
When characters match, we extend the comparison deeper into the suffixes by increasing offset.
When the candidate suffix becomes larger, we move the best pointer forward and discard all previously losing positions. This is the critical optimization that prevents quadratic behavior.
When the current best suffix remains larger, the candidate and all equivalent losing prefixes are skipped in one jump.
Finally, we return the suffix beginning at best.
Go Solution
func lastSubstring(s string) string {
n := len(s)
best := 0
candidate := 1
offset := 0
for candidate+offset < n {
if s[best+offset] == s[candidate+offset] {
offset++
} else if s[best+offset] < s[candidate+offset] {
best = best + offset + 1
if best <= candidate {
best = candidate
}
candidate = best + 1
offset = 0
} else {
candidate = candidate + offset + 1
offset = 0
}
if best == candidate {
candidate++
}
}
return s[best:]
}
The Go implementation is nearly identical to the Python version because the algorithm only relies on integer indices and direct string indexing.
Some Go-specific details are worth noting:
- Go strings are byte slices, which is safe here because the input contains only lowercase English letters
- No additional memory allocation is required except for the returned substring
- Integer overflow is not a concern because indices remain within the string length bounds
Worked Examples
Example 1
Input:
s = "abab"
All suffixes:
| Index | Suffix |
|---|---|
| 0 | abab |
| 1 | bab |
| 2 | ab |
| 3 | b |
We trace the algorithm.
| Step | best | candidate | offset | Compared | Action |
|---|---|---|---|---|---|
| Initial | 0 | 1 | 0 | a vs b | candidate better |
| After update | 1 | 2 | 0 | b vs a | best remains |
| Next | 1 | 3 | 0 | b vs b | match |
| Next | 1 | 3 | 1 | a vs end | stop |
Result:
"bab"
Example 2
Input:
s = "leetcode"
Suffixes:
| Index | Suffix |
|---|---|
| 0 | leetcode |
| 1 | eetcode |
| 2 | etcode |
| 3 | tcode |
| 4 | code |
| 5 | ode |
| 6 | de |
| 7 | e |
The suffix beginning with 't' is immediately largest because 't' is the maximum starting character.
Trace:
| Step | best | candidate | Compared | Action |
|---|---|---|---|---|
| 0 | 0 | 1 | l vs e | best remains |
| 1 | 0 | 2 | l vs e | best remains |
| 2 | 0 | 3 | l vs t | candidate better |
| 3 | 3 | 4 | t vs c | best remains |
| 4 | 3 | 5 | t vs o | best remains |
| 5 | 3 | 6 | t vs d | best remains |
| 6 | 3 | 7 | t vs e | best remains |
Result:
"tcode"
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each index is advanced at most a constant number of times |
| Space | O(1) | Only a few integer variables are used |
The linear time complexity comes from the fact that indices never move backward. Whenever a mismatch occurs, an entire block of suffixes is discarded permanently. This guarantees that the total number of character comparisons remains proportional to the string length.
The algorithm uses constant auxiliary space because it stores only pointers and offsets.
Test Cases
solution = Solution()
assert solution.lastSubstring("abab") == "bab" # alternating pattern
assert solution.lastSubstring("leetcode") == "tcode" # maximum character in middle
assert solution.lastSubstring("aaaaa") == "aaaaa" # all characters equal
assert solution.lastSubstring("z") == "z" # single character
assert solution.lastSubstring("banana") == "nana" # repeated maximal prefix
assert solution.lastSubstring("abcde") == "e" # strictly increasing
assert solution.lastSubstring("edcba") == "edcba" # strictly decreasing
assert solution.lastSubstring("zzza") == "zzza" # longest prefix wins
assert solution.lastSubstring("azzz") == "zzz" # best suffix at end
assert solution.lastSubstring("abababab") == "bababab" # repeated pattern
assert solution.lastSubstring("mississippi") == "ssissippi" # repeated complex overlaps
assert solution.lastSubstring("ababba") == "bba" # deeper lexicographic comparison
Test Summary
| Test | Why |
|---|---|
"abab" |
Validates repeated alternating structure |
"leetcode" |
Best suffix starts in middle |
"aaaaa" |
All suffixes share long prefixes |
"z" |
Smallest valid input |
"banana" |
Multiple repeated maximal characters |
"abcde" |
Best suffix is final character |
"edcba" |
Entire string is answer |
"zzza" |
Longer equal-prefix suffix wins |
"azzz" |
Best suffix near end |
"abababab" |
Repeated periodic structure |
"mississippi" |
Complex repeated overlaps |
"ababba" |
Requires deeper comparison beyond first character |
Edge Cases
All Characters Are Identical
Consider:
"aaaaaa"
Every suffix begins with the same character, so the algorithm must compare deeply before determining which suffix is larger.
This is a common source of bugs because naive implementations may repeatedly compare the same prefixes, leading to quadratic behavior.
The implemented solution handles this efficiently using the offset pointer. Matching characters are skipped incrementally without restarting comparisons from scratch.
Best Substring Appears Near the End
Consider:
"azzz"
The lexicographically largest substring is "zzz".
A naive implementation that assumes longer strings are always better could incorrectly return "azzz".
The algorithm correctly prioritizes earlier character differences over length. Since 'z' > 'a', the suffix beginning at index 1 immediately becomes the best candidate.
Long Repeated Patterns
Consider:
"abababababab"
Many suffixes share long prefixes, making naive suffix comparison extremely expensive.
This case stresses whether the implementation properly skips losing regions instead of rechecking them repeatedly.
The two-pointer elimination strategy guarantees that once a block of suffixes loses, none of them are reconsidered. This preserves the linear time complexity even for highly repetitive inputs.