LeetCode 1163 - Last Substring in Lexicographical Order

The problem asks us to find the substring of a given string s that is lexicographically largest among all possible substrings. A substring is any contiguous portion of the string.

LeetCode Problem 1163

Difficulty: 🔴 Hard
Topics: Two Pointers, String

Solution

Problem Understanding

The problem asks us to find the substring of a given string s that is lexicographically largest among all possible substrings.

A substring is any contiguous portion of the string. For a string of length n, there are O(n²) possible substrings because every starting position can pair with many ending positions.

Lexicographical order is the same ordering used in dictionaries. When comparing two strings:

  • Compare characters from left to right
  • The first differing character determines the larger string
  • If one string is a prefix of the other, the longer string is considered larger

For example:

  • "bab" > "aba" because 'b' > 'a'
  • "tcode" > "leetcode" because 't' > 'l'
  • "banana" > "ban" because the shorter string is a prefix

The input is a lowercase English string with length up to 4 * 10^5. This constraint is extremely important. A quadratic or cubic algorithm will time out. Even an O(n log n) solution is acceptable, but the problem is specifically designed for a linear-time solution.

The key observation is that every substring can be represented by its starting index. If we know the best starting index, then the answer is simply the suffix beginning there.

Several edge cases are important:

  • A string where all characters are identical, such as "aaaaa"
  • A string where the best substring starts near the end
  • Repeated patterns like "abababab"
  • Very large strings where inefficient substring creation becomes expensive
  • Multiple candidate suffixes sharing long common prefixes

The problem guarantees that:

  • The string is non-empty
  • Only lowercase English letters are used
  • The answer always exists because every non-empty string has at least one substring

Approaches

Brute Force Approach

The brute-force solution generates every possible substring and selects the lexicographically largest one.

For each starting index i, we generate substrings:

  • s[i:i+1]
  • s[i:i+2]
  • ...
  • s[i:n]

Then we compare every substring against the current maximum.

This works because it explicitly checks all possibilities, guaranteeing correctness. However, it is far too slow for the constraints.

There are O(n²) substrings, and comparing strings can take O(n) time in the worst case. This leads to an overall complexity of O(n³) in the worst scenario.

Even optimizing slightly by considering only suffixes still leads to O(n²) comparisons because suffixes may share long prefixes.

Key Insight for the Optimal Solution

The lexicographically maximum substring must be a suffix of the string.

Why?

If two substrings start at the same position, the longer one is always lexicographically larger because the shorter one is its prefix. Therefore, for every starting index, the best substring is simply the suffix beginning there.

So the problem becomes:

Find the lexicographically largest suffix.

Instead of comparing every suffix independently, we can use a two-pointer strategy similar to the algorithm used in Booth's algorithm or maximal suffix computation.

The main insight is:

  • Maintain two candidate starting positions
  • Compare characters incrementally
  • Eliminate losing candidates in batches
  • Never revisit already eliminated regions

This allows the algorithm to run in linear time.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n³) O(n²) Generates and compares all substrings
Optimal O(n) O(1) Uses two pointers to eliminate inferior suffixes efficiently

Algorithm Walkthrough

We use three integers:

  • best, the current best suffix starting index
  • candidate, another suffix competing against best
  • offset, the current character offset being compared

The algorithm works by comparing suffixes character by character.

Step 1: Initialize Two Competing Suffixes

Start with:

  1. best = 0
  2. candidate = 1
  3. offset = 0

This means:

  • The suffix starting at index 0 is currently considered best
  • The suffix starting at index 1 is competing against it

Step 2: Compare Characters Incrementally

Compare:

  • s[best + offset]
  • s[candidate + offset]

There are three possibilities.

Step 3: Characters Match

If the characters are equal:

s[best + offset] == s[candidate + offset]

then neither suffix is better yet.

Increase offset by 1 and continue comparing deeper characters.

This efficiently skips matching prefixes.

Step 4: Candidate Suffix Is Better

If:

s[candidate + offset] > s[best + offset]

then the candidate suffix is lexicographically larger.

This means:

  • All suffixes starting between best and best + offset are also worse
  • We can safely discard them

Update:

  1. best = candidate
  2. candidate = best + 1
  3. offset = 0

Step 5: Current Best Remains Better

If:

s[candidate + offset] < s[best + offset]

then the candidate loses.

All suffixes starting between candidate and candidate + offset also lose.

Move the candidate forward:

candidate = candidate + offset + 1

Reset:

offset = 0

Step 6: Avoid Comparing a Suffix Against Itself

If best == candidate, increment candidate.

Step 7: Continue Until Candidate Reaches the End

Once candidate >= n, no more competing suffixes exist.

Return:

s[best:]

Why It Works

The algorithm maintains the invariant that every eliminated suffix is guaranteed to be lexicographically smaller than the current best suffix.

When two suffixes differ at some offset:

  • The suffix with the larger character at that offset wins
  • Any suffix sharing the losing prefix can also be eliminated

Because each index is skipped at most once, the algorithm processes the string in linear time.

Python Solution

class Solution:
    def lastSubstring(self, s: str) -> str:
        n = len(s)

        best = 0
        candidate = 1
        offset = 0

        while candidate + offset < n:
            if s[best + offset] == s[candidate + offset]:
                offset += 1
            elif s[best + offset] < s[candidate + offset]:
                best = best + offset + 1
                if best <= candidate:
                    best = candidate

                candidate = best + 1
                offset = 0
            else:
                candidate = candidate + offset + 1
                offset = 0

            if best == candidate:
                candidate += 1

        return s[best:]

The implementation follows the exact two-pointer strategy described earlier.

The variable best stores the current best suffix start index. The variable candidate stores another suffix being compared against it. The variable offset tracks how many characters have matched so far.

The loop continues while the candidate suffix still has characters remaining.

When characters match, we extend the comparison deeper into the suffixes by increasing offset.

When the candidate suffix becomes larger, we move the best pointer forward and discard all previously losing positions. This is the critical optimization that prevents quadratic behavior.

When the current best suffix remains larger, the candidate and all equivalent losing prefixes are skipped in one jump.

Finally, we return the suffix beginning at best.

Go Solution

func lastSubstring(s string) string {
	n := len(s)

	best := 0
	candidate := 1
	offset := 0

	for candidate+offset < n {
		if s[best+offset] == s[candidate+offset] {
			offset++
		} else if s[best+offset] < s[candidate+offset] {
			best = best + offset + 1

			if best <= candidate {
				best = candidate
			}

			candidate = best + 1
			offset = 0
		} else {
			candidate = candidate + offset + 1
			offset = 0
		}

		if best == candidate {
			candidate++
		}
	}

	return s[best:]
}

The Go implementation is nearly identical to the Python version because the algorithm only relies on integer indices and direct string indexing.

Some Go-specific details are worth noting:

  • Go strings are byte slices, which is safe here because the input contains only lowercase English letters
  • No additional memory allocation is required except for the returned substring
  • Integer overflow is not a concern because indices remain within the string length bounds

Worked Examples

Example 1

Input:

s = "abab"

All suffixes:

Index Suffix
0 abab
1 bab
2 ab
3 b

We trace the algorithm.

Step best candidate offset Compared Action
Initial 0 1 0 a vs b candidate better
After update 1 2 0 b vs a best remains
Next 1 3 0 b vs b match
Next 1 3 1 a vs end stop

Result:

"bab"

Example 2

Input:

s = "leetcode"

Suffixes:

Index Suffix
0 leetcode
1 eetcode
2 etcode
3 tcode
4 code
5 ode
6 de
7 e

The suffix beginning with 't' is immediately largest because 't' is the maximum starting character.

Trace:

Step best candidate Compared Action
0 0 1 l vs e best remains
1 0 2 l vs e best remains
2 0 3 l vs t candidate better
3 3 4 t vs c best remains
4 3 5 t vs o best remains
5 3 6 t vs d best remains
6 3 7 t vs e best remains

Result:

"tcode"

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each index is advanced at most a constant number of times
Space O(1) Only a few integer variables are used

The linear time complexity comes from the fact that indices never move backward. Whenever a mismatch occurs, an entire block of suffixes is discarded permanently. This guarantees that the total number of character comparisons remains proportional to the string length.

The algorithm uses constant auxiliary space because it stores only pointers and offsets.

Test Cases

solution = Solution()

assert solution.lastSubstring("abab") == "bab"          # alternating pattern
assert solution.lastSubstring("leetcode") == "tcode"    # maximum character in middle
assert solution.lastSubstring("aaaaa") == "aaaaa"       # all characters equal
assert solution.lastSubstring("z") == "z"               # single character
assert solution.lastSubstring("banana") == "nana"       # repeated maximal prefix
assert solution.lastSubstring("abcde") == "e"           # strictly increasing
assert solution.lastSubstring("edcba") == "edcba"       # strictly decreasing
assert solution.lastSubstring("zzza") == "zzza"         # longest prefix wins
assert solution.lastSubstring("azzz") == "zzz"          # best suffix at end
assert solution.lastSubstring("abababab") == "bababab"  # repeated pattern
assert solution.lastSubstring("mississippi") == "ssissippi"  # repeated complex overlaps
assert solution.lastSubstring("ababba") == "bba"        # deeper lexicographic comparison

Test Summary

Test Why
"abab" Validates repeated alternating structure
"leetcode" Best suffix starts in middle
"aaaaa" All suffixes share long prefixes
"z" Smallest valid input
"banana" Multiple repeated maximal characters
"abcde" Best suffix is final character
"edcba" Entire string is answer
"zzza" Longer equal-prefix suffix wins
"azzz" Best suffix near end
"abababab" Repeated periodic structure
"mississippi" Complex repeated overlaps
"ababba" Requires deeper comparison beyond first character

Edge Cases

All Characters Are Identical

Consider:

"aaaaaa"

Every suffix begins with the same character, so the algorithm must compare deeply before determining which suffix is larger.

This is a common source of bugs because naive implementations may repeatedly compare the same prefixes, leading to quadratic behavior.

The implemented solution handles this efficiently using the offset pointer. Matching characters are skipped incrementally without restarting comparisons from scratch.

Best Substring Appears Near the End

Consider:

"azzz"

The lexicographically largest substring is "zzz".

A naive implementation that assumes longer strings are always better could incorrectly return "azzz".

The algorithm correctly prioritizes earlier character differences over length. Since 'z' > 'a', the suffix beginning at index 1 immediately becomes the best candidate.

Long Repeated Patterns

Consider:

"abababababab"

Many suffixes share long prefixes, making naive suffix comparison extremely expensive.

This case stresses whether the implementation properly skips losing regions instead of rechecking them repeatedly.

The two-pointer elimination strategy guarantees that once a block of suffixes loses, none of them are reconsidered. This preserves the linear time complexity even for highly repetitive inputs.