LeetCode 467 - Unique Substrings in Wraparound String

The problem defines an infinite cyclic string built from the lowercase English alphabet: This means the sequence continues forever, and after 'z' comes 'a' again. Any substring that follows this cyclic ordering exists somewhere in the infinite wraparound string.

LeetCode Problem 467

Difficulty: 🟡 Medium
Topics: String, Dynamic Programming

Solution

Problem Understanding

The problem defines an infinite cyclic string built from the lowercase English alphabet:

"...zabcdefghijklmnopqrstuvwxyzabcdefghijklmnopqrstuvwxyzabcd..."

This means the sequence continues forever, and after 'z' comes 'a' again. Any substring that follows this cyclic ordering exists somewhere in the infinite wraparound string.

We are given a string s, and we must determine how many distinct non-empty substrings of s also appear in this wraparound string.

The important detail is that we are counting unique substrings, not total substrings. If the same valid substring appears multiple times in s, it should only be counted once.

For example, with:

s = "zab"

the valid substrings are:

"z", "a", "b", "za", "ab", "zab"

All of them follow the wraparound ordering because:

  • 'z' -> 'a' is valid
  • 'a' -> 'b' is valid

So the answer is 6.

The constraints are significant:

1 <= s.length <= 100000

A string of length 10^5 makes any algorithm that explicitly generates all substrings impractical. Since a string of length n has O(n^2) substrings, we need a much more efficient solution.

Several edge cases are important:

  • Strings with repeated characters such as "aaaaa"
  • Strings that repeatedly break the wraparound sequence such as "cac"
  • Wraparound transitions like 'z' -> 'a'
  • Long continuous valid chains such as "abcdefghijklmnopqrstuvwxyz"
  • Duplicate valid substrings appearing multiple times

A naive implementation can easily overcount duplicates or become too slow due to substring generation.

Approaches

Brute Force Approach

The most direct solution is to generate every possible substring of s, then check whether each substring exists in the infinite wraparound string.

To verify a substring, we can check whether every adjacent pair follows the cyclic alphabet rule:

next_char == previous_char + 1

or the special wraparound case:

'z' -> 'a'

We would store valid substrings in a set to avoid duplicate counting.

This approach is correct because it explicitly examines every substring and validates it according to the problem definition. However, it is far too slow for the input limits.

A string of length n contains:

n * (n + 1) / 2

substrings, which is O(n^2). Even if validation were optimized, generating and storing all substrings would still exceed feasible limits for n = 100000.

Optimal Dynamic Programming Approach

The key observation is that we do not actually need to store every substring individually.

Suppose we know that a valid wraparound sequence ending at character 'c' has maximum length k.

For example:

"...abcd"

ending at 'd' with length 4.

Then all shorter suffixes are automatically valid:

"d"
"cd"
"bcd"
"abcd"

This means that if the longest valid substring ending in 'd' has length 4, then there are exactly 4 unique valid substrings ending in 'd'.

Instead of storing substrings themselves, we only need to track:

max_length_ending[character]

where each entry stores the maximum length of a valid wraparound substring ending with that character.

If we later encounter another substring ending with the same character but with a shorter length, it contributes nothing new because all of its suffixes were already counted by the longer substring.

This transforms the problem from substring enumeration into a linear dynamic programming scan.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) to O(n³) O(n²) Generates all substrings and validates each one
Optimal O(n) O(1) Tracks longest valid chain ending at each character

Algorithm Walkthrough

  1. Create an array of size 26 called max_lengths.

Each index corresponds to a lowercase letter:

0 -> 'a'
1 -> 'b'
...
25 -> 'z'

max_lengths[i] stores the longest valid wraparound substring ending with that character. 2. Maintain a variable current_length.

This represents the length of the current valid wraparound chain ending at the current character. 3. Iterate through the string from left to right.

For each character s[i], determine whether it continues the wraparound sequence from s[i - 1]. 4. A continuation is valid if either:

ord(s[i]) - ord(s[i - 1]) == 1

or:

s[i - 1] == 'z' and s[i] == 'a'
  1. If the sequence continues, increment current_length.

Otherwise, reset it to 1 because a single character is always a valid substring. 6. Update the maximum length for the current ending character.

Suppose the current character is 'd' and the current chain length is 4.

Then:

max_lengths['d'] = max(existing_value, 4)
  1. Continue until the entire string has been processed.
  2. The final answer is the sum of all values in max_lengths.

Each value represents the number of unique substrings ending with that character.

Why it works

The core invariant is:

If the longest valid substring ending with character c has length k,
then there are exactly k unique valid substrings ending with c.

All shorter suffixes are automatically included within the longest substring.

By storing only the maximum length for each ending character, we avoid duplicate counting while still accounting for every unique valid substring exactly once.

Python Solution

class Solution:
    def findSubstringInWraproundString(self, s: str) -> int:
        max_lengths = [0] * 26
        current_length = 0

        for i in range(len(s)):
            if (
                i > 0
                and (
                    ord(s[i]) - ord(s[i - 1]) == 1
                    or (s[i - 1] == 'z' and s[i] == 'a')
                )
            ):
                current_length += 1
            else:
                current_length = 1

            char_index = ord(s[i]) - ord('a')
            max_lengths[char_index] = max(
                max_lengths[char_index],
                current_length
            )

        return sum(max_lengths)

The implementation closely follows the algorithm described earlier.

The max_lengths array stores the maximum valid chain length ending with each character. Since there are only 26 lowercase letters, this structure uses constant space.

The variable current_length tracks the length of the ongoing wraparound substring. During iteration, the code checks whether the current character extends the previous wraparound sequence. If it does, the chain length increases. Otherwise, the chain resets to 1.

The update step is the most important part:

max_lengths[char_index] = max(
    max_lengths[char_index],
    current_length
)

This guarantees we only keep the longest valid substring ending with that character. Any shorter valid substrings ending with the same character are already implied.

Finally, summing all values in max_lengths gives the total number of unique valid substrings.

Go Solution

func findSubstringInWraproundString(s string) int {
    maxLengths := make([]int, 26)
    currentLength := 0

    for i := 0; i < len(s); i++ {
        if i > 0 &&
            ((s[i]-s[i-1] == 1) ||
                (s[i-1] == 'z' && s[i] == 'a')) {
            currentLength++
        } else {
            currentLength = 1
        }

        index := s[i] - 'a'

        if currentLength > maxLengths[index] {
            maxLengths[index] = currentLength
        }
    }

    total := 0

    for _, length := range maxLengths {
        total += length
    }

    return total
}

The Go implementation follows the same logic as the Python solution.

One small difference is that Go strings are byte-indexable, which works perfectly here because the input contains only lowercase English letters. This allows direct character arithmetic using byte values.

The maxLengths slice has fixed size 26, ensuring constant auxiliary space usage.

No special handling for integer overflow is required because the maximum answer is bounded by the number of substrings in a string of length 100000, which easily fits within Go's int.

Worked Examples

Example 1

s = "a"
i char continues sequence? current_length max_lengths update
0 a No previous char 1 a = 1

Final max_lengths:

a: 1

Answer:

1

Unique substrings:

"a"

Example 2

s = "cac"
i char continues sequence? current_length max_lengths update
0 c No previous char 1 c = 1
1 a No 1 a = 1
2 c No 1 c remains 1

Final max_lengths:

a: 1
c: 1

Answer:

2

Unique substrings:

"a", "c"

Example 3

s = "zab"
i char continues sequence? current_length max_lengths update
0 z No previous char 1 z = 1
1 a Yes, wraparound 2 a = 2
2 b Yes 3 b = 3

Final max_lengths:

z: 1
a: 2
b: 3

Answer:

1 + 2 + 3 = 6

Unique substrings:

"z"
"a"
"b"
"za"
"ab"
"zab"

Complexity Analysis

Measure Complexity Explanation
Time O(n) The string is scanned exactly once
Space O(1) Only a fixed array of size 26 is used

The algorithm performs a single left-to-right traversal of the input string. Each iteration does constant work, so the total runtime is linear in the length of the string.

The auxiliary space remains constant because the max_lengths array always contains exactly 26 entries, regardless of input size.

Test Cases

solution = Solution()

assert solution.findSubstringInWraproundString("a") == 1
# Single character

assert solution.findSubstringInWraproundString("cac") == 2
# Repeated non-consecutive characters

assert solution.findSubstringInWraproundString("zab") == 6
# Wraparound transition z -> a

assert solution.findSubstringInWraproundString("abcdefghijklmnopqrstuvwxyz") == 351
# Entire alphabet in sequence

assert solution.findSubstringInWraproundString("aaaaa") == 1
# Repeated same character

assert solution.findSubstringInWraproundString("abac") == 4
# Sequence break in the middle

assert solution.findSubstringInWraproundString("zabcdefghijklmnopqrstuvwxyz") == 377
# Long wraparound chain

assert solution.findSubstringInWraproundString("cdefgh") == 21
# Continuous increasing sequence

assert solution.findSubstringInWraproundString("yzab") == 10
# Multiple wraparound transitions

assert solution.findSubstringInWraproundString("bcdxyzab") == 21
# Multiple disconnected valid chains

Test Summary

Test Why
"a" Smallest valid input
"cac" Ensures duplicates are not overcounted
"zab" Validates wraparound logic
"abcdefghijklmnopqrstuvwxyz" Long continuous sequence
"aaaaa" Repeated identical characters
"abac" Confirms reset behavior
"zabcdefghijklmnopqrstuvwxyz" Tests long wraparound continuity
"cdefgh" Validates arithmetic growth of substrings
"yzab" Multiple wraparound steps
"bcdxyzab" Multiple valid segments in one string

Edge Cases

Repeated Characters

A string like:

"aaaaa"

can easily cause overcounting bugs. Every character individually is valid, but substrings like "aa" are not valid wraparound substrings because 'a' -> 'a' does not follow the cyclic alphabet order.

The implementation handles this correctly by resetting current_length to 1 whenever the sequence rule fails.

Wraparound from 'z' to 'a'

The transition:

'z' -> 'a'

is the defining special case of the problem. A naive alphabetical comparison would incorrectly reject it.

The implementation explicitly checks:

s[i - 1] == 'z' and s[i] == 'a'

which ensures wraparound continuity is treated as valid.

Duplicate Valid Substrings

Consider:

"zabzab"

The substring "zab" appears multiple times. A naive counting approach may count it repeatedly.

The dynamic programming strategy avoids this entirely. Since we only store the maximum valid length ending with each character, repeated shorter chains do not increase the count. This guarantees uniqueness automatically.