LeetCode 2223 - Sum of Scores of Built Strings

The problem gives us a final string s and describes a process where the string is built by repeatedly prepending characters to the front. This means that every intermediate string si is actually a suffix of the final string s.

LeetCode Problem 2223

Difficulty: šŸ”“ Hard
Topics: String, Binary Search, Rolling Hash, Suffix Array, String Matching, Hash Function

Solution

Problem Understanding

The problem gives us a final string s and describes a process where the string is built by repeatedly prepending characters to the front. This means that every intermediate string si is actually a suffix of the final string s.

For example, if:

s = "abaca"

Then:

s1 = "a"
s2 = "ca"
s3 = "aca"
s4 = "baca"
s5 = "abaca"

Notice that each si is simply the last i characters of s.

The score of each si is defined as the length of the longest common prefix between si and the final string sn = s.

In simpler terms, for every suffix of s, we want to know:

How many characters from the beginning match the prefix of s?

We then sum all of those scores.

Another way to think about the problem is:

For every suffix of s, compute the longest common prefix (LCP) with the full string s, then return the total.

For example:

s = "babab"

Suffixes:

"b"      → LCP = 1
"ab"     → LCP = 0
"bab"    → LCP = 3
"abab"   → LCP = 0
"babab"  → LCP = 5

Total:

1 + 0 + 3 + 0 + 5 = 9

The constraints are important:

1 <= s.length <= 10^5

A string of length 100,000 immediately tells us that quadratic algorithms will likely time out. Any approach that compares substrings character by character for every suffix will be too slow.

The problem also guarantees:

  • The string contains only lowercase English letters.
  • The input length is at least 1, meaning we never have an empty string.
  • Every intermediate string si is a suffix of s.

Important edge cases include strings with all identical characters, strings with no repeated prefixes, and very short inputs such as length 1. These cases can expose bugs in substring matching logic or off by one errors.

Approaches

Brute Force Approach

The most straightforward idea is to generate every suffix and compare it character by character with the original string.

For each suffix starting at index i:

  1. Compare s[0] with s[i]
  2. Compare s[1] with s[i+1]
  3. Continue until characters differ or one string ends
  4. Add the matched length to the answer

For example:

s = "babab"

At index 2:

s = "babab"
suffix = "bab"

Compare:

b == b
a == a
b == b

Score = 3.

This works because it explicitly computes the required longest common prefix for every suffix.

However, the worst case becomes extremely expensive.

Consider:

s = "aaaaaaaaaa..."

Every suffix matches almost the entire prefix.

Work becomes:

n + (n-1) + (n-2) + ...

Which is:

O(n²)

For n = 100,000, this is far too slow.

Optimal Approach, Z Algorithm

The key observation is that the problem is secretly asking:

For every suffix, find the longest prefix match with the whole string.

This is exactly what the Z Algorithm computes.

The Z-array for a string stores:

z[i] = longest substring starting at i
       that matches the prefix of s

For example:

s = "babab"

Z-array:

[0, 0, 3, 0, 1]

Meaning:

index 0 → ignored
index 1 → match length 0
index 2 → "bab" matches prefix
index 3 → match length 0
index 4 → "b" matches prefix

The full string itself contributes:

len(s)

So the answer becomes:

sum(z) + n

The Z Algorithm computes all prefix matches in linear time, making it ideal for 10^5 characters.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Compare every suffix character by character
Optimal, Z Algorithm O(n) O(n) Computes longest prefix match for all suffixes in one pass

Algorithm Walkthrough

We use the Z Algorithm to compute prefix matches efficiently.

Step 1: Initialize the Z-array

Create an array:

z = [0] * n

Where:

z[i]

stores the longest common prefix between:

s

and:

s[i:]

We also maintain a matching window:

[left, right]

This window represents the current substring known to match the prefix.

Step 2: Iterate through the string

For every index:

i from 1 to n-1

we compute z[i].

Index 0 is skipped because the full string trivially matches itself.

Step 3: Reuse previous work inside the Z-box

If:

i <= right

then we are inside an already matched region.

We can reuse previous results:

z[i] = min(right - i + 1, z[i - left])

This avoids redundant comparisons.

The idea is:

Since we already know a segment matches the prefix, we can copy information from an earlier position.

Step 4: Extend the match

After reusing information, we attempt to extend the match further:

while i + z[i] < n and s[z[i]] == s[i + z[i]]:
    z[i] += 1

This continues character comparisons only when necessary.

Step 5: Update the matching window

If the new match extends past right:

i + z[i] - 1 > right

update:

left = i
right = i + z[i] - 1

This creates a larger reusable matching region.

Step 6: Compute the answer

The Z-array stores scores for every suffix except the full string itself.

Since:

s == s

the final string contributes:

n

Return:

sum(z) + n

Why it works

The Z Algorithm guarantees that z[i] equals the longest common prefix between the string prefix and the suffix beginning at i. Since every si is a suffix of s, each required score is exactly one Z-value. The only missing case is the entire string itself, whose score equals n. Summing all values therefore produces the correct total score.

Python Solution

class Solution:
    def sumScores(self, s: str) -> int:
        n = len(s)
        z = [0] * n

        left = 0
        right = 0

        for i in range(1, n):
            if i <= right:
                z[i] = min(right - i + 1, z[i - left])

            while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                z[i] += 1

            if i + z[i] - 1 > right:
                left = i
                right = i + z[i] - 1

        return sum(z) + n

Implementation Explanation

We begin by creating the Z-array and initializing the matching boundaries left and right.

The variables:

left, right

represent the current Z-box, meaning a substring that is already known to match the prefix.

For every index i, we first check whether it lies inside the Z-box. If it does, we can reuse earlier computations instead of comparing characters again.

Next, we extend the match manually using the while loop. This ensures the stored Z-value becomes the longest possible prefix match.

Whenever a larger matching region is found, we update the Z-box boundaries.

Finally, we sum all Z-values and add n because the entire string matches itself completely.

Go Solution

func sumScores(s string) int64 {
	n := len(s)
	z := make([]int, n)

	left, right := 0, 0

	for i := 1; i < n; i++ {
		if i <= right {
			z[i] = min(right-i+1, z[i-left])
		}

		for i+z[i] < n && s[z[i]] == s[i+z[i]] {
			z[i]++
		}

		if i+z[i]-1 > right {
			left = i
			right = i + z[i] - 1
		}
	}

	var total int64 = int64(n)

	for _, value := range z {
		total += int64(value)
	}

	return total
}

func min(a, b int) int {
	if a < b {
		return a
	}
	return b
}

Go-specific Implementation Differences

The main difference in Go is integer handling.

The return type is:

int64

because the total score can become large.

For example:

s = "aaaaaa..."

The sum approaches:

n * (n + 1) / 2

For n = 100000, this exceeds standard 32-bit integer limits.

We also define a helper min() function because Go does not provide one for integers in the standard library.

Slices are used for the Z-array, which behave similarly to dynamic arrays in Python.

Worked Examples

Example 1

s = "babab"

Length:

n = 5
i Character z[i] Explanation left,right
1 a 0 no match (0,0)
2 b 3 "bab" matches prefix (2,4)
3 a 0 mismatch immediately (2,4)
4 b 1 "b" matches (2,4)

Final:

z = [0, 0, 3, 0, 1]

Answer:

sum(z) + n
= 4 + 5
= 9

Example 2

s = "azbazbzaz"

Length:

n = 9
i Suffix z[i]
1 zbazbzaz 0
2 bazbzaz 0
3 azbzaz 2
4 zbzaz 0
5 bzaz 0
6 zaz 0
7 az 2
8 z 0

Total:

sum(z) + n
= (0+0+2+0+0+0+2+0) + 9
= 14

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed at most a constant number of times
Space O(n) The Z-array stores one value per index

The linear time complexity comes from the fact that the right pointer never moves backward. Although the algorithm contains a while loop, every successful character comparison contributes to extending the Z-box, meaning the total work across the entire string remains linear.

The space complexity is O(n) because we store the Z-array for all positions.

Test Cases

solution = Solution()

assert solution.sumScores("babab") == 9  # Example 1
assert solution.sumScores("azbazbzaz") == 14  # Example 2

assert solution.sumScores("a") == 1  # Single character
assert solution.sumScores("aaaa") == 10  # Maximum overlap
assert solution.sumScores("abcd") == 4  # No repeated prefix

assert solution.sumScores("aa") == 3  # Small repeated string
assert solution.sumScores("ab") == 2  # Small unique string

assert solution.sumScores("ababab") == 12  # Alternating pattern
assert solution.sumScores("abcabc") == 9  # Repeating block

assert solution.sumScores("zzzzzz") == 21  # All same character
assert solution.sumScores("abcdefg") == 7  # Completely distinct

assert solution.sumScores("abaaba") == 11  # Prefix repeats later
assert solution.sumScores("mississippi") == 16  # Complex overlaps
Test Why
"babab" Validates example case with alternating matches
"azbazbzaz" Validates multiple separated matches
"a" Smallest possible input
"aaaa" Worst case overlap scenario
"abcd" No prefix reuse
"aa" Small repeated string
"ab" Small unique string
"ababab" Repeating alternating structure
"abcabc" Repeated substring block
"zzzzzz" Maximum repeated prefix matching
"abcdefg" Completely distinct characters
"abaaba" Partial prefix reuse
"mississippi" Complex repeating structure

Edge Cases

Single Character String

If:

s = "a"

there are no proper suffixes to compare.

The only string is:

"a"

Its score is 1, because the string fully matches itself.

This case is important because loops beginning at index 1 never execute. Our implementation handles it naturally by returning:

sum(z) + n

which becomes:

0 + 1 = 1

All Characters Identical

Consider:

s = "aaaaa"

Every suffix shares a long prefix with the full string:

5 + 4 + 3 + 2 + 1

This creates the worst case for naive approaches because almost every comparison succeeds.

The Z Algorithm remains efficient because it reuses previous match information through the Z-box rather than recomputing every character comparison.

No Matching Prefixes

For:

s = "abcdef"

every suffix except the full string immediately mismatches:

z = [0, 0, 0, 0, 0, 0]

Only the full string contributes to the answer.

This case ensures the implementation correctly handles immediate mismatches without incorrectly extending the Z-box.

Repeating Patterns

Strings like:

"abababab"

can be tricky because matches overlap heavily.

A naive implementation may repeatedly recompute identical comparisons.

The Z Algorithm avoids this issue by copying previously computed match lengths from inside the Z-box, preserving linear performance even for highly repetitive inputs.