LeetCode 2223 - Sum of Scores of Built Strings
The problem gives us a final string s and describes a process where the string is built by repeatedly prepending characters to the front. This means that every intermediate string si is actually a suffix of the final string s.
Difficulty: š“ Hard
Topics: String, Binary Search, Rolling Hash, Suffix Array, String Matching, Hash Function
Solution
Problem Understanding
The problem gives us a final string s and describes a process where the string is built by repeatedly prepending characters to the front. This means that every intermediate string si is actually a suffix of the final string s.
For example, if:
s = "abaca"
Then:
s1 = "a"
s2 = "ca"
s3 = "aca"
s4 = "baca"
s5 = "abaca"
Notice that each si is simply the last i characters of s.
The score of each si is defined as the length of the longest common prefix between si and the final string sn = s.
In simpler terms, for every suffix of s, we want to know:
How many characters from the beginning match the prefix of
s?
We then sum all of those scores.
Another way to think about the problem is:
For every suffix of s, compute the longest common prefix (LCP) with the full string s, then return the total.
For example:
s = "babab"
Suffixes:
"b" ā LCP = 1
"ab" ā LCP = 0
"bab" ā LCP = 3
"abab" ā LCP = 0
"babab" ā LCP = 5
Total:
1 + 0 + 3 + 0 + 5 = 9
The constraints are important:
1 <= s.length <= 10^5
A string of length 100,000 immediately tells us that quadratic algorithms will likely time out. Any approach that compares substrings character by character for every suffix will be too slow.
The problem also guarantees:
- The string contains only lowercase English letters.
- The input length is at least
1, meaning we never have an empty string. - Every intermediate string
siis a suffix ofs.
Important edge cases include strings with all identical characters, strings with no repeated prefixes, and very short inputs such as length 1. These cases can expose bugs in substring matching logic or off by one errors.
Approaches
Brute Force Approach
The most straightforward idea is to generate every suffix and compare it character by character with the original string.
For each suffix starting at index i:
- Compare
s[0]withs[i] - Compare
s[1]withs[i+1] - Continue until characters differ or one string ends
- Add the matched length to the answer
For example:
s = "babab"
At index 2:
s = "babab"
suffix = "bab"
Compare:
b == b
a == a
b == b
Score = 3.
This works because it explicitly computes the required longest common prefix for every suffix.
However, the worst case becomes extremely expensive.
Consider:
s = "aaaaaaaaaa..."
Every suffix matches almost the entire prefix.
Work becomes:
n + (n-1) + (n-2) + ...
Which is:
O(n²)
For n = 100,000, this is far too slow.
Optimal Approach, Z Algorithm
The key observation is that the problem is secretly asking:
For every suffix, find the longest prefix match with the whole string.
This is exactly what the Z Algorithm computes.
The Z-array for a string stores:
z[i] = longest substring starting at i
that matches the prefix of s
For example:
s = "babab"
Z-array:
[0, 0, 3, 0, 1]
Meaning:
index 0 ā ignored
index 1 ā match length 0
index 2 ā "bab" matches prefix
index 3 ā match length 0
index 4 ā "b" matches prefix
The full string itself contributes:
len(s)
So the answer becomes:
sum(z) + n
The Z Algorithm computes all prefix matches in linear time, making it ideal for 10^5 characters.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Compare every suffix character by character |
| Optimal, Z Algorithm | O(n) | O(n) | Computes longest prefix match for all suffixes in one pass |
Algorithm Walkthrough
We use the Z Algorithm to compute prefix matches efficiently.
Step 1: Initialize the Z-array
Create an array:
z = [0] * n
Where:
z[i]
stores the longest common prefix between:
s
and:
s[i:]
We also maintain a matching window:
[left, right]
This window represents the current substring known to match the prefix.
Step 2: Iterate through the string
For every index:
i from 1 to n-1
we compute z[i].
Index 0 is skipped because the full string trivially matches itself.
Step 3: Reuse previous work inside the Z-box
If:
i <= right
then we are inside an already matched region.
We can reuse previous results:
z[i] = min(right - i + 1, z[i - left])
This avoids redundant comparisons.
The idea is:
Since we already know a segment matches the prefix, we can copy information from an earlier position.
Step 4: Extend the match
After reusing information, we attempt to extend the match further:
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
This continues character comparisons only when necessary.
Step 5: Update the matching window
If the new match extends past right:
i + z[i] - 1 > right
update:
left = i
right = i + z[i] - 1
This creates a larger reusable matching region.
Step 6: Compute the answer
The Z-array stores scores for every suffix except the full string itself.
Since:
s == s
the final string contributes:
n
Return:
sum(z) + n
Why it works
The Z Algorithm guarantees that z[i] equals the longest common prefix between the string prefix and the suffix beginning at i. Since every si is a suffix of s, each required score is exactly one Z-value. The only missing case is the entire string itself, whose score equals n. Summing all values therefore produces the correct total score.
Python Solution
class Solution:
def sumScores(self, s: str) -> int:
n = len(s)
z = [0] * n
left = 0
right = 0
for i in range(1, n):
if i <= right:
z[i] = min(right - i + 1, z[i - left])
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
if i + z[i] - 1 > right:
left = i
right = i + z[i] - 1
return sum(z) + n
Implementation Explanation
We begin by creating the Z-array and initializing the matching boundaries left and right.
The variables:
left, right
represent the current Z-box, meaning a substring that is already known to match the prefix.
For every index i, we first check whether it lies inside the Z-box. If it does, we can reuse earlier computations instead of comparing characters again.
Next, we extend the match manually using the while loop. This ensures the stored Z-value becomes the longest possible prefix match.
Whenever a larger matching region is found, we update the Z-box boundaries.
Finally, we sum all Z-values and add n because the entire string matches itself completely.
Go Solution
func sumScores(s string) int64 {
n := len(s)
z := make([]int, n)
left, right := 0, 0
for i := 1; i < n; i++ {
if i <= right {
z[i] = min(right-i+1, z[i-left])
}
for i+z[i] < n && s[z[i]] == s[i+z[i]] {
z[i]++
}
if i+z[i]-1 > right {
left = i
right = i + z[i] - 1
}
}
var total int64 = int64(n)
for _, value := range z {
total += int64(value)
}
return total
}
func min(a, b int) int {
if a < b {
return a
}
return b
}
Go-specific Implementation Differences
The main difference in Go is integer handling.
The return type is:
int64
because the total score can become large.
For example:
s = "aaaaaa..."
The sum approaches:
n * (n + 1) / 2
For n = 100000, this exceeds standard 32-bit integer limits.
We also define a helper min() function because Go does not provide one for integers in the standard library.
Slices are used for the Z-array, which behave similarly to dynamic arrays in Python.
Worked Examples
Example 1
s = "babab"
Length:
n = 5
| i | Character | z[i] | Explanation | left,right |
|---|---|---|---|---|
| 1 | a | 0 | no match | (0,0) |
| 2 | b | 3 | "bab" matches prefix | (2,4) |
| 3 | a | 0 | mismatch immediately | (2,4) |
| 4 | b | 1 | "b" matches | (2,4) |
Final:
z = [0, 0, 3, 0, 1]
Answer:
sum(z) + n
= 4 + 5
= 9
Example 2
s = "azbazbzaz"
Length:
n = 9
| i | Suffix | z[i] |
|---|---|---|
| 1 | zbazbzaz | 0 |
| 2 | bazbzaz | 0 |
| 3 | azbzaz | 2 |
| 4 | zbzaz | 0 |
| 5 | bzaz | 0 |
| 6 | zaz | 0 |
| 7 | az | 2 |
| 8 | z | 0 |
Total:
sum(z) + n
= (0+0+2+0+0+0+2+0) + 9
= 14
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each character is processed at most a constant number of times |
| Space | O(n) | The Z-array stores one value per index |
The linear time complexity comes from the fact that the right pointer never moves backward. Although the algorithm contains a while loop, every successful character comparison contributes to extending the Z-box, meaning the total work across the entire string remains linear.
The space complexity is O(n) because we store the Z-array for all positions.
Test Cases
solution = Solution()
assert solution.sumScores("babab") == 9 # Example 1
assert solution.sumScores("azbazbzaz") == 14 # Example 2
assert solution.sumScores("a") == 1 # Single character
assert solution.sumScores("aaaa") == 10 # Maximum overlap
assert solution.sumScores("abcd") == 4 # No repeated prefix
assert solution.sumScores("aa") == 3 # Small repeated string
assert solution.sumScores("ab") == 2 # Small unique string
assert solution.sumScores("ababab") == 12 # Alternating pattern
assert solution.sumScores("abcabc") == 9 # Repeating block
assert solution.sumScores("zzzzzz") == 21 # All same character
assert solution.sumScores("abcdefg") == 7 # Completely distinct
assert solution.sumScores("abaaba") == 11 # Prefix repeats later
assert solution.sumScores("mississippi") == 16 # Complex overlaps
| Test | Why |
|---|---|
"babab" |
Validates example case with alternating matches |
"azbazbzaz" |
Validates multiple separated matches |
"a" |
Smallest possible input |
"aaaa" |
Worst case overlap scenario |
"abcd" |
No prefix reuse |
"aa" |
Small repeated string |
"ab" |
Small unique string |
"ababab" |
Repeating alternating structure |
"abcabc" |
Repeated substring block |
"zzzzzz" |
Maximum repeated prefix matching |
"abcdefg" |
Completely distinct characters |
"abaaba" |
Partial prefix reuse |
"mississippi" |
Complex repeating structure |
Edge Cases
Single Character String
If:
s = "a"
there are no proper suffixes to compare.
The only string is:
"a"
Its score is 1, because the string fully matches itself.
This case is important because loops beginning at index 1 never execute. Our implementation handles it naturally by returning:
sum(z) + n
which becomes:
0 + 1 = 1
All Characters Identical
Consider:
s = "aaaaa"
Every suffix shares a long prefix with the full string:
5 + 4 + 3 + 2 + 1
This creates the worst case for naive approaches because almost every comparison succeeds.
The Z Algorithm remains efficient because it reuses previous match information through the Z-box rather than recomputing every character comparison.
No Matching Prefixes
For:
s = "abcdef"
every suffix except the full string immediately mismatches:
z = [0, 0, 0, 0, 0, 0]
Only the full string contributes to the answer.
This case ensures the implementation correctly handles immediate mismatches without incorrectly extending the Z-box.
Repeating Patterns
Strings like:
"abababab"
can be tricky because matches overlap heavily.
A naive implementation may repeatedly recompute identical comparisons.
The Z Algorithm avoids this issue by copying previously computed match lengths from inside the Z-box, preserving linear performance even for highly repetitive inputs.