LeetCode 2955 - Number of Same-End Substrings

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Counting, Prefix Sum

Solution

Problem Understanding

The problem asks us to compute the number of same-end substrings within specified subranges of a given string s. A substring is same-end if its first and last character are identical. We are given multiple queries in the form [li, ri], each representing a substring s[li..ri]. For each query, we must determine how many non-empty substrings of that range are same-end.

The input string can be up to 30,000 characters long, and there can be up to 30,000 queries. This means a naive solution that examines every possible substring would be too slow, as the number of substrings in a length-n string is roughly n*(n+1)/2. We need a method that scales efficiently with both the string length and the number of queries.

Edge cases include substrings of length 1, where every single character is trivially same-end, and substrings where all characters are identical. The problem guarantees valid indices for all queries.

Approaches

Brute Force

The brute-force approach would iterate over each query [li, ri], generate every possible substring within s[li..ri], and count how many have the same first and last character. While this approach is correct, it is computationally expensive. For a substring of length n, there are n*(n+1)/2 substrings. With multiple queries on a large string, the time complexity can reach O(q * n^2), which is infeasible for n and q up to 30,000.

Optimal Approach

The key insight is that for a substring to be same-end, we only need to know the frequency of each character in the range and how many times that character can pair up with itself to form substrings. Specifically, for a character c, if it occurs count_c times in the range, the number of same-end substrings starting and ending with c is the sum of the first count_c positive integers: count_c * (count_c + 1) / 2.

Using prefix sums for character counts allows us to quickly calculate count_c for any query range in O(1) per character. By maintaining a 2D prefix sum array prefix[i][ch] representing the number of occurrences of character ch up to index i, we can answer each query in O(26) time (since there are 26 lowercase English letters), making the solution scalable.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(q * n^2)	O(1)	Check all substrings individually
Optimal	O(n + q * 26)	O(n * 26)	Use prefix sums to count occurrences efficiently

Algorithm Walkthrough

Initialize a 2D array prefix of size (len(s) + 1) x 26, where prefix[i][ch] stores the number of occurrences of character ch in s[0..i-1].
Iterate through the string s. For each character at index i, update the prefix sums for all characters. prefix[i+1][ch] = prefix[i][ch] + (1 if s[i] == ch else 0).
Initialize an empty list ans to store the results for each query.
For each query [li, ri], compute the count of each character in s[li..ri] as count_c = prefix[ri+1][c] - prefix[li][c].
For each character c, add count_c * (count_c + 1) // 2 to the total for that query.
Append the total to ans.
Return ans.

Why it works: By using prefix sums, we can efficiently compute the number of times each character occurs in a given range. Counting combinations of start and end positions for each character gives the number of same-end substrings. This avoids iterating over every possible substring explicitly.

Python Solution

from typing import List

class Solution:
    def sameEndSubstringCount(self, s: str, queries: List[List[int]]) -> List[int]:
        n = len(s)
        # Initialize prefix sum array
        prefix = [[0] * 26 for _ in range(n + 1)]
        
        for i in range(n):
            for c in range(26):
                prefix[i + 1][c] = prefix[i][c]
            prefix[i + 1][ord(s[i]) - ord('a')] += 1
        
        ans = []
        for li, ri in queries:
            total = 0
            for c in range(26):
                count_c = prefix[ri + 1][c] - prefix[li][c]
                total += count_c * (count_c + 1) // 2
            ans.append(total)
        
        return ans

This Python implementation first builds the prefix sum table for all 26 letters. For each query, it computes the counts of each letter in the substring and calculates the number of same-end substrings using the combinatorial formula. This ensures that each query is answered in O(26) time.

Go Solution

func sameEndSubstringCount(s string, queries [][]int) []int {
    n := len(s)
    prefix := make([][]int, n+1)
    for i := range prefix {
        prefix[i] = make([]int, 26)
    }

    for i := 0; i < n; i++ {
        for c := 0; c < 26; c++ {
            prefix[i+1][c] = prefix[i][c]
        }
        prefix[i+1][s[i]-'a']++
    }

    ans := make([]int, len(queries))
    for idx, q := range queries {
        li, ri := q[0], q[1]
        total := 0
        for c := 0; c < 26; c++ {
            countC := prefix[ri+1][c] - prefix[li][c]
            total += countC * (countC + 1) / 2
        }
        ans[idx] = total
    }
    return ans
}

In Go, the main difference is explicit slice initialization. The prefix sum array is pre-allocated and updated similarly. Integer division in Go is safe here as all operations involve integers.

Worked Examples

Example 1: s = "abcaab", queries = [[0,0],[1,4],[2,5],[0,5]]

Query	Substring	Character Counts	Same-End Count Calculation	Total
[0,0]	"a"	{'a':1}	1*(1+1)/2 = 1	1
[1,4]	"bcaa"	{'b':1,'c':1,'a':2}	1,1,3	5
[2,5]	"caab"	{'c':1,'a':2,'b':1}	1,3,1	5
[0,5]	"abcaab"	{'a':3,'b':2,'c':1}	6,3,1	10

Example 2: s = "abcd", queries = [[0,3]]

Query	Substring	Character Counts	Same-End Count Calculation	Total
[0,3]	"abcd"	{'a':1,'b':1,'c':1,'d':1}	1+1+1+1	4

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n + q * 26)	Building prefix sums takes O(n*26), each query O(26)
Space	O(n * 26)	Prefix sum array stores 26 integers per character position

The approach efficiently balances precomputation with query evaluation, ensuring scalability to the problem limits.

Test Cases

# Basic examples
assert Solution().sameEndSubstringCount("abcaab", [[0,0],[1,4],[2,5],[0,5]]) == [1,5,5,10]
assert Solution().sameEndSubstringCount("abcd", [[0,3]]) == [4]

# Edge cases
assert Solution().sameEndSubstringCount("aa", [[0,0],[0,1]]) == [1,3]  # single letters, double letters
assert Solution().sameEndSubstringCount("abc", [[0,2]]) == [3]  # all distinct letters
assert Solution().sameEndSubstringCount("aaa", [[0,2]]) == [6]  # all same letters

# Maximum input size stress test
s = "a" * 30000
queries = [[0, 29999]]
assert Solution().sameEndSubstringCount(s, queries) == [30000*30001//2]

Test	Why
Basic examples	Validates problem statement examples
Single-letter and double-letter	Checks small ranges and trivial substrings
All distinct letters	Ensures counting works with no repeated letters
All same letters	Confirms combinatorial formula correctness
Maximum