LeetCode 2083 - Substrings That Begin and End With the Same Letter
The problem asks us to count all substrings in a given string s that start and end with the same character. A substring is a contiguous sequence of characters, so the order and adjacency of characters matter.
Difficulty: 🟡 Medium
Topics: Hash Table, Math, String, Counting, Prefix Sum
Solution
Problem Understanding
The problem asks us to count all substrings in a given string s that start and end with the same character. A substring is a contiguous sequence of characters, so the order and adjacency of characters matter. The input s is a non-empty string consisting of lowercase English letters only, and its length can be as large as 10^5. The output is a single integer representing the number of substrings satisfying the "same start and end character" condition.
From the examples, we can see that all single-character substrings count because they trivially start and end with the same character. Substrings of length greater than one must be carefully counted by character frequency. The constraints guarantee that the string is never empty, so we do not need to handle an empty input case.
Important edge cases include a string of length 1 (the simplest case), a string where all characters are the same (maximal number of substrings), and a string where all characters are unique (minimal number of substrings beyond length 1).
Approaches
The brute-force approach is straightforward: iterate over all possible substrings using two nested loops, and check if the first and last character of each substring are the same. This guarantees correctness because it explicitly examines every substring. However, the time complexity is O(n²) because there are roughly n*(n+1)/2 substrings in a string of length n, and this is too slow for n up to 10^5.
The optimal approach relies on a key observation: for any character c, if it appears k times in the string, the number of substrings that begin and end with c can be calculated combinatorially. Each occurrence can pair with itself (single-character substring) and with any other occurrence later in the string. This gives a formula k + k*(k-1)/2, which simplifies to k*(k+1)/2. Using a frequency counter to track the number of occurrences of each character allows us to compute the total in a single pass over the string.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Checks all substrings explicitly |
| Optimal | O(n) | O(1) | Uses character frequency counting and combinatorial formula |
Algorithm Walkthrough
- Initialize an array
freqof size 26 to store the frequency of each lowercase English letter. Each index corresponds to a letter. - Iterate through each character in the string
s. For each character, increment its corresponding frequency in thefreqarray. - Initialize a variable
resultto 0 to accumulate the total count of valid substrings. - For each frequency
kinfreq, compute the number of substrings that start and end with the corresponding character using the formulak*(k+1)//2. Add this value toresult. - Return
result.
Why it works: Each character contributes k*(k+1)/2 substrings because for each occurrence, it can form substrings with itself and all subsequent occurrences. Summing over all characters counts all valid substrings without double-counting.
Python Solution
class Solution:
def numberOfSubstrings(self, s: str) -> int:
freq = [0] * 26 # frequency array for 'a' to 'z'
for char in s:
freq[ord(char) - ord('a')] += 1
result = 0
for count in freq:
result += count * (count + 1) // 2
return result
The Python implementation initializes a fixed-size array for the 26 letters to ensure O(1) space beyond the input. The ord function maps each character to an index in the array. We then iterate over the frequency array to compute the number of valid substrings per character, summing them for the final answer.
Go Solution
func numberOfSubstrings(s string) int64 {
var freq [26]int64
for _, c := range s {
freq[c-'a']++
}
var result int64
for _, count := range freq {
result += count * (count + 1) / 2
}
return result
}
In Go, we use an array of type int64 to handle large counts safely. The rune subtraction c - 'a' gives the index for each letter. The result accumulation also uses int64 to prevent overflow for long strings.
Worked Examples
Example 1: s = "abcba"
| Step | freq array | Substring count added | result |
|---|---|---|---|
| 'a' | [1,0,0...] | - | - |
| 'b' | [1,1,0...] | - | - |
| 'c' | [1,1,1...] | - | - |
| 'b' | [1,2,1...] | - | - |
| 'a' | [2,2,1...] | - | - |
| compute | [2,2,1...] | 'a':3, 'b':3, 'c':1 | 7 |
Example 2: s = "abacad"
| char | freq | substrings | cumulative |
|---|---|---|---|
| a | 3 | 6 | 6 |
| b | 1 | 1 | 7 |
| c | 1 | 1 | 8 |
| d | 1 | 1 | 9 |
Example 3: s = "a"
| char | freq | substrings | cumulative |
|---|---|---|---|
| a | 1 | 1 | 1 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Single pass to compute frequencies, then constant-time pass over 26 letters |
| Space | O(1) | Fixed array of size 26, independent of n |
The approach is linear in time and constant in extra space, which is optimal for the problem constraints.
Test Cases
# Provided examples
assert Solution().numberOfSubstrings("abcba") == 7 # mixed characters
assert Solution().numberOfSubstrings("abacad") == 9 # repeated characters
assert Solution().numberOfSubstrings("a") == 1 # single character
# Edge cases
assert Solution().numberOfSubstrings("aaaaa") == 15 # all same characters, maximal substrings
assert Solution().numberOfSubstrings("abcde") == 5 # all unique, only single-character substrings
assert Solution().numberOfSubstrings("ababab") == 9 # alternating characters
| Test | Why |
|---|---|
| "abcba" | general case with repeated characters |
| "abacad" | repeated characters at multiple positions |
| "a" | minimal string length |
| "aaaaa" | maximal substrings with identical characters |
| "abcde" | no repeated characters beyond length 1 |
| "ababab" | pattern of alternating characters |
Edge Cases
Single-character string: If s has length 1, the only substring is the string itself. The implementation handles this naturally since frequency of the single character is 1, yielding 1*(1+1)/2 = 1.
All characters identical: For a string like "aaaaa", the number of substrings grows quadratically. Using the combinatorial formula avoids generating substrings explicitly, preventing timeouts.
All characters unique: For a string like "abcde", only single-character substrings are valid. The formula still works because k*(k+1)/2 = 1 for each unique character. This avoids unnecessary iteration over non-existent longer substrings.
This approach correctly handles all these cases efficiently without special conditional logic.