LeetCode 2083 - Substrings That Begin and End With the Same Letter

The problem asks us to count all substrings in a given string s that start and end with the same character. A substring is a contiguous sequence of characters, so the order and adjacency of characters matter.

LeetCode Problem 2083

Difficulty: 🟡 Medium
Topics: Hash Table, Math, String, Counting, Prefix Sum

Solution

Problem Understanding

The problem asks us to count all substrings in a given string s that start and end with the same character. A substring is a contiguous sequence of characters, so the order and adjacency of characters matter. The input s is a non-empty string consisting of lowercase English letters only, and its length can be as large as 10^5. The output is a single integer representing the number of substrings satisfying the "same start and end character" condition.

From the examples, we can see that all single-character substrings count because they trivially start and end with the same character. Substrings of length greater than one must be carefully counted by character frequency. The constraints guarantee that the string is never empty, so we do not need to handle an empty input case.

Important edge cases include a string of length 1 (the simplest case), a string where all characters are the same (maximal number of substrings), and a string where all characters are unique (minimal number of substrings beyond length 1).

Approaches

The brute-force approach is straightforward: iterate over all possible substrings using two nested loops, and check if the first and last character of each substring are the same. This guarantees correctness because it explicitly examines every substring. However, the time complexity is O(n²) because there are roughly n*(n+1)/2 substrings in a string of length n, and this is too slow for n up to 10^5.

The optimal approach relies on a key observation: for any character c, if it appears k times in the string, the number of substrings that begin and end with c can be calculated combinatorially. Each occurrence can pair with itself (single-character substring) and with any other occurrence later in the string. This gives a formula k + k*(k-1)/2, which simplifies to k*(k+1)/2. Using a frequency counter to track the number of occurrences of each character allows us to compute the total in a single pass over the string.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Checks all substrings explicitly
Optimal O(n) O(1) Uses character frequency counting and combinatorial formula

Algorithm Walkthrough

  1. Initialize an array freq of size 26 to store the frequency of each lowercase English letter. Each index corresponds to a letter.
  2. Iterate through each character in the string s. For each character, increment its corresponding frequency in the freq array.
  3. Initialize a variable result to 0 to accumulate the total count of valid substrings.
  4. For each frequency k in freq, compute the number of substrings that start and end with the corresponding character using the formula k*(k+1)//2. Add this value to result.
  5. Return result.

Why it works: Each character contributes k*(k+1)/2 substrings because for each occurrence, it can form substrings with itself and all subsequent occurrences. Summing over all characters counts all valid substrings without double-counting.

Python Solution

class Solution:
    def numberOfSubstrings(self, s: str) -> int:
        freq = [0] * 26  # frequency array for 'a' to 'z'
        for char in s:
            freq[ord(char) - ord('a')] += 1
        
        result = 0
        for count in freq:
            result += count * (count + 1) // 2
        
        return result

The Python implementation initializes a fixed-size array for the 26 letters to ensure O(1) space beyond the input. The ord function maps each character to an index in the array. We then iterate over the frequency array to compute the number of valid substrings per character, summing them for the final answer.

Go Solution

func numberOfSubstrings(s string) int64 {
    var freq [26]int64
    for _, c := range s {
        freq[c-'a']++
    }
    
    var result int64
    for _, count := range freq {
        result += count * (count + 1) / 2
    }
    
    return result
}

In Go, we use an array of type int64 to handle large counts safely. The rune subtraction c - 'a' gives the index for each letter. The result accumulation also uses int64 to prevent overflow for long strings.

Worked Examples

Example 1: s = "abcba"

Step freq array Substring count added result
'a' [1,0,0...] - -
'b' [1,1,0...] - -
'c' [1,1,1...] - -
'b' [1,2,1...] - -
'a' [2,2,1...] - -
compute [2,2,1...] 'a':3, 'b':3, 'c':1 7

Example 2: s = "abacad"

char freq substrings cumulative
a 3 6 6
b 1 1 7
c 1 1 8
d 1 1 9

Example 3: s = "a"

char freq substrings cumulative
a 1 1 1

Complexity Analysis

Measure Complexity Explanation
Time O(n) Single pass to compute frequencies, then constant-time pass over 26 letters
Space O(1) Fixed array of size 26, independent of n

The approach is linear in time and constant in extra space, which is optimal for the problem constraints.

Test Cases

# Provided examples
assert Solution().numberOfSubstrings("abcba") == 7  # mixed characters
assert Solution().numberOfSubstrings("abacad") == 9  # repeated characters
assert Solution().numberOfSubstrings("a") == 1  # single character

# Edge cases
assert Solution().numberOfSubstrings("aaaaa") == 15  # all same characters, maximal substrings
assert Solution().numberOfSubstrings("abcde") == 5  # all unique, only single-character substrings
assert Solution().numberOfSubstrings("ababab") == 9  # alternating characters
Test Why
"abcba" general case with repeated characters
"abacad" repeated characters at multiple positions
"a" minimal string length
"aaaaa" maximal substrings with identical characters
"abcde" no repeated characters beyond length 1
"ababab" pattern of alternating characters

Edge Cases

Single-character string: If s has length 1, the only substring is the string itself. The implementation handles this naturally since frequency of the single character is 1, yielding 1*(1+1)/2 = 1.

All characters identical: For a string like "aaaaa", the number of substrings grows quadratically. Using the combinatorial formula avoids generating substrings explicitly, preventing timeouts.

All characters unique: For a string like "abcde", only single-character substrings are valid. The formula still works because k*(k+1)/2 = 1 for each unique character. This avoids unnecessary iteration over non-existent longer substrings.

This approach correctly handles all these cases efficiently without special conditional logic.