LeetCode 2262 - Total Appeal of A String

The problem asks us to calculate the total appeal of all substrings of a given string s. The appeal of a string is defined as the number of distinct characters it contains.

LeetCode Problem 2262

Difficulty: 🔴 Hard
Topics: Hash Table, String, Dynamic Programming

Solution

Problem Understanding

The problem asks us to calculate the total appeal of all substrings of a given string s. The appeal of a string is defined as the number of distinct characters it contains. In other words, for every possible substring of s, we need to count how many unique letters it has and sum these counts across all substrings.

The input is a single string s consisting of lowercase English letters, and its length can be up to 10^5. The output is a single integer representing the total sum of appeals for all substrings.

Key points to understand:

  • A substring is a contiguous sequence of characters. For example, "abc" has substrings "a", "b", "c", "ab", "bc", "abc".
  • A naive approach that generates all substrings and counts distinct characters will be too slow because there can be roughly O(n^2) substrings for a string of length n, and counting distinct characters in each substring takes additional time.
  • Edge cases include strings with all identical characters, single-character strings, and strings where every character is unique.

The goal is to find an efficient approach that avoids explicitly generating all substrings.

Approaches

Brute Force Approach

The brute force method would generate every substring of s and compute its appeal by using a set to track distinct characters. This method is correct because it directly counts the appeal for all substrings. However, its complexity is O(n^3) in the worst case: there are O(n^2) substrings and counting unique characters in each substring takes up to O(n).

Optimal Approach

The optimal solution leverages the insight that we can calculate the contribution of each character directly to all substrings that include it. Specifically, if a character c appears at index i, then it contributes to all substrings that start after the previous occurrence of c and end at or after i. Using a hash map to track the last occurrence of each character allows us to compute the total contribution in linear time. This reduces the complexity to O(n).

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(n) Generates all substrings and counts distinct characters
Optimal O(n) O(1) Tracks last occurrence of each character to compute contributions

Algorithm Walkthrough

  1. Initialize a hash map last_occurrence to store the last index where each character appeared. Initialize total_appeal to 0 and current_sum to 0.
  2. Iterate over each character c at index i in the string.
  3. Compute the number of new substrings that include c as a new distinct character: this is (i - last_occurrence[c]).
  4. Update current_sum by adding the number of new substrings. current_sum represents the total appeal of all substrings ending at index i.
  5. Update last_occurrence[c] to i.
  6. Add current_sum to total_appeal.
  7. After iterating through the string, total_appeal contains the total appeal of all substrings.

Why it works

The key invariant is that current_sum always reflects the sum of appeals for all substrings ending at the current index. By adding contributions from new distinct occurrences, we account for every substring exactly once. Tracking the last occurrence ensures that repeated characters do not inflate the count incorrectly.

Python Solution

class Solution:
    def appealSum(self, s: str) -> int:
        last_occurrence = {}
        total_appeal = 0
        current_sum = 0
        
        for i, c in enumerate(s):
            prev_index = last_occurrence.get(c, -1)
            current_sum += (i - prev_index)
            last_occurrence[c] = i
            total_appeal += current_sum
            
        return total_appeal

Explanation

We use last_occurrence to track the last index of each character. For each character at index i, the number of substrings where it contributes as a new distinct character is (i - prev_index). We accumulate this in current_sum which represents the sum of appeals of all substrings ending at i. Adding current_sum to total_appeal at each step gives the final result.

Go Solution

func appealSum(s string) int64 {
    lastOccurrence := [26]int{}
    for i := range lastOccurrence {
        lastOccurrence[i] = -1
    }
    
    var totalAppeal int64 = 0
    var currentSum int64 = 0
    
    for i, c := range s {
        index := c - 'a'
        prevIndex := lastOccurrence[index]
        currentSum += int64(i - prevIndex)
        lastOccurrence[index] = i
        totalAppeal += currentSum
    }
    
    return totalAppeal
}

Explanation

Go uses a fixed-size array lastOccurrence for the 26 lowercase letters. We initialize all entries to -1 to handle characters that have not appeared yet. The rest of the logic mirrors the Python version, with int64 used to prevent overflow on large strings.

Worked Examples

Example 1: s = "abbca"

i char prev_index i - prev_index current_sum total_appeal
0 a -1 1 1 1
1 b -1 2 3 4
2 b 1 1 4 8
3 c -1 4 8 16
4 a 0 4 12 28

Final output is 28.

Example 2: s = "code"

i char prev_index i - prev_index current_sum total_appeal
0 c -1 1 1 1
1 o -1 2 3 4
2 d -1 3 6 10
3 e -1 4 10 20

Final output is 20.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Single pass through the string, with O(1) operations per character
Space O(1) Hash map of 26 characters, independent of input size

Because we only need to track the last occurrence of each character, space usage does not scale with the string length.

Test Cases

# Provided examples
assert Solution().appealSum("abbca") == 28  # Example 1
assert Solution().appealSum("code") == 20   # Example 2

# Edge cases
assert Solution().appealSum("a") == 1       # Single character
assert Solution().appealSum("aaaa") == 10   # All characters identical
assert Solution().appealSum("abcde") == 35  # All characters distinct
assert Solution().appealSum("abac") == 17   # Repeated characters interleaved

# Stress test
assert Solution().appealSum("a"*100000) == 5000050000  # Large identical characters
Test Why
"abbca" Validates standard example with repeated characters
"code" Validates example with all unique letters
"a" Single-character edge case
"aaaa" Repeated identical characters edge case
"abcde" All distinct letters edge case
"abac" Interleaved repeats
"a"*100000 Stress test for performance and integer handling

Edge Cases

Single Character String

A string with one character should return an appeal of 1, because there is only one substring. The algorithm correctly handles this because the last_occurrence default ensures (i - prev_index) is computed as 1.

All Characters Identical

When the string contains only one unique character repeated, each new character contributes only once per new substring that ends at its position. The algorithm calculates contributions linearly and sums correctly without double counting.

Interleaved Repeated Characters

For a string like "abac", repeated characters are separated by other characters. Tracking the last occurrence ensures that the contribution for a repeated character only accounts for substrings where it is newly distinct, correctly adjusting current_sum.