LeetCode 1781 - Sum of Beauty of All Substrings

The problem asks us to calculate the sum of beauty for all substrings of a given string s. The beauty of a substring is defined as the difference between the highest frequency and the lowest frequency of any character that occurs in that substring.

LeetCode Problem 1781

Difficulty: 🟡 Medium
Topics: Hash Table, String, Counting

Solution

Problem Understanding

The problem asks us to calculate the sum of beauty for all substrings of a given string s. The beauty of a substring is defined as the difference between the highest frequency and the lowest frequency of any character that occurs in that substring. In simpler terms, for each substring, we count how many times each character appears, find the character that occurs the most and the character that occurs the least (excluding characters that do not appear), and take the difference.

The input is a string s of length between 1 and 500, consisting only of lowercase English letters. The output is a single integer representing the sum of the beauty values for every possible substring of s.

Key points to note include that single-character substrings have a beauty of 0, because the most and least frequent characters are the same. Also, substrings with all identical characters contribute 0 as well. The constraints indicate that a direct brute-force solution may be feasible due to the small input size, but we should consider optimizing where possible.

Important edge cases include very short strings (s of length 1), strings with all identical characters, and strings with maximum length where frequency calculation could be costly.

Approaches

The brute-force approach involves generating every possible substring of s, counting the frequency of each character in that substring using a hash table or array, and calculating the beauty by subtracting the minimum non-zero frequency from the maximum frequency. This approach is straightforward and guaranteed to be correct because it explicitly computes beauty for all substrings, but it is computationally expensive since there are O(n^2) substrings and computing frequencies can take O(n) for each substring, resulting in O(n^3) time complexity.

The optimized approach leverages the observation that, although there are O(n^2) substrings, the string only contains 26 lowercase letters. By fixing a starting index and expanding the substring to the right, we can maintain a running frequency array. At each expansion, we can compute the current beauty by scanning the 26-length frequency array, which is effectively O(1) relative to the substring length. This reduces the inner frequency computation from O(n) to O(26), giving an overall time complexity of O(n^2 * 26), which simplifies to O(n^2) and is efficient enough for the given constraints.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^3) O(26) Generate all substrings and count frequencies for each separately
Optimal O(n^2) O(26) Expand substrings from each start index and maintain frequency array to compute beauty incrementally

Algorithm Walkthrough

  1. Initialize a variable total_beauty to zero. This will hold the sum of beauty values of all substrings.
  2. Iterate through each possible starting index i of the substring from 0 to len(s) - 1.
  3. For each i, initialize a frequency array freq of size 26 (for lowercase letters) with all zeros.
  4. Expand the substring from index i to the right, iterating through ending indices j from i to len(s) - 1.
  5. For each character s[j], increment its corresponding count in the freq array.
  6. Compute the beauty of the current substring by finding the maximum frequency and minimum non-zero frequency in freq and take their difference.
  7. Add the computed beauty to total_beauty.
  8. After iterating through all substrings starting from all possible indices, return total_beauty.

Why it works: By expanding substrings from each starting index and maintaining a running frequency array, we efficiently compute the beauty for each substring in linear time relative to the substring count. The frequency array ensures we consider only characters present in the substring and calculate the correct maximum and minimum frequencies.

Python Solution

class Solution:
    def beautySum(self, s: str) -> int:
        total_beauty = 0
        n = len(s)
        
        for i in range(n):
            freq = [0] * 26
            for j in range(i, n):
                freq[ord(s[j]) - ord('a')] += 1
                
                max_freq = max(freq)
                min_freq = min(f for f in freq if f > 0)
                
                total_beauty += max_freq - min_freq
                
        return total_beauty

This Python implementation initializes the total beauty counter and iterates over all possible starting indices. For each starting index, it expands the substring to the right, updating a 26-length frequency array. The maximum and minimum non-zero frequencies are calculated for each substring to find the beauty, which is then added to the total.

Go Solution

func beautySum(s string) int {
    totalBeauty := 0
    n := len(s)
    
    for i := 0; i < n; i++ {
        freq := [26]int{}
        for j := i; j < n; j++ {
            freq[s[j]-'a']++
            
            maxFreq, minFreq := 0, n+1
            for _, f := range freq {
                if f > 0 {
                    if f > maxFreq {
                        maxFreq = f
                    }
                    if f < minFreq {
                        minFreq = f
                    }
                }
            }
            totalBeauty += maxFreq - minFreq
        }
    }
    
    return totalBeauty
}

In Go, we use a fixed-size array of length 26 for character frequencies. We carefully handle the minimum frequency initialization to ensure we skip zero frequencies. The algorithm logic mirrors the Python version closely, with Go-specific handling of arrays and indexing.

Worked Examples

Example 1: "aabcb"

Starting at index 0:

Substring freq max_freq min_freq beauty
"a" [1,0,..] 1 1 0
"aa" [2,0,..] 2 2 0
"aab" [2,1,..] 2 1 1
"aabc" [2,1,1,..] 2 1 1
"aabcb" [2,2,1,..] 2 1 1

Repeat similarly for starting indices 1, 2, 3, 4. Sum of all beauties = 5.

Example 2: "aabcbaa"

Following the same procedure results in total beauty = 17.

Complexity Analysis

Measure Complexity Explanation
Time O(n^2) There are O(n^2) substrings and frequency computation per substring is O(26), effectively O(1)
Space O(26) Frequency array for lowercase letters

Even though we loop over 26 characters for each substring, it is constant and does not scale with n, so the overall complexity is dominated by O(n^2).

Test Cases

# Provided examples
assert Solution().beautySum("aabcb") == 5  # Example 1
assert Solution().beautySum("aabcbaa") == 17  # Example 2

# Edge cases
assert Solution().beautySum("a") == 0  # Single character
assert Solution().beautySum("aaaaa") == 0  # All identical characters
assert Solution().beautySum("abc") == 0  # All unique characters

# Mixed characters
assert Solution().beautySum("ababa") == 4  # Beauty of substrings like "aba" is 1
assert Solution().beautySum("aaabbb") == 10  # Various substring beauties
Test Why
"aabcb" Example with mixed frequencies and non-zero beauty
"aabcbaa" Larger example with repeated characters
"a" Single-character substring, beauty = 0
"aaaaa" All identical characters, beauty = 0
"abc" All unique characters, beauty = 0
"ababa" Alternating characters, ensures running frequency works
"aaabbb" Substrings with different frequency distributions

Edge Cases

The first important edge case is a string of length 1. Any single-character substring has a beauty of 0. The implementation handles this naturally because the frequency array only contains one non-zero value, so max_freq - min_freq = 0.

The second edge case is a string where all characters are identical. Here, every substring's beauty is 0. Our approach correctly computes this because the frequency array has identical values, and max_freq - min_freq = 0.

The third edge case involves substrings where characters appear at very different frequencies. For example, "aaab" has a substring "aaab" with a:3, b:1, yielding a beauty of 2. The algorithm correctly computes max and min over non-zero frequencies, ensuring the beauty calculation accounts for such cases. This prevents miscounting when zeros are present in the frequency array.