LeetCode 2842 - Count K-Subsequences of a String With Maximum Beauty

The problem requires us to count k-subsequences of a string s that have the maximum beauty, where beauty is defined as the sum of the frequency of each character in the subsequence. A k-subsequence is a subsequence of length k where all characters are unique.

LeetCode Problem 2842

Difficulty: 🔴 Hard
Topics: Hash Table, Math, String, Greedy, Sorting, Combinatorics

Solution

Problem Understanding

The problem requires us to count k-subsequences of a string s that have the maximum beauty, where beauty is defined as the sum of the frequency of each character in the subsequence. A k-subsequence is a subsequence of length k where all characters are unique. The input string can be up to 2 * 10^5 characters, and k is between 1 and s.length.

Effectively, the problem asks us to:

  1. Compute the frequency of each character in s.
  2. Identify the subset of k distinct characters whose summed frequency is maximal.
  3. Count the number of k-subsequences that can be formed from these characters, considering all positions where those characters appear.

The main challenges are handling large input sizes efficiently and computing combinatorial counts modulo $10^9 + 7$. Edge cases include k being equal to the number of distinct characters, strings with repeated characters, and strings where all characters are unique.

Approaches

Brute-Force Approach

A naive approach would generate all subsequences of length k, compute their beauty, and track the maximum beauty. Then we would count how many subsequences have this maximum beauty. This is correct in theory but infeasible in practice. Generating all subsequences of length k from a string of length n has a combinatorial complexity of O(n choose k), which is extremely large for n = 2 * 10^5.

Optimal Approach

The optimal approach leverages the fact that beauty depends solely on character frequencies. Therefore, we do not need to generate all subsequences:

  1. Count the frequency of each character using a hash map.
  2. Sort the frequencies in descending order.
  3. Select the top k frequencies; these correspond to the characters in the maximum beauty subsequence.
  4. Count the number of ways to select indices from the original string for these characters using combinatorics, specifically combinations.
  5. Multiply the combinations for each character and take modulo $10^9 + 7$.

This approach avoids generating subsequences entirely and reduces the problem to counting and combinatorics.

Approach Time Complexity Space Complexity Notes
Brute Force O(n choose k) O(k) Generates all subsequences and calculates beauty
Optimal O(n + 26 log 26 + k) = O(n) O(26) = O(1) Frequency counting, sorting frequencies, combinatorial count

Algorithm Walkthrough

  1. Initialize a frequency map for characters a to z and count occurrences of each character in s.
  2. Extract all non-zero frequencies and sort them in descending order.
  3. Select the top k frequencies. If there are duplicates at the boundary (the kth largest frequency is repeated), identify how many characters have the same frequency.
  4. Compute the number of ways to choose positions for each character:
  • For frequencies larger than the boundary, take all positions.
  • For the boundary frequency, use combinations to select the exact number of characters needed.
  1. Multiply all these counts together modulo $10^9 + 7$ to get the final answer.

Why it works:

The algorithm relies on the invariant that maximum beauty is achieved by picking the k most frequent distinct characters. Combinatorial counting ensures we count all valid subsequences formed from these characters without generating them explicitly.

Python Solution

from collections import Counter
from math import comb

MOD = 10**9 + 7

class Solution:
    def countKSubsequencesWithMaxBeauty(self, s: str, k: int) -> int:
        freq = Counter(s)
        counts = sorted(freq.values(), reverse=True)
        
        if k > len(counts):
            return 0
        
        top_k_counts = counts[:k]
        min_count = top_k_counts[-1]
        min_count_total = counts.count(min_count)
        min_count_needed = top_k_counts.count(min_count)
        
        result = 1
        for c in top_k_counts:
            if c == min_count:
                continue
            result = result * c % MOD
        
        # For the characters at the boundary frequency, choose min_count_needed out of min_count_total
        result = result * comb(min_count_total, min_count_needed) % MOD
        
        # Multiply by the number of ways to pick positions for boundary frequency
        result = result * pow(min_count, min_count_needed, MOD) % MOD
        
        return result

Explanation:

We first count the character frequencies and sort them. We then determine the k characters contributing to maximum beauty. Characters with frequencies above the boundary are directly multiplied. For characters at the boundary, we select the exact number needed using combinatorial counts. Multiplication is done modulo $10^9 + 7$.

Go Solution

package main

import (
    "sort"
)

const MOD = 1e9 + 7

func modPow(a, b, mod int) int {
    result := 1
    a %= mod
    for b > 0 {
        if b%2 == 1 {
            result = result * a % mod
        }
        a = a * a % mod
        b /= 2
    }
    return result
}

func modComb(n, k int) int {
    if k > n {
        return 0
    }
    res := 1
    for i := 0; i < k; i++ {
        res = res * (n - i) % MOD
        res = res * modPow(i+1, MOD-2, MOD) % MOD
    }
    return res
}

func countKSubsequencesWithMaxBeauty(s string, k int) int {
    freq := make(map[rune]int)
    for _, c := range s {
        freq[c]++
    }
    
    counts := make([]int, 0, len(freq))
    for _, v := range freq {
        counts = append(counts, v)
    }
    sort.Sort(sort.Reverse(sort.IntSlice(counts)))
    
    if k > len(counts) {
        return 0
    }
    
    topK := counts[:k]
    minCount := topK[k-1]
    minTotal := 0
    minNeeded := 0
    for _, v := range counts {
        if v == minCount {
            minTotal++
        }
    }
    for _, v := range topK {
        if v == minCount {
            minNeeded++
        }
    }
    
    result := 1
    for _, v := range topK {
        if v == minCount {
            continue
        }
        result = result * v % MOD
    }
    
    result = result * modComb(minTotal, minNeeded) % MOD
    result = result * modPow(minCount, minNeeded, MOD) % MOD
    
    return result
}

Explanation:

The Go solution mirrors the Python logic. We implement modular exponentiation and modular combinations to handle large numbers. Go requires explicit type handling and does not have a built-in Counter, so we use a map. Sorting is done using sort.Sort in reverse order.

Worked Examples

Example 1: s = "bcca", k = 2

Step Counts Top K Boundary Result
Count frequencies b:1, c:2, a:1 [2,1] minCount=1 result = 1*2=2
Boundary comb minTotal=2, minNeeded=1 2 choose 1 = 2 result = 2*2 = 4

Output: 4

Example 2: s = "abbcd", k = 4

Step Counts Top K Boundary Result
Count frequencies a:1, b:2, c:1, d:1 [2,1,1,1] minCount=1 result = 2
Boundary comb minTotal=3, minNeeded=3 3 choose 3 = 1 result = 2_1_1=2

Output: 2

Complexity Analysis

Measure Complexity Explanation
Time O(n) Counting frequencies is O(n). Sorting up to 26 elements is O(26 log 26) ~ O(1). Calculating combinations is O(k).
Space O(26) = O(1) Frequency map stores at most 26 characters.

The algorithm scales linearly with input size n and uses constant extra space.

Test Cases

# Provided examples
assert Solution().countKSubsequencesWithMaxBeauty("bcca", 2) == 4  # multiple max beauty subsequences
assert Solution().countKSubsequencesWithMaxBeauty("abbcd", 4) == 2  # all top k chars used

# Edge cases
assert Solution().countKSubsequencesWithMaxBeauty("a", 1) == 1  # single char
assert Solution().countKSubsequencesWithMaxBeauty("aaabbbccc", 3) == 27  # choose one from each top frequency
assert Solution().countKSubsequencesWithMaxBeauty("abcdef", 6)