LeetCode 3784 - Minimum Deletion Cost to Make All Characters Equal

The problem asks us to minimize the total cost of deleting characters from a string such that the resulting string contains only one unique character.

LeetCode Problem 3784

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Enumeration

Solution

Problem Understanding

The problem asks us to minimize the total cost of deleting characters from a string such that the resulting string contains only one unique character. We are given a string s of length n and an array cost of the same length, where cost[i] is the cost of deleting the character s[i]. The goal is to compute the minimum sum of costs required to delete characters so that all remaining characters in the string are identical.

In other words, we want to "collapse" each group of identical consecutive characters to a single character while paying the minimal deletion cost. We are guaranteed that the input size can be up to 105 characters, and each deletion cost can be as high as 10^9. This means a naive solution that checks all possible combinations of deletions would be too slow.

Important edge cases include:

  1. The string is already uniform, like "zzzzz", where the cost should be 0.
  2. Characters alternate, like "abab", where deletions must remove almost half the characters.
  3. Cost values vary significantly, so we must carefully choose which characters to delete within consecutive duplicates.

Approaches

Brute Force

A brute-force approach would consider every possible subset of deletions, keeping track of whether the resulting string contains only one unique character and calculating the total cost. While this would yield the correct answer, the time complexity is exponential O(2^n) because there are 2^n subsets of characters, making it completely infeasible for n up to 105.

Optimal Approach

The key insight is that we only need to consider consecutive identical characters, because non-consecutive identical characters can always coexist in the final string without conflicts. Within a sequence of consecutive identical characters, to make all characters equal while minimizing deletion cost, we should keep the character with the maximum deletion cost and delete the rest. This ensures we pay the minimal cost for deletions in that sequence.

We can iterate through the string once, maintaining a running group of consecutive identical characters, sum their deletion costs, and subtract the maximum cost in that group to get the minimal deletion cost for that segment. Summing these costs across the string gives the final answer.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^n) O(n) Check all subsets of deletions, too slow
Optimal O(n) O(1) Iterate through string once, handle consecutive duplicates greedily

Algorithm Walkthrough

  1. Initialize a variable total_cost to 0 to store the accumulated minimal deletion cost.
  2. Start iterating through the string with an index i.
  3. For each consecutive group of identical characters, initialize sum_cost to 0 and max_cost to 0.
  4. While the current character is the same as the next character, add its deletion cost to sum_cost and update max_cost if this character’s cost is larger.
  5. Once the end of the consecutive group is reached, compute the deletion cost for this group as sum_cost - max_cost and add it to total_cost.
  6. Continue iterating until the end of the string.
  7. Return total_cost.

Why it works: By always keeping the character with the highest deletion cost in a consecutive group and deleting the rest, we guarantee minimal cost for that group. Processing the string sequentially ensures that all groups are handled independently and efficiently, leading to the overall minimal deletion cost.

Python Solution

from typing import List

class Solution:
    def minCost(self, s: str, cost: List[int]) -> int:
        total_cost = 0
        n = len(s)
        i = 0
        
        while i < n:
            current_char = s[i]
            sum_cost = cost[i]
            max_cost = cost[i]
            j = i + 1
            
            while j < n and s[j] == current_char:
                sum_cost += cost[j]
                max_cost = max(max_cost, cost[j])
                j += 1
            
            total_cost += sum_cost - max_cost
            i = j
        
        return total_cost

The Python code initializes a running total of deletion costs, then iterates over consecutive groups of identical characters. For each group, it calculates the total cost of deleting all characters except the one with the maximum deletion cost. The outer loop ensures all characters are processed.

Go Solution

func minCost(s string, cost []int) int64 {
    var totalCost int64 = 0
    n := len(s)
    i := 0
    
    for i < n {
        currentChar := s[i]
        sumCost := int64(cost[i])
        maxCost := int64(cost[i])
        j := i + 1
        
        for j < n && s[j] == currentChar {
            sumCost += int64(cost[j])
            if int64(cost[j]) > maxCost {
                maxCost = int64(cost[j])
            }
            j++
        }
        
        totalCost += sumCost - maxCost
        i = j
    }
    
    return totalCost
}

In Go, we handle integer overflow by using int64 since individual costs can be up to 10^9 and the total sum could exceed the 32-bit integer limit. The logic mirrors the Python solution.

Worked Examples

Example 1: s = "aabaac", cost = [1,2,3,4,1,10]

Step i current_char sum_cost max_cost total_cost
1 0 a 1 1 0
2 1 a 3 2 1 (sum-cost - max = 3-2)
3 3 b 4 4 1 (4-4=0 added)
4 4 a 5 4 2 (5-4 added)
5 5 c 10 10 11 (10-10 added)

Example 2: s = "abc", cost = [10,5,8]

Each character is unique, so we must delete all except the largest cost (10). Deletion cost = 5 + 8 = 13.

Example 3: s = "zzzzz", cost = [67,67,67,67,67]

All characters identical, deletion cost = 0.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Iterate through the string once, processing each character exactly once.
Space O(1) Only constant extra variables are used for tracking sums and max costs.

The linear time complexity is optimal because every character must be examined at least once. Space remains constant because no extra arrays or hash maps are needed.

Test Cases

# test cases
assert Solution().minCost("aabaac", [1,2,3,4,1,10]) == 11  # example 1
assert Solution().minCost("abc", [10,5,8]) == 13           # example 2
assert Solution().minCost("zzzzz", [67,67,67,67,67]) == 0 # example 3
assert Solution().minCost("a", [100]) == 0                 # single character, no deletion needed
assert Solution().minCost("aa", [1,2]) == 1               # two identical, delete smaller cost
assert Solution().minCost("ab", [5,10]) == 5              # two different, delete smaller cost
assert Solution().minCost("aaabbb", [1,2,3,4,5,6]) == 12  # multiple groups, sum of deletions
Test Why
"aabaac" Consecutive duplicates with mixed costs
"abc" All unique characters, must delete to leave one
"zzzzz" Already uniform, minimal cost = 0
"a" Single character, edge case
"aa" Two identical characters, test keeping max cost
"ab" Two different characters, test basic deletion
"aaabbb" Multiple consecutive groups, sum deletions for each

Edge Cases

  1. Single character string: "a" with cost [100]. The algorithm must return 0 because there is nothing to delete. This tests handling minimal input sizes and prevents indexing errors.
  2. All characters identical with varying costs: "aaaa" with cost [1,100,1,1]. The algorithm must correctly identify the maximum cost in the group (100) and delete the rest, yielding a total cost of 3.
  3. Alternating characters: "ababab" with varying costs [1,2,3,4,5,6]. Each character is a consecutive group of size 1. The algorithm ensures that for each unique character sequence, it keeps the highest cost in that sequence (in this case, all single characters), and total deletion costs are calculated correctly. This tests the correctness when no consecutive duplicates exist.

This implementation handles all edge cases by processing groups of consecutive characters, using the maximum deletion cost strategy, and iterating in a single pass with constant extra space.