LeetCode 1338 - Reduce Array Size to The Half
Here is a complete, detailed technical solution guide for LeetCode 1338, following your requested formatting and structu
Difficulty: 🟡 Medium
Topics: Array, Hash Table, Greedy, Sorting, Heap (Priority Queue)
Solution
Here is a complete, detailed technical solution guide for LeetCode 1338, following your requested formatting and structure.
Problem Understanding
The problem asks us to reduce the size of an integer array arr by removing all occurrences of a selected set of integers. Specifically, we want to find the minimum size of such a set so that removing all numbers in the set reduces the original array by at least half.
For example, if the array length is 10, we need to remove at least 5 elements by choosing some integers to eliminate. The integers we choose can have multiple occurrences in the array, so removing a number with many occurrences is more "efficient" in reducing the array size.
The input array arr has the following properties: it has at least 2 elements, its length is even, and its values range from 1 to 105. Since the maximum array size is 10^5, any solution with worse than O(n log n) or O(n) time complexity will likely be too slow.
Important edge cases include arrays where all elements are the same, arrays with each element unique, and arrays where multiple numbers have the same frequency.
The output is a single integer representing the minimum number of distinct integers needed to remove at least half of the array.
Approaches
Brute Force Approach
A brute-force method would try every possible subset of integers in the array, remove all their occurrences, and check if the array size is reduced by at least half. Then, among all subsets that achieve this, we select the one with the smallest size.
While this approach is correct, it is computationally infeasible because the number of subsets grows exponentially with the number of distinct elements. For an array with n distinct integers, this is O(2^n), which is far too large for n up to 10^5.
Optimal Approach
The key insight is to focus on frequencies of each integer. Removing integers that appear most frequently reduces the array faster. This transforms the problem into:
- Count the occurrences of each integer.
- Sort these counts in descending order.
- Pick the most frequent integers until the sum of removed elements is at least half of the array size.
This greedy approach works because choosing numbers with smaller frequencies first cannot result in fewer numbers needed to reach half the array size, making it provably optimal.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^n) | O(n) | Check all subsets of distinct integers |
| Optimal | O(n log n) | O(n) | Count frequencies, sort them, pick largest until half is removed |
Algorithm Walkthrough
- Count Frequencies: Create a hash map where the keys are the integers from the array and the values are their frequencies. This allows quick access to how many times each integer occurs.
- Sort Frequencies: Extract the frequency counts and sort them in descending order. This ensures we can remove the largest groups first, minimizing the number of integers we need to pick.
- Greedy Removal: Initialize a variable
removedto track how many elements we have removed andcountto track how many integers we have selected. Iterate through the sorted frequency counts, adding each frequency toremovedand incrementingcount. Stop onceremovedreaches at least half the array size. - Return Result: The
countat this point is the minimum number of integers required.
Why it works: The algorithm works because removing numbers in descending order of frequency guarantees the fewest integers are needed to reach half the array size. No other combination of integers can remove the same number of elements with fewer selections.
Python Solution
from typing import List
from collections import Counter
class Solution:
def minSetSize(self, arr: List[int]) -> int:
n = len(arr)
half_size = n // 2
freq = Counter(arr)
# Sort frequencies in descending order
freq_counts = sorted(freq.values(), reverse=True)
removed = 0
count = 0
for f in freq_counts:
removed += f
count += 1
if removed >= half_size:
return count
return count
Explanation: We use Python's Counter to efficiently count occurrences. Sorting the frequencies in descending order allows the greedy selection of the most frequent elements first. The loop accumulates removed elements until reaching half the array size, and we return the number of distinct integers selected.
Go Solution
import (
"sort"
)
func minSetSize(arr []int) int {
freq := make(map[int]int)
for _, num := range arr {
freq[num]++
}
freqCounts := make([]int, 0, len(freq))
for _, count := range freq {
freqCounts = append(freqCounts, count)
}
sort.Sort(sort.Reverse(sort.IntSlice(freqCounts)))
removed := 0
count := 0
halfSize := len(arr) / 2
for _, f := range freqCounts {
removed += f
count++
if removed >= halfSize {
return count
}
}
return count
}
Explanation: Go lacks a built-in counter, so we use a map to store frequencies. Sorting requires converting to a slice and using sort.Reverse. Otherwise, the logic mirrors the Python solution.
Worked Examples
Example 1: [3,3,3,3,5,5,5,2,2,7]
| Step | Action | Removed | Count |
|---|---|---|---|
| 1 | Count frequencies | {3:4, 5:3, 2:2, 7:1} | - |
| 2 | Sort frequencies | [4,3,2,1] | - |
| 3 | Remove 3s | removed=4 | count=1 |
| 4 | Remove 5s | removed=7 >=5 | count=2 |
Output: 2
Example 2: [7,7,7,7,7,7]
| Step | Action | Removed | Count |
|---|---|---|---|
| 1 | Count frequencies | {7:6} | - |
| 2 | Sort frequencies | [6] | - |
| 3 | Remove 7s | removed=6 >=3 | count=1 |
Output: 1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Counting frequencies is O(n), sorting frequencies is O(k log k) where k ≤ n, overall O(n log n) |
| Space | O(n) | Hash map to store counts, plus frequency array |
The algorithm is efficient because it only sorts frequencies (number of distinct integers), which is always ≤ n, and performs a single linear pass to accumulate removed elements.
Test Cases
# Provided examples
assert Solution().minSetSize([3,3,3,3,5,5,5,2,2,7]) == 2 # Multiple sets possible
assert Solution().minSetSize([7,7,7,7,7,7]) == 1 # All identical
# Edge cases
assert Solution().minSetSize([1,2]) == 1 # Minimum array, half=1
assert Solution().minSetSize([1,1,2,2,3,3,4,4]) == 2 # Multiple same frequencies
assert Solution().minSetSize([i for i in range(1, 100001)] * 2) == 50000 # Large input, all elements duplicated
# Arrays with unique frequencies
assert Solution().minSetSize([1,1,1,2,2,3]) == 1 # Removing 1 reduces half
| Test | Why |
|---|---|
[3,3,3,3,5,5,5,2,2,7] |
Tests typical case with multiple frequencies |
[7,7,7,7,7,7] |
All elements identical |
[1,2] |
Minimum size array |
[1,1,2,2,3,3,4,4] |
Multiple numbers with same frequency |
| Large duplicated input | Performance test for upper bound |
Unique frequencies [1,1,1,2,2,3] |
Verifies greedy removes largest first |
Edge Cases
- All identical elements: The array can be reduced by selecting just that one number. The implementation handles this naturally because
Counterreturns one frequency equal to the array length. - All unique elements: Each element appears once. In this case, half the array must be removed by selecting half of the elements. Sorting frequencies descending still works because all frequencies are 1.
- Multiple numbers with the same frequency: Ties in frequency do not affect correctness because any selection of integers with highest frequencies works. The algorithm consistently chooses the first elements in descending frequency order, which satisfies the minimum set requirement.
- Very large arrays: Arrays near the maximum size (10^5) are handled efficiently because we avoid exponential subset enumeration and rely on counting and sorting.
- Smallest even arrays (length=2): Edge case to ensure the code returns 1 if only one element needs to be removed.
This solution guide provides a thorough