LeetCode 3002 - Maximum Size of a Set After Removals

Difficulty: 🟡 Medium
Topics: Array, Hash Table, Greedy

Solution

Problem Understanding

The problem is asking us to maximize the number of unique elements we can have in a set after removing exactly half of the elements from two arrays nums1 and nums2. Each array is of even length n, and after removing n/2 elements from each, we insert all remaining elements into a set. Since sets automatically remove duplicates, the challenge is to remove elements in a way that preserves the largest number of distinct values.

The input arrays nums1 and nums2 can have repeated numbers, and the values can be very large, up to $10^9$. The size of each array n can be up to $2 \cdot 10^4$, which rules out approaches that require enumerating all possible removal combinations.

Important edge cases include arrays where all elements are identical, arrays where all elements are distinct, and arrays where some numbers appear in both arrays. The constraints guarantee that n is even, so removing n/2 elements is always possible.

Approaches

Brute Force

A brute-force approach would attempt to consider all subsets of n/2 elements to remove from each array and compute the size of the resulting set for every combination. This approach is correct but infeasible because the number of combinations is combinatorial: $\binom{n}{n/2} \times \binom{n}{n/2}$, which grows exponentially. Even for moderate n, this is far too slow.

Optimal Approach

The key insight is that we only care about the number of unique elements remaining. We can use frequency counting with hash maps to determine how many elements we can retain from each array. Let unique1 and unique2 be the number of distinct elements in nums1 and nums2. After removing n/2 elements, the maximum number of distinct elements we can retain from nums1 is min(unique1, n/2), and similarly for nums2. The optimal size of the set is then bounded by the sum of these values, capped at the total number of distinct elements across both arrays.

This leads to a simple formula:

$$\text{max set size} = \min(\text{total unique elements}, n/2 + n/2)$$

where total unique elements is the count of distinct numbers in nums1 ∪ nums2. This works because removing elements should target duplicates rather than unique numbers when possible.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(2^n)	O(n)	Enumerates all removal combinations; impractical
Optimal	O(n)	O(n)	Counts frequencies and computes maximum distinct elements efficiently

Algorithm Walkthrough

Compute the number of unique elements in nums1 and nums2 using hash sets.
Determine the total number of distinct elements across both arrays combined.
Each array will have n/2 elements removed, leaving n/2 elements in each.
The maximum number of unique elements we can take from nums1 is min(unique1, n/2) because we cannot retain more than n/2 elements after removals.
Similarly, compute the maximum from nums2 as min(unique2, n/2).
Add these two maxima together. If the sum exceeds the total number of distinct elements, cap it at total unique elements.
Return the resulting number as the maximum size of the set after removals.

Why it works:

The algorithm works because it preserves the largest number of unique elements possible from each array while respecting the removal constraints. By counting the actual distinct elements and the number of elements we can keep, we ensure no overcounting, and the cap with the total union ensures correctness in cases where overlaps exist.

Python Solution

from typing import List

class Solution:
    def maximumSetSize(self, nums1: List[int], nums2: List[int]) -> int:
        n = len(nums1)
        unique1 = len(set(nums1))
        unique2 = len(set(nums2))
        total_unique = len(set(nums1) | set(nums2))
        
        max_from_nums1 = min(unique1, n // 2)
        max_from_nums2 = min(unique2, n // 2)
        
        return min(max_from_nums1 + max_from_nums2, total_unique)

Explanation:

We first compute the number of unique elements in each array and the total unique elements across both arrays. Then, we determine the maximum number of elements that can remain after removing n/2 elements from each array. Finally, we return the smaller value between the sum of these maxima and the total distinct elements, ensuring no duplicates are overcounted.

Go Solution

func maximumSetSize(nums1 []int, nums2 []int) int {
    n := len(nums1)
    
    unique1Map := make(map[int]struct{})
    unique2Map := make(map[int]struct{})
    unionMap := make(map[int]struct{})
    
    for _, v := range nums1 {
        unique1Map[v] = struct{}{}
        unionMap[v] = struct{}{}
    }
    for _, v := range nums2 {
        unique2Map[v] = struct{}{}
        unionMap[v] = struct{}{}
    }
    
    maxFromNums1 := min(len(unique1Map), n/2)
    maxFromNums2 := min(len(unique2Map), n/2)
    
    return min(maxFromNums1 + maxFromNums2, len(unionMap))
}

func min(a, b int) int {
    if a < b {
        return a
    }
    return b
}

Go-specific notes:

We use map[int]struct{} for sets, which avoids storing unnecessary values. The union and individual counts are calculated using maps. The min function ensures we do not exceed allowed limits.

Worked Examples

Example 1:

nums1 = [1,2,1,2], nums2 = [1,1,1,1], n = 4

Unique in nums1: {1,2} → 2
Unique in nums2: {1} → 1
Total unique: {1,2} → 2
Max from nums1: min(2,2) = 2
Max from nums2: min(1,2) = 1
Max set size = min(2+1,2) = 2

Example 2:

nums1 = [1,2,3,4,5,6], nums2 = [2,3,2,3,2,3], n = 6

Unique1 = 6, Unique2 = 2, Total unique = 6
Max from nums1 = min(6,3) = 3
Max from nums2 = min(2,3) = 2
Max set size = min(3+2,6) = 5

Example 3:

nums1 = [1,1,2,2,3,3], nums2 = [4,4,5,5,6,6], n = 6

Unique1 = 3, Unique2 = 3, Total unique = 6
Max from nums1 = 3, Max from nums2 = 3
Max set size = min(3+3,6) = 6

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each element is processed once for counting and set union
Space	O(n)	We store unique elements from both arrays in hash sets

The algorithm is efficient because it only counts elements without enumerating combinations, and the space used scales linearly with the number of distinct elements.

Test Cases

# Provided examples
assert Solution().maximumSetSize([1,2,1,2], [1,1,1,1]) == 2  # Example 1
assert Solution().maximumSetSize([1,2,3,4,5,6], [2,3,2,3,2,3]) == 5  # Example 2
assert Solution().maximumSetSize([1,1,2,2,3,3], [4,4,5,5,6,6]) == 6  # Example 3

# Edge cases
assert Solution().maximumSetSize([1,1,1,1], [1,1,1,1]) == 1  # all elements same
assert Solution().maximumSetSize([1,2,3,4], [5,6,7,8]) == 4  # no overlap
assert Solution().maximumSetSize([1,2,2,3], [3,3,4,4]) == 4  # partial overlap

# Minimal input
assert Solution().maximumSetSize([1,1], [2,2]) == 2

# Maximal input stress test
assert Solution().maximumSetSize(list(range(1, 10001)) * 2, list(range(5000, 15000))) == 15000  # large arrays

Test	Why
`[1,2