LeetCode 3134 - Find the Median of the Uniqueness Array
The problem defines a special array called the uniqueness array. For every possible subarray of nums, we compute how many distinct values appear inside that subarray. We then collect all of those distinct counts into a single array and sort it in non decreasing order.
Difficulty: š“ Hard
Topics: Array, Hash Table, Binary Search, Sliding Window
Solution
Problem Understanding
The problem defines a special array called the uniqueness array. For every possible subarray of nums, we compute how many distinct values appear inside that subarray. We then collect all of those distinct counts into a single array and sort it in non decreasing order. The task is to return the median value from this sorted array.
Suppose nums = [1,2,3]. The subarrays are:
[1]ā 1 distinct value[2]ā 1 distinct value[3]ā 1 distinct value[1,2]ā 2 distinct values[2,3]ā 2 distinct values[1,2,3]ā 3 distinct values
So the uniqueness array becomes [1,1,1,2,2,3]. Since there are 6 elements, the problem asks for the smaller middle element. The two middle elements are at indices 2 and 3, which are 1 and 2, so the answer is 1.
The input size is very important here. The array length can be as large as 10^5. The number of subarrays of an array of length n is:
$\frac{n(n+1)}{2}$
For n = 10^5, this is roughly 5 * 10^9 subarrays. This immediately tells us that explicitly generating all subarrays or all distinct counts is impossible.
The problem therefore requires a more mathematical and indirect approach. Instead of constructing the uniqueness array, we need to determine its median using counting techniques.
Several edge cases are important:
- Arrays where all elements are identical. Every subarray has exactly one distinct element.
- Arrays where all elements are unique. Distinct counts grow with subarray length.
- Very large arrays, where even quadratic algorithms are too slow.
- Even sized uniqueness arrays, where the smaller middle value must be chosen.
The constraints strongly suggest an algorithm around O(n log n).
Approaches
Brute Force Approach
The most direct solution is to enumerate every subarray and compute the number of distinct elements in it.
For each starting index i, we expand the ending index j and maintain a hash set of seen values. Each subarray contributes one distinct count to the uniqueness array. After generating all counts, we sort them and return the median.
This approach is correct because it exactly follows the problem definition.
However, the number of subarrays is:
$\frac{n(n+1)}{2}$
That is already too large for n = 10^5. Additionally, sorting billions of values is impossible within time limits.
Optimal Approach
The key observation is that we do not actually need the entire uniqueness array. We only need its median.
This transforms the problem into a selection problem. Instead of generating all distinct counts, we binary search on the answer.
Suppose we ask:
How many subarrays have at most
kdistinct elements?
If we can compute this efficiently, then we can determine whether the median is less than or equal to k.
This works because the uniqueness array is sorted conceptually. If at least half of the subarrays have distinct count <= k, then the median must also be <= k.
The remaining challenge becomes efficiently counting subarrays with at most k distinct elements. This is a classic sliding window problem.
Using a two pointer sliding window and a frequency map:
- Expand the right pointer.
- Shrink the left pointer whenever the number of distinct elements exceeds
k. - For every right endpoint, count how many valid subarrays end there.
This gives an O(n) counting procedure.
We then binary search over possible distinct counts from 1 to n.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n² log n) or worse | O(n²) | Explicitly generates all distinct counts |
| Optimal | O(n log n) | O(n) | Binary search plus sliding window counting |
Algorithm Walkthrough
- Compute the total number of subarrays.
The number of subarrays in an array of length n is:
$\frac{n(n+1)}{2}$
Let this value be total.
2. Determine the median position.
Since the problem asks for the smaller middle value when there are two medians, we use:
target = (total + 1) // 2
This gives the 1 indexed position of the median in the sorted uniqueness array. 3. Binary search on the answer.
The minimum possible distinct count is 1.
The maximum possible distinct count is the number of unique values in the entire array, at most n.
We binary search for the smallest k such that:
count_at_most(k) >= target
- Implement
count_at_most(k)using sliding window.
We maintain:
leftpointerrightpointer- frequency hash map
- current number of distinct elements
- Expand the window.
For each right index:
- Add
nums[right]to the frequency map. - If this value appears for the first time, increase distinct count.
- Shrink the window if needed.
While the number of distinct elements exceeds k:
- Remove
nums[left] - Decrease its frequency
- If frequency becomes zero, decrease distinct count
- Move
leftforward
- Count valid subarrays.
Once the window satisfies the condition, every subarray ending at right and starting from any index in [left, right] is valid.
The number of such subarrays is:
right - left + 1
Add this to the running total.
8. Use binary search to find the smallest valid k.
If count_at_most(mid) is at least the median position, move left in binary search.
Otherwise move right. 9. Return the final binary search result.
Why it works
The sliding window correctly counts all subarrays with at most k distinct elements because the window always maintains the maximal valid range ending at each right index.
Binary search works because the predicate is monotonic:
- If at least
targetsubarrays have distinct count<= k, - then the same is true for every larger value.
Therefore, the smallest valid k is exactly the median of the uniqueness array.
Python Solution
from collections import defaultdict
from typing import List
class Solution:
def medianOfUniquenessArray(self, nums: List[int]) -> int:
n = len(nums)
total_subarrays = n * (n + 1) // 2
target = (total_subarrays + 1) // 2
def count_at_most(k: int) -> int:
freq = defaultdict(int)
left = 0
distinct = 0
total = 0
for right in range(n):
value = nums[right]
if freq[value] == 0:
distinct += 1
freq[value] += 1
while distinct > k:
left_value = nums[left]
freq[left_value] -= 1
if freq[left_value] == 0:
distinct -= 1
left += 1
total += right - left + 1
return total
left = 1
right = len(set(nums))
answer = right
while left <= right:
mid = (left + right) // 2
if count_at_most(mid) >= target:
answer = mid
right = mid - 1
else:
left = mid + 1
return answer
The implementation starts by computing the total number of subarrays and the target median position.
The helper function count_at_most(k) is the core of the algorithm. It uses a sliding window to count how many subarrays contain at most k distinct values.
The freq hash map stores how many times each value appears inside the current window. The variable distinct tracks the number of unique values currently inside the window.
As the right pointer expands, the window may become invalid by containing too many distinct values. The inner while loop shrinks the window from the left until the condition is restored.
For every valid window ending at right, all starting positions between left and right produce valid subarrays. This contributes right - left + 1 new subarrays.
The binary search repeatedly checks whether enough subarrays satisfy the distinct limit. The first valid k is the median.
Go Solution
func medianOfUniquenessArray(nums []int) int {
n := len(nums)
totalSubarrays := n * (n + 1) / 2
target := (totalSubarrays + 1) / 2
countAtMost := func(k int) int {
freq := make(map[int]int)
left := 0
distinct := 0
total := 0
for right := 0; right < n; right++ {
value := nums[right]
if freq[value] == 0 {
distinct++
}
freq[value]++
for distinct > k {
leftValue := nums[left]
freq[leftValue]--
if freq[leftValue] == 0 {
distinct--
}
left++
}
total += right - left + 1
}
return total
}
uniqueValues := make(map[int]bool)
for _, value := range nums {
uniqueValues[value] = true
}
left := 1
right := len(uniqueValues)
answer := right
for left <= right {
mid := (left + right) / 2
if countAtMost(mid) >= target {
answer = mid
right = mid - 1
} else {
left = mid + 1
}
}
return answer
}
The Go implementation follows the same logic as the Python version.
Go does not provide a built in defaultdict, so the frequency map uses normal map[int]int behavior where missing keys default to zero.
The closure countAtMost captures nums and n from the outer scope.
Integer overflow is not a concern here because the maximum number of subarrays is approximately 5 * 10^9, which fits safely inside Go's 64 bit integer representation on LeetCode environments. However, if strict portability were required, int64 could also be used.
Worked Examples
Example 1
Input:
nums = [1, 2, 3]
Total subarrays:
3 * 4 // 2 = 6
Median position:
(6 + 1) // 2 = 3
We binary search for the smallest k where at least 3 subarrays have at most k distinct values.
Checking k = 1
| right | Window | Distinct | Valid Subarrays Added | Total |
|---|---|---|---|---|
| 0 | [1] | 1 | 1 | 1 |
| 1 | [2] | 1 | 1 | 2 |
| 2 | [3] | 1 | 1 | 3 |
We get 3 valid subarrays.
Since 3 >= target, answer may be 1.
Final answer:
1
Example 2
Input:
nums = [3, 4, 3, 4, 5]
Total subarrays:
5 * 6 // 2 = 15
Median position:
(15 + 1) // 2 = 8
Checking k = 2
| right | Window | Distinct | Added | Total |
|---|---|---|---|---|
| 0 | [3] | 1 | 1 | 1 |
| 1 | [3,4] | 2 | 2 | 3 |
| 2 | [3,4,3] | 2 | 3 | 6 |
| 3 | [3,4,3,4] | 2 | 4 | 10 |
| 4 | [4,5] | 2 | 2 | 12 |
We get 12 valid subarrays.
Since 12 >= 8, the median is at most 2.
Checking k = 1 would produce only 5 subarrays, which is insufficient.
Therefore answer is:
2
Example 3
Input:
nums = [4, 3, 5, 4]
Total subarrays:
4 * 5 // 2 = 10
Median position:
(10 + 1) // 2 = 5
Checking k = 2
| right | Window | Distinct | Added | Total |
|---|---|---|---|---|
| 0 | [4] | 1 | 1 | 1 |
| 1 | [4,3] | 2 | 2 | 3 |
| 2 | [3,5] | 2 | 2 | 5 |
| 3 | [5,4] | 2 | 2 | 7 |
At least 5 subarrays satisfy the condition.
Therefore the answer is:
2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Binary search over distinct counts, each check is O(n) |
| Space | O(n) | Frequency map may store up to n distinct values |
The sliding window itself is linear because each pointer moves at most n times. The binary search runs over the range of possible distinct counts, which is at most n. Therefore the total complexity becomes O(n log n).
The auxiliary space comes from the frequency hash map used inside the sliding window.
Test Cases
sol = Solution()
assert sol.medianOfUniquenessArray([1, 2, 3]) == 1
# Basic increasing array
assert sol.medianOfUniquenessArray([3, 4, 3, 4, 5]) == 2
# Repeated pattern example
assert sol.medianOfUniquenessArray([4, 3, 5, 4]) == 2
# Mixed duplicates and unique values
assert sol.medianOfUniquenessArray([1]) == 1
# Single element array
assert sol.medianOfUniquenessArray([7, 7, 7, 7]) == 1
# All elements identical
assert sol.medianOfUniquenessArray([1, 2, 3, 4, 5]) == 2
# All unique elements
assert sol.medianOfUniquenessArray([1, 1, 2, 2]) == 1
# Many low distinct subarrays
assert sol.medianOfUniquenessArray([1, 2, 1, 3, 2]) == 2
# Mixed repeating structure
assert sol.medianOfUniquenessArray([5, 4, 3, 2, 1]) == 2
# Strictly decreasing unique values
assert sol.medianOfUniquenessArray([1, 2, 1, 2, 1, 2]) == 2
# Alternating duplicates
| Test | Why |
|---|---|
[1,2,3] |
Small increasing example |
[3,4,3,4,5] |
Validates repeated values handling |
[4,3,5,4] |
Mixed structure validation |
[1] |
Minimum input size |
[7,7,7,7] |
All subarrays have one distinct value |
[1,2,3,4,5] |
Maximum diversity behavior |
[1,1,2,2] |
Heavy duplicate concentration |
[1,2,1,3,2] |
General mixed case |
[5,4,3,2,1] |
Unique decreasing array |
[1,2,1,2,1,2] |
Alternating pattern stress test |
Edge Cases
All Elements Identical
Consider:
nums = [5, 5, 5, 5]
Every subarray contains exactly one distinct element. A buggy implementation might overcount distinct values if frequencies are not updated correctly when shrinking the window.
The sliding window handles this naturally because the frequency map always contains only one active key. The binary search immediately converges to 1.
All Elements Unique
Consider:
nums = [1, 2, 3, 4, 5]
Distinct counts increase with subarray length. This case stresses whether the algorithm correctly counts many different distinct values.
The implementation correctly adjusts the window because every new element increases the distinct count, forcing the left pointer to move whenever the limit is exceeded.
Even Number of Total Subarrays
For arrays where the uniqueness array length is even, the problem asks for the smaller median.
For example:
nums = [1, 2, 3]
The uniqueness array is:
[1,1,1,2,2,3]
The two middle values are 1 and 2, and the answer must be 1.
The implementation handles this correctly by using:
target = (total + 1) // 2
This selects the lower median position automatically.