LeetCode 3134 - Find the Median of the Uniqueness Array

The problem defines a special array called the uniqueness array. For every possible subarray of nums, we compute how many distinct values appear inside that subarray. We then collect all of those distinct counts into a single array and sort it in non decreasing order.

LeetCode Problem 3134

Difficulty: šŸ”“ Hard
Topics: Array, Hash Table, Binary Search, Sliding Window

Solution

Problem Understanding

The problem defines a special array called the uniqueness array. For every possible subarray of nums, we compute how many distinct values appear inside that subarray. We then collect all of those distinct counts into a single array and sort it in non decreasing order. The task is to return the median value from this sorted array.

Suppose nums = [1,2,3]. The subarrays are:

  • [1] → 1 distinct value
  • [2] → 1 distinct value
  • [3] → 1 distinct value
  • [1,2] → 2 distinct values
  • [2,3] → 2 distinct values
  • [1,2,3] → 3 distinct values

So the uniqueness array becomes [1,1,1,2,2,3]. Since there are 6 elements, the problem asks for the smaller middle element. The two middle elements are at indices 2 and 3, which are 1 and 2, so the answer is 1.

The input size is very important here. The array length can be as large as 10^5. The number of subarrays of an array of length n is:

$\frac{n(n+1)}{2}$

For n = 10^5, this is roughly 5 * 10^9 subarrays. This immediately tells us that explicitly generating all subarrays or all distinct counts is impossible.

The problem therefore requires a more mathematical and indirect approach. Instead of constructing the uniqueness array, we need to determine its median using counting techniques.

Several edge cases are important:

  • Arrays where all elements are identical. Every subarray has exactly one distinct element.
  • Arrays where all elements are unique. Distinct counts grow with subarray length.
  • Very large arrays, where even quadratic algorithms are too slow.
  • Even sized uniqueness arrays, where the smaller middle value must be chosen.

The constraints strongly suggest an algorithm around O(n log n).

Approaches

Brute Force Approach

The most direct solution is to enumerate every subarray and compute the number of distinct elements in it.

For each starting index i, we expand the ending index j and maintain a hash set of seen values. Each subarray contributes one distinct count to the uniqueness array. After generating all counts, we sort them and return the median.

This approach is correct because it exactly follows the problem definition.

However, the number of subarrays is:

$\frac{n(n+1)}{2}$

That is already too large for n = 10^5. Additionally, sorting billions of values is impossible within time limits.

Optimal Approach

The key observation is that we do not actually need the entire uniqueness array. We only need its median.

This transforms the problem into a selection problem. Instead of generating all distinct counts, we binary search on the answer.

Suppose we ask:

How many subarrays have at most k distinct elements?

If we can compute this efficiently, then we can determine whether the median is less than or equal to k.

This works because the uniqueness array is sorted conceptually. If at least half of the subarrays have distinct count <= k, then the median must also be <= k.

The remaining challenge becomes efficiently counting subarrays with at most k distinct elements. This is a classic sliding window problem.

Using a two pointer sliding window and a frequency map:

  • Expand the right pointer.
  • Shrink the left pointer whenever the number of distinct elements exceeds k.
  • For every right endpoint, count how many valid subarrays end there.

This gives an O(n) counting procedure.

We then binary search over possible distinct counts from 1 to n.

Approach Time Complexity Space Complexity Notes
Brute Force O(n² log n) or worse O(n²) Explicitly generates all distinct counts
Optimal O(n log n) O(n) Binary search plus sliding window counting

Algorithm Walkthrough

  1. Compute the total number of subarrays.

The number of subarrays in an array of length n is:

$\frac{n(n+1)}{2}$

Let this value be total. 2. Determine the median position.

Since the problem asks for the smaller middle value when there are two medians, we use:

target = (total + 1) // 2

This gives the 1 indexed position of the median in the sorted uniqueness array. 3. Binary search on the answer.

The minimum possible distinct count is 1.

The maximum possible distinct count is the number of unique values in the entire array, at most n.

We binary search for the smallest k such that:

count_at_most(k) >= target
  1. Implement count_at_most(k) using sliding window.

We maintain:

  • left pointer
  • right pointer
  • frequency hash map
  • current number of distinct elements
  1. Expand the window.

For each right index:

  • Add nums[right] to the frequency map.
  • If this value appears for the first time, increase distinct count.
  1. Shrink the window if needed.

While the number of distinct elements exceeds k:

  • Remove nums[left]
  • Decrease its frequency
  • If frequency becomes zero, decrease distinct count
  • Move left forward
  1. Count valid subarrays.

Once the window satisfies the condition, every subarray ending at right and starting from any index in [left, right] is valid.

The number of such subarrays is:

right - left + 1

Add this to the running total. 8. Use binary search to find the smallest valid k.

If count_at_most(mid) is at least the median position, move left in binary search.

Otherwise move right. 9. Return the final binary search result.

Why it works

The sliding window correctly counts all subarrays with at most k distinct elements because the window always maintains the maximal valid range ending at each right index.

Binary search works because the predicate is monotonic:

  • If at least target subarrays have distinct count <= k,
  • then the same is true for every larger value.

Therefore, the smallest valid k is exactly the median of the uniqueness array.

Python Solution

from collections import defaultdict
from typing import List

class Solution:
    def medianOfUniquenessArray(self, nums: List[int]) -> int:
        n = len(nums)
        total_subarrays = n * (n + 1) // 2
        target = (total_subarrays + 1) // 2

        def count_at_most(k: int) -> int:
            freq = defaultdict(int)
            left = 0
            distinct = 0
            total = 0

            for right in range(n):
                value = nums[right]

                if freq[value] == 0:
                    distinct += 1

                freq[value] += 1

                while distinct > k:
                    left_value = nums[left]
                    freq[left_value] -= 1

                    if freq[left_value] == 0:
                        distinct -= 1

                    left += 1

                total += right - left + 1

            return total

        left = 1
        right = len(set(nums))
        answer = right

        while left <= right:
            mid = (left + right) // 2

            if count_at_most(mid) >= target:
                answer = mid
                right = mid - 1
            else:
                left = mid + 1

        return answer

The implementation starts by computing the total number of subarrays and the target median position.

The helper function count_at_most(k) is the core of the algorithm. It uses a sliding window to count how many subarrays contain at most k distinct values.

The freq hash map stores how many times each value appears inside the current window. The variable distinct tracks the number of unique values currently inside the window.

As the right pointer expands, the window may become invalid by containing too many distinct values. The inner while loop shrinks the window from the left until the condition is restored.

For every valid window ending at right, all starting positions between left and right produce valid subarrays. This contributes right - left + 1 new subarrays.

The binary search repeatedly checks whether enough subarrays satisfy the distinct limit. The first valid k is the median.

Go Solution

func medianOfUniquenessArray(nums []int) int {
	n := len(nums)

	totalSubarrays := n * (n + 1) / 2
	target := (totalSubarrays + 1) / 2

	countAtMost := func(k int) int {
		freq := make(map[int]int)

		left := 0
		distinct := 0
		total := 0

		for right := 0; right < n; right++ {
			value := nums[right]

			if freq[value] == 0 {
				distinct++
			}

			freq[value]++

			for distinct > k {
				leftValue := nums[left]

				freq[leftValue]--

				if freq[leftValue] == 0 {
					distinct--
				}

				left++
			}

			total += right - left + 1
		}

		return total
	}

	uniqueValues := make(map[int]bool)

	for _, value := range nums {
		uniqueValues[value] = true
	}

	left := 1
	right := len(uniqueValues)
	answer := right

	for left <= right {
		mid := (left + right) / 2

		if countAtMost(mid) >= target {
			answer = mid
			right = mid - 1
		} else {
			left = mid + 1
		}
	}

	return answer
}

The Go implementation follows the same logic as the Python version.

Go does not provide a built in defaultdict, so the frequency map uses normal map[int]int behavior where missing keys default to zero.

The closure countAtMost captures nums and n from the outer scope.

Integer overflow is not a concern here because the maximum number of subarrays is approximately 5 * 10^9, which fits safely inside Go's 64 bit integer representation on LeetCode environments. However, if strict portability were required, int64 could also be used.

Worked Examples

Example 1

Input:

nums = [1, 2, 3]

Total subarrays:

3 * 4 // 2 = 6

Median position:

(6 + 1) // 2 = 3

We binary search for the smallest k where at least 3 subarrays have at most k distinct values.

Checking k = 1

right Window Distinct Valid Subarrays Added Total
0 [1] 1 1 1
1 [2] 1 1 2
2 [3] 1 1 3

We get 3 valid subarrays.

Since 3 >= target, answer may be 1.

Final answer:

1

Example 2

Input:

nums = [3, 4, 3, 4, 5]

Total subarrays:

5 * 6 // 2 = 15

Median position:

(15 + 1) // 2 = 8

Checking k = 2

right Window Distinct Added Total
0 [3] 1 1 1
1 [3,4] 2 2 3
2 [3,4,3] 2 3 6
3 [3,4,3,4] 2 4 10
4 [4,5] 2 2 12

We get 12 valid subarrays.

Since 12 >= 8, the median is at most 2.

Checking k = 1 would produce only 5 subarrays, which is insufficient.

Therefore answer is:

2

Example 3

Input:

nums = [4, 3, 5, 4]

Total subarrays:

4 * 5 // 2 = 10

Median position:

(10 + 1) // 2 = 5

Checking k = 2

right Window Distinct Added Total
0 [4] 1 1 1
1 [4,3] 2 2 3
2 [3,5] 2 2 5
3 [5,4] 2 2 7

At least 5 subarrays satisfy the condition.

Therefore the answer is:

2

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Binary search over distinct counts, each check is O(n)
Space O(n) Frequency map may store up to n distinct values

The sliding window itself is linear because each pointer moves at most n times. The binary search runs over the range of possible distinct counts, which is at most n. Therefore the total complexity becomes O(n log n).

The auxiliary space comes from the frequency hash map used inside the sliding window.

Test Cases

sol = Solution()

assert sol.medianOfUniquenessArray([1, 2, 3]) == 1
# Basic increasing array

assert sol.medianOfUniquenessArray([3, 4, 3, 4, 5]) == 2
# Repeated pattern example

assert sol.medianOfUniquenessArray([4, 3, 5, 4]) == 2
# Mixed duplicates and unique values

assert sol.medianOfUniquenessArray([1]) == 1
# Single element array

assert sol.medianOfUniquenessArray([7, 7, 7, 7]) == 1
# All elements identical

assert sol.medianOfUniquenessArray([1, 2, 3, 4, 5]) == 2
# All unique elements

assert sol.medianOfUniquenessArray([1, 1, 2, 2]) == 1
# Many low distinct subarrays

assert sol.medianOfUniquenessArray([1, 2, 1, 3, 2]) == 2
# Mixed repeating structure

assert sol.medianOfUniquenessArray([5, 4, 3, 2, 1]) == 2
# Strictly decreasing unique values

assert sol.medianOfUniquenessArray([1, 2, 1, 2, 1, 2]) == 2
# Alternating duplicates
Test Why
[1,2,3] Small increasing example
[3,4,3,4,5] Validates repeated values handling
[4,3,5,4] Mixed structure validation
[1] Minimum input size
[7,7,7,7] All subarrays have one distinct value
[1,2,3,4,5] Maximum diversity behavior
[1,1,2,2] Heavy duplicate concentration
[1,2,1,3,2] General mixed case
[5,4,3,2,1] Unique decreasing array
[1,2,1,2,1,2] Alternating pattern stress test

Edge Cases

All Elements Identical

Consider:

nums = [5, 5, 5, 5]

Every subarray contains exactly one distinct element. A buggy implementation might overcount distinct values if frequencies are not updated correctly when shrinking the window.

The sliding window handles this naturally because the frequency map always contains only one active key. The binary search immediately converges to 1.

All Elements Unique

Consider:

nums = [1, 2, 3, 4, 5]

Distinct counts increase with subarray length. This case stresses whether the algorithm correctly counts many different distinct values.

The implementation correctly adjusts the window because every new element increases the distinct count, forcing the left pointer to move whenever the limit is exceeded.

Even Number of Total Subarrays

For arrays where the uniqueness array length is even, the problem asks for the smaller median.

For example:

nums = [1, 2, 3]

The uniqueness array is:

[1,1,1,2,2,3]

The two middle values are 1 and 2, and the answer must be 1.

The implementation handles this correctly by using:

target = (total + 1) // 2

This selects the lower median position automatically.