LeetCode 3636 - Threshold Majority Queries

The problem asks us to process multiple queries on a given array nums where each query specifies a subarray defined by [li, ri] and a threshold. For each query, we need to find an element in the subarray that appears at least threshold times.

LeetCode Problem 3636

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Binary Search, Divide and Conquer, Counting, Prefix Sum

Solution

Problem Understanding

The problem asks us to process multiple queries on a given array nums where each query specifies a subarray defined by [li, ri] and a threshold. For each query, we need to find an element in the subarray that appears at least threshold times. If multiple elements meet this criterion, we select the one with the highest frequency, and if there is a tie in frequency, we choose the smallest value. If no element meets the threshold, we return -1 for that query.

The input array can have up to 10,000 elements, and there can be up to 50,000 queries. This scale implies that a naive solution that counts elements for each query separately would be too slow. Each query could be as large as the entire array, and the element values can be large (up to 10^9), so direct frequency counting without careful design may be inefficient. The constraints guarantee valid query ranges and threshold values, but the wide range of element values and the number of queries are the main challenge.

Important edge cases include subarrays where no element meets the threshold, subarrays where multiple elements meet the threshold with equal frequencies, and queries on single-element subarrays.

Approaches

A brute-force approach would iterate over each query, slice the subarray, count frequencies using a hash map, and then select the element that meets the threshold with the highest frequency. This approach is straightforward and correct but inefficient. Specifically, slicing and counting for each query results in O(Q * N) time complexity in the worst case (where Q is the number of queries and N is the array length). Given the constraints (N ≤ 10^4, Q ≤ 5*10^4), this approach is too slow.

The optimal solution uses a combination of position mapping and binary search for efficient frequency counting within any subarray. By preprocessing the positions of each element in nums, we can quickly determine how many times an element appears within any query range using two binary searches. This avoids iterating through the subarray repeatedly. Additionally, since only elements that appear frequently in the entire array can satisfy high thresholds, we can optimize by checking only candidates that occur sufficiently often globally.

Approach Time Complexity Space Complexity Notes
Brute Force O(Q * N) O(N) Count each subarray separately
Optimal O(Q * log N * K) O(N + K) Preprocess positions, binary search for counts in subarrays, K is number of candidate elements per query

Algorithm Walkthrough

  1. Preprocess the array by building a dictionary mapping each unique element to a sorted list of its indices in nums. This allows fast lookup of an element's occurrences within any subarray.
  2. For each query [li, ri, threshold], iterate over the unique elements in nums (or maintain a candidate list of elements that appear often enough to meet the threshold).
  3. For each candidate element, use binary search on its list of indices to count how many times it appears between indices li and ri. The count is determined by finding the first index not less than li and the first index greater than ri, then subtracting the positions.
  4. Keep track of the element that meets the threshold and has the highest frequency. In case of a tie in frequency, select the smaller element.
  5. If no element satisfies the threshold, append -1 to the results. Otherwise, append the chosen element.

Why it works: Preprocessing positions guarantees O(log N) access to frequency counts in any subarray. By considering only elements that have enough occurrences in the subarray, we ensure correctness and efficiency. Using binary search avoids scanning the subarray explicitly.

Python Solution

from typing import List
from collections import defaultdict
import bisect

class Solution:
    def subarrayMajority(self, nums: List[int], queries: List[List[int]]) -> List[int]:
        positions = defaultdict(list)
        for idx, num in enumerate(nums):
            positions[num].append(idx)
        
        result = []
        for li, ri, threshold in queries:
            candidate = -1
            max_freq = 0
            for num, pos_list in positions.items():
                left = bisect.bisect_left(pos_list, li)
                right = bisect.bisect_right(pos_list, ri)
                freq = right - left
                if freq >= threshold:
                    if freq > max_freq or (freq == max_freq and num < candidate):
                        max_freq = freq
                        candidate = num
            result.append(candidate)
        return result

The implementation first constructs a dictionary of positions for each element. For each query, we calculate the frequency of each element in the subarray using binary search. We track the element with the highest frequency that meets the threshold and handle ties by choosing the smaller element.

Go Solution

package main

import (
	"sort"
)

func subarrayMajority(nums []int, queries [][]int) []int {
	positions := make(map[int][]int)
	for i, num := range nums {
		positions[num] = append(positions[num], i)
	}
	
	result := make([]int, 0, len(queries))
	
	for _, q := range queries {
		li, ri, threshold := q[0], q[1], q[2]
		candidate := -1
		maxFreq := 0
		for num, posList := range positions {
			left := sort.Search(len(posList), func(i int) bool { return posList[i] >= li })
			right := sort.Search(len(posList), func(i int) bool { return posList[i] > ri })
			freq := right - left
			if freq >= threshold {
				if freq > maxFreq || (freq == maxFreq && num < candidate) {
					maxFreq = freq
					candidate = num
				}
			}
		}
		result = append(result, candidate)
	}
	return result
}

The Go version mirrors the Python implementation, using sort.Search for binary searches and a map to store positions. Care is taken to append results efficiently.

Worked Examples

Example 1

nums = [1,1,2,2,1,1], queries = [[0,5,4],[0,3,3],[2,3,2]]

Query Candidate Frequencies Result
[0,5,4] 1 → 4, 2 → 2 1
[0,3,3] 1 → 2, 2 → 2 -1
[2,3,2] 2 → 2 2

Example 2

nums = [3,2,3,2,3,2,3], queries = [[0,6,4],[1,5,2],[2,4,1],[3,3,1]]

Query Candidate Frequencies Result
[0,6,4] 3 → 4, 2 → 3 3
[1,5,2] 2 → 3, 3 → 2 2
[2,4,1] 3 → 2, 2 → 1 3
[3,3,1] 2 → 1 2

Complexity Analysis

Measure Complexity Explanation
Time O(Q * K * log N) Q queries, K candidate elements, log N for binary search per element
Space O(N + K) Store positions of elements and output array

The complexity is reasonable given constraints. The number of candidate elements K is often much smaller than N, especially when thresholds are high.

Test Cases

# Basic provided examples
assert Solution().subarrayMajority([1,1,2,2,1,1], [[0,5,4],[0,3,3],[2,3,2]]) == [1,-1,2]
assert Solution().subarrayMajority([3,2,3,2,3,2,3], [[0,6,4],[1,5,2],[2,4,1],[3,3,1]]) == [3,2,3,2]

# Edge case: single element array
assert Solution().subarrayMajority([1], [[0,0,1]]) == [1]

# Edge case: no element meets threshold
assert Solution().subarrayMajority([1,2,3], [[0,2,4]]) == [-1]

# Edge case: tie in frequency, smallest element selected
assert Solution().subarrayMajority([1,2,2,1], [[0,3,2]]) == [1]

# Large values
assert Solution().subarrayMajority([10**9, 10**9, 1], [[0,2,2]]) == [1000000000]
Test Why
Provided examples Validate main functionality
Single element Handles minimal array size
No element meets threshold Validates correct -1 output
Tie in frequency Ensures smallest element selected
Large values Confirms correctness with max integer values

Edge Cases

A key edge case is a query on a single-element subarray. Here, the threshold can only be satisfied by that element itself. Another edge case is when no element meets the threshold; the implementation correctly returns -1. A third