LeetCode 1157 - Online Majority Element In Subarray
The problem asks us to design a data structure that can efficiently answer majority element queries on subarrays of a given array.
Difficulty: 🔴 Hard
Topics: Array, Binary Search, Design, Binary Indexed Tree, Segment Tree
Solution
Problem Understanding
The problem asks us to design a data structure that can efficiently answer majority element queries on subarrays of a given array. Specifically, given an array arr and multiple queries of the form (left, right, threshold), the goal is to find an element within the subarray arr[left...right] that occurs at least threshold times. If no such element exists, the query should return -1.
The input consists of the initial array arr and multiple queries. Each query specifies a subarray and a threshold, and the output should be the element that meets the frequency requirement, or -1. The constraints indicate that the array length can be up to 20,000 and there can be up to 10,000 queries. Additionally, the problem guarantees that 2 * threshold > right - left + 1, meaning that if a majority element exists for that threshold, it is unique. This is a crucial property because it allows us to use randomized or probabilistic approaches efficiently without worrying about ties.
Important edge cases include very small arrays, queries that cover the entire array, thresholds that equal the subarray length, and subarrays where no element meets the threshold. A naive approach would struggle with these large constraints due to the high number of queries.
Approaches
Brute Force Approach
The brute force approach would iterate through each query, count the frequency of each element in the subarray using a hash map, and return the element that meets the threshold. This is correct because it examines every element in the subarray, ensuring we do not miss a majority element. However, the time complexity for each query is O(n) in the worst case, and with up to 10,000 queries, this results in O(n * q) total time, which is too slow for the input limits (n and q can both be ~20,000).
Optimal Approach
The key insight for an optimal solution comes from three observations. First, any element that occurs more than half of a subarray is a candidate for majority, and since 2 * threshold > right - left + 1, there can be at most one valid candidate. Second, we can precompute positions of each element in a dictionary for fast frequency counting. Third, by randomly sampling elements in the subarray, we can check a small number of candidates and verify their frequency using binary search. This allows each query to run in O(log n) time on average, after O(n) preprocessing. The approach combines randomization, frequency maps, and binary search for efficient verification.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * q) | O(n) | Iterate and count for each query |
| Randomized + Preprocessing | O(q * log n) avg | O(n) | Precompute element positions, sample randomly, verify counts with binary search |
Algorithm Walkthrough
-
Preprocessing: Build a dictionary that maps each element in
arrto a list of its indices. This allows us to quickly determine how many times an element occurs in any subarray using binary search. -
Query Handling: For each query
(left, right, threshold): -
Randomly select an index from the subarray. Use the element at that index as a candidate.
-
Using the precomputed list of positions for the candidate, find the number of occurrences within the subarray using binary search. Specifically, find the first index ≥
leftand the last index ≤right. -
If the count ≥
threshold, return the candidate. -
Repeat the random selection multiple times (e.g., 20-30 times) to increase the probability of finding the correct majority element.
-
If no candidate passes the threshold after all trials, return
-1.
Why it works: The guarantee that 2 * threshold > right - left + 1 ensures there is at most one element that can satisfy the condition. Randomly sampling enough elements in the subarray ensures we are highly likely to hit this element. The precomputed positions and binary search verify counts in logarithmic time, making the algorithm efficient.
Python Solution
from typing import List
import random
import bisect
class MajorityChecker:
def __init__(self, arr: List[int]):
self.arr = arr
self.pos = {}
for i, num in enumerate(arr):
if num not in self.pos:
self.pos[num] = []
self.pos[num].append(i)
def query(self, left: int, right: int, threshold: int) -> int:
for _ in range(20): # repeat enough times for high probability
candidate = self.arr[random.randint(left, right)]
indices = self.pos[candidate]
l = bisect.bisect_left(indices, left)
r = bisect.bisect_right(indices, right)
if r - l >= threshold:
return candidate
return -1
This Python implementation first precomputes positions of all elements. For each query, it samples random elements in the subarray, counts occurrences using bisect binary search on the positions list, and returns the element if the count meets the threshold. The 20 iterations provide high probability to hit the majority element.
Go Solution
package main
import (
"math/rand"
"sort"
"time"
)
type MajorityChecker struct {
arr []int
pos map[int][]int
}
func Constructor(arr []int) MajorityChecker {
pos := make(map[int][]int)
for i, num := range arr {
pos[num] = append(pos[num], i)
}
rand.Seed(time.Now().UnixNano())
return MajorityChecker{arr: arr, pos: pos}
}
func (this *MajorityChecker) Query(left int, right int, threshold int) int {
for i := 0; i < 20; i++ {
candidate := this.arr[rand.Intn(right-left+1)+left]
indices := this.pos[candidate]
l := sort.Search(len(indices), func(j int) bool { return indices[j] >= left })
r := sort.Search(len(indices), func(j int) bool { return indices[j] > right })
if r-l >= threshold {
return candidate
}
}
return -1
}
The Go implementation mirrors the Python logic. Random elements are sampled, and the number of occurrences is verified using sort.Search to emulate bisect. The rand.Seed ensures different random behavior across program runs.
Worked Examples
Example: arr = [1,1,2,2,1,1], query (0,5,4)
- Candidate sampling might pick index 0 → element 1.
- Positions of 1:
[0,1,4,5]. Count in[0,5]→ indices 0,1,4,5 → count = 4. - Count >= threshold (4) → return 1.
Query (0,3,3)
- Random sampling picks candidates 1 or 2.
- For candidate 1, positions
[0,1,4,5]. Count in[0,3]→ indices 0,1 → count = 2 < 3. - Candidate 2, positions
[2,3]→ count = 2 < 3. - After 20 attempts, no valid candidate → return -1.
Query (2,3,2)
- Candidate sampling picks index 2 → element 2.
- Positions
[2,3]. Count in[2,3]= 2 ≥ 2 → return 2.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + q * log n) avg | Preprocessing O(n), each query uses binary search over positions O(log n), repeated 20 times |
| Space | O(n) | Store positions of each element in the array |
The random sampling ensures the probability of missing the majority element is extremely low. Each binary search verification is logarithmic in the size of the positions list.
Test Cases
arr = [1,1,2,2,1,1]
obj = MajorityChecker(arr)
assert obj.query(0,5,4) == 1 # majority 1 in full array
assert obj.query(0,3,3) == -1 # no majority
assert obj.query(2,3,2) == 2 # majority 2 in subarray
assert obj.query(0,0,1) == 1 # single element
assert obj.query(5,5,1) == 1 # last element single
assert obj.query(0,5,6) == -1 # threshold exceeds occurrences
| Test | Why |
|---|---|
| Full array majority | Validates correct majority selection |
| Subarray with no majority | Ensures -1 is returned |
| Small subarray majority | Verifies handling of short ranges |
| Single element subarray | Edge case, threshold = 1 |
| Threshold > occurrences | Edge case, should return -1 |
Edge Cases
Edge Case 1: Single element subarray. A subarray of length 1 must return that element if threshold = 1. The implementation handles this naturally with random sampling.
Edge Case 2: Threshold equal to the subarray length. This requires counting the