LeetCode 1157 - Online Majority Element In Subarray

The problem asks us to design a data structure that can efficiently answer majority element queries on subarrays of a given array.

LeetCode Problem 1157

Difficulty: 🔴 Hard
Topics: Array, Binary Search, Design, Binary Indexed Tree, Segment Tree

Solution

Problem Understanding

The problem asks us to design a data structure that can efficiently answer majority element queries on subarrays of a given array. Specifically, given an array arr and multiple queries of the form (left, right, threshold), the goal is to find an element within the subarray arr[left...right] that occurs at least threshold times. If no such element exists, the query should return -1.

The input consists of the initial array arr and multiple queries. Each query specifies a subarray and a threshold, and the output should be the element that meets the frequency requirement, or -1. The constraints indicate that the array length can be up to 20,000 and there can be up to 10,000 queries. Additionally, the problem guarantees that 2 * threshold > right - left + 1, meaning that if a majority element exists for that threshold, it is unique. This is a crucial property because it allows us to use randomized or probabilistic approaches efficiently without worrying about ties.

Important edge cases include very small arrays, queries that cover the entire array, thresholds that equal the subarray length, and subarrays where no element meets the threshold. A naive approach would struggle with these large constraints due to the high number of queries.

Approaches

Brute Force Approach

The brute force approach would iterate through each query, count the frequency of each element in the subarray using a hash map, and return the element that meets the threshold. This is correct because it examines every element in the subarray, ensuring we do not miss a majority element. However, the time complexity for each query is O(n) in the worst case, and with up to 10,000 queries, this results in O(n * q) total time, which is too slow for the input limits (n and q can both be ~20,000).

Optimal Approach

The key insight for an optimal solution comes from three observations. First, any element that occurs more than half of a subarray is a candidate for majority, and since 2 * threshold > right - left + 1, there can be at most one valid candidate. Second, we can precompute positions of each element in a dictionary for fast frequency counting. Third, by randomly sampling elements in the subarray, we can check a small number of candidates and verify their frequency using binary search. This allows each query to run in O(log n) time on average, after O(n) preprocessing. The approach combines randomization, frequency maps, and binary search for efficient verification.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * q) O(n) Iterate and count for each query
Randomized + Preprocessing O(q * log n) avg O(n) Precompute element positions, sample randomly, verify counts with binary search

Algorithm Walkthrough

  1. Preprocessing: Build a dictionary that maps each element in arr to a list of its indices. This allows us to quickly determine how many times an element occurs in any subarray using binary search.

  2. Query Handling: For each query (left, right, threshold):

  3. Randomly select an index from the subarray. Use the element at that index as a candidate.

  4. Using the precomputed list of positions for the candidate, find the number of occurrences within the subarray using binary search. Specifically, find the first index ≥ left and the last index ≤ right.

  5. If the count ≥ threshold, return the candidate.

  6. Repeat the random selection multiple times (e.g., 20-30 times) to increase the probability of finding the correct majority element.

  7. If no candidate passes the threshold after all trials, return -1.

Why it works: The guarantee that 2 * threshold > right - left + 1 ensures there is at most one element that can satisfy the condition. Randomly sampling enough elements in the subarray ensures we are highly likely to hit this element. The precomputed positions and binary search verify counts in logarithmic time, making the algorithm efficient.

Python Solution

from typing import List
import random
import bisect

class MajorityChecker:

    def __init__(self, arr: List[int]):
        self.arr = arr
        self.pos = {}
        for i, num in enumerate(arr):
            if num not in self.pos:
                self.pos[num] = []
            self.pos[num].append(i)

    def query(self, left: int, right: int, threshold: int) -> int:
        for _ in range(20):  # repeat enough times for high probability
            candidate = self.arr[random.randint(left, right)]
            indices = self.pos[candidate]
            l = bisect.bisect_left(indices, left)
            r = bisect.bisect_right(indices, right)
            if r - l >= threshold:
                return candidate
        return -1

This Python implementation first precomputes positions of all elements. For each query, it samples random elements in the subarray, counts occurrences using bisect binary search on the positions list, and returns the element if the count meets the threshold. The 20 iterations provide high probability to hit the majority element.

Go Solution

package main

import (
    "math/rand"
    "sort"
    "time"
)

type MajorityChecker struct {
    arr []int
    pos map[int][]int
}

func Constructor(arr []int) MajorityChecker {
    pos := make(map[int][]int)
    for i, num := range arr {
        pos[num] = append(pos[num], i)
    }
    rand.Seed(time.Now().UnixNano())
    return MajorityChecker{arr: arr, pos: pos}
}

func (this *MajorityChecker) Query(left int, right int, threshold int) int {
    for i := 0; i < 20; i++ {
        candidate := this.arr[rand.Intn(right-left+1)+left]
        indices := this.pos[candidate]
        l := sort.Search(len(indices), func(j int) bool { return indices[j] >= left })
        r := sort.Search(len(indices), func(j int) bool { return indices[j] > right })
        if r-l >= threshold {
            return candidate
        }
    }
    return -1
}

The Go implementation mirrors the Python logic. Random elements are sampled, and the number of occurrences is verified using sort.Search to emulate bisect. The rand.Seed ensures different random behavior across program runs.

Worked Examples

Example: arr = [1,1,2,2,1,1], query (0,5,4)

  1. Candidate sampling might pick index 0 → element 1.
  2. Positions of 1: [0,1,4,5]. Count in [0,5] → indices 0,1,4,5 → count = 4.
  3. Count >= threshold (4) → return 1.

Query (0,3,3)

  1. Random sampling picks candidates 1 or 2.
  2. For candidate 1, positions [0,1,4,5]. Count in [0,3] → indices 0,1 → count = 2 < 3.
  3. Candidate 2, positions [2,3] → count = 2 < 3.
  4. After 20 attempts, no valid candidate → return -1.

Query (2,3,2)

  1. Candidate sampling picks index 2 → element 2.
  2. Positions [2,3]. Count in [2,3] = 2 ≥ 2 → return 2.

Complexity Analysis

Measure Complexity Explanation
Time O(n + q * log n) avg Preprocessing O(n), each query uses binary search over positions O(log n), repeated 20 times
Space O(n) Store positions of each element in the array

The random sampling ensures the probability of missing the majority element is extremely low. Each binary search verification is logarithmic in the size of the positions list.

Test Cases

arr = [1,1,2,2,1,1]
obj = MajorityChecker(arr)

assert obj.query(0,5,4) == 1  # majority 1 in full array
assert obj.query(0,3,3) == -1 # no majority
assert obj.query(2,3,2) == 2  # majority 2 in subarray
assert obj.query(0,0,1) == 1  # single element
assert obj.query(5,5,1) == 1  # last element single
assert obj.query(0,5,6) == -1 # threshold exceeds occurrences
Test Why
Full array majority Validates correct majority selection
Subarray with no majority Ensures -1 is returned
Small subarray majority Verifies handling of short ranges
Single element subarray Edge case, threshold = 1
Threshold > occurrences Edge case, should return -1

Edge Cases

Edge Case 1: Single element subarray. A subarray of length 1 must return that element if threshold = 1. The implementation handles this naturally with random sampling.

Edge Case 2: Threshold equal to the subarray length. This requires counting the