LeetCode 2080 - Range Frequency Queries
The problem asks us to design a data structure that can efficiently answer frequency queries on subarrays. We are given an integer array arr, and we need to support repeated queries of the form: - Given left, right, and value - Return how many times value appears in the…
Difficulty: 🟡 Medium
Topics: Array, Hash Table, Binary Search, Design, Segment Tree
Solution
Problem Understanding
The problem asks us to design a data structure that can efficiently answer frequency queries on subarrays.
We are given an integer array arr, and we need to support repeated queries of the form:
- Given
left,right, andvalue - Return how many times
valueappears in the subarrayarr[left...right]
The important detail is that the data structure is built once, but queried many times. The constraints allow up to 10^5 elements in the array and up to 10^5 queries, which immediately tells us that a naive per-query scan is too slow.
For example, suppose:
arr = [12, 33, 4, 56, 22, 2, 34, 33, 22, 12, 34, 56]
If we query:
query(1, 2, 4)
we examine the subarray:
[33, 4]
The value 4 appears once, so the answer is 1.
The constraints reveal several important properties:
- The array is static, meaning it never changes after construction.
- We only need to answer read-only frequency queries.
- Query performance matters much more than update performance because there are no updates.
- Values are bounded by
10^4, but indices go up to10^5.
These observations strongly suggest preprocessing the array so queries become fast.
Several edge cases are important:
- The queried value may not exist anywhere in the array.
- The queried range may contain exactly one element.
- All elements may be identical.
- The queried value may occur many times outside the query range but not inside it.
- The query range may span the entire array.
A correct solution must handle all of these efficiently.
Approaches
Brute Force Approach
The most straightforward solution is to scan the subarray for every query.
For each query:
- Iterate from index
leftto indexright - Count how many elements equal
value - Return the count
This approach is correct because it directly computes the frequency by examining every relevant element.
However, it is far too slow for the given constraints.
If the array length is 10^5 and we also have 10^5 queries, then in the worst case each query scans nearly the entire array.
That leads to:
O(n * q) = O(10^10)
operations, which is not feasible.
Optimal Approach
The key observation is that we do not need to repeatedly scan the array.
Instead, we can preprocess the array and store, for each distinct value, the sorted list of indices where that value appears.
For example:
arr = [12, 33, 4, 56, 22, 2, 34, 33]
We can build:
12 -> [0]
33 -> [1, 7]
4 -> [2]
56 -> [3]
22 -> [4]
2 -> [5]
34 -> [6]
Now consider the query:
query(1, 7, 33)
We already know all positions of 33:
[1, 7]
We simply need to count how many indices fall within [1, 7].
Because the index lists are sorted, we can use binary search:
- Find the first index >=
left - Find the first index >
right - Their difference gives the count
This transforms each query from linear time into logarithmic time.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) per query | O(1) | Scans the entire query range every time |
| Optimal | O(log k) per query | O(n) | Stores sorted index lists and uses binary search |
Here, k is the number of occurrences of the queried value.
Algorithm Walkthrough
Step 1: Build a Mapping from Values to Indices
During initialization, iterate through the array.
For each element:
- Use the value as a key in a hash map
- Append the current index to that value's list
For example:
arr = [1, 3, 1, 2, 1]
Produces:
1 -> [0, 2, 4]
3 -> [1]
2 -> [3]
The lists are automatically sorted because we process the array from left to right.
Step 2: Process a Query
Suppose we receive:
query(left, right, value)
First, check whether value exists in the map.
If it does not exist, return 0.
Otherwise, retrieve the sorted index list for that value.
Example:
value = 1
indices = [0, 2, 4]
Step 3: Find the Left Boundary
Use binary search to locate the first position in indices where the index is greater than or equal to left.
This gives the beginning of the valid range.
Step 4: Find the Right Boundary
Use another binary search to locate the first position in indices where the index is greater than right.
This gives the position immediately after the valid range.
Step 5: Compute the Frequency
The number of valid occurrences is:
right_position - left_position
Because all indices between those two positions lie inside the query range.
Why it works
For every value, we store all occurrence indices in sorted order.
Binary search correctly identifies:
- The first occurrence inside the query range
- The first occurrence after the query range
Every index between those two boundaries must belong to the interval [left, right], and every valid occurrence is included exactly once.
Therefore, the difference between the two binary search positions equals the exact frequency.
Python Solution
from collections import defaultdict
from bisect import bisect_left, bisect_right
from typing import List
class RangeFreqQuery:
def __init__(self, arr: List[int]):
self.positions = defaultdict(list)
for index, value in enumerate(arr):
self.positions[value].append(index)
def query(self, left: int, right: int, value: int) -> int:
if value not in self.positions:
return 0
indices = self.positions[value]
left_pos = bisect_left(indices, left)
right_pos = bisect_right(indices, right)
return right_pos - left_pos
# Your RangeFreqQuery object will be instantiated and called as such:
# obj = RangeFreqQuery(arr)
# param_1 = obj.query(left,right,value)
The implementation begins by constructing a hash map named positions.
Each key is a value from the array, and its corresponding value is a list of all indices where that number appears.
Because we iterate through the array in increasing index order, every list remains sorted automatically.
During queries, we first check whether the requested value exists in the map. If it does not, the frequency is immediately 0.
Otherwise, we retrieve the sorted index list and use:
bisect_leftto find the first index greater than or equal toleftbisect_rightto find the first index greater thanright
The difference between these positions gives the number of occurrences inside the interval.
This works because binary search efficiently counts how many sorted indices fall within a range.
Go Solution
package main
import "sort"
type RangeFreqQuery struct {
positions map[int][]int
}
func Constructor(arr []int) RangeFreqQuery {
rfq := RangeFreqQuery{
positions: make(map[int][]int),
}
for index, value := range arr {
rfq.positions[value] = append(rfq.positions[value], index)
}
return rfq
}
func (this *RangeFreqQuery) Query(left int, right int, value int) int {
indices, exists := this.positions[value]
if !exists {
return 0
}
leftPos := sort.SearchInts(indices, left)
rightPos := sort.Search(len(indices), func(i int) bool {
return indices[i] > right
})
return rightPos - leftPos
}
/**
* Your RangeFreqQuery object will be instantiated and called as such:
* obj := Constructor(arr);
* param_1 := obj.Query(left,right,value);
*/
The Go solution follows the same algorithmic structure as the Python version.
Instead of Python's bisect module, Go uses:
sort.SearchIntsfor the lower boundsort.Searchwith a custom condition for the upper bound
Go maps naturally support slices as values, making it easy to store occurrence indices.
Because the array indices are appended in order, the slices remain sorted automatically, which allows binary search to work correctly.
Worked Examples
Example 1
arr = [12, 33, 4, 56, 22, 2, 34, 33, 22, 12, 34, 56]
Preprocessing Phase
| Index | Value | Map State |
|---|---|---|
| 0 | 12 | 12 -> [0] |
| 1 | 33 | 33 -> [1] |
| 2 | 4 | 4 -> [2] |
| 3 | 56 | 56 -> [3] |
| 4 | 22 | 22 -> [4] |
| 5 | 2 | 2 -> [5] |
| 6 | 34 | 34 -> [6] |
| 7 | 33 | 33 -> [1, 7] |
| 8 | 22 | 22 -> [4, 8] |
| 9 | 12 | 12 -> [0, 9] |
| 10 | 34 | 34 -> [6, 10] |
| 11 | 56 | 56 -> [3, 11] |
Final map:
12 -> [0, 9]
33 -> [1, 7]
4 -> [2]
56 -> [3, 11]
22 -> [4, 8]
2 -> [5]
34 -> [6, 10]
Query 1
query(1, 2, 4)
Retrieve:
indices = [2]
Binary searches:
| Operation | Result |
|---|---|
| first index >= 1 | 0 |
| first index > 2 | 1 |
Frequency:
1 - 0 = 1
Answer:
1
Query 2
query(0, 11, 33)
Retrieve:
indices = [1, 7]
Binary searches:
| Operation | Result |
|---|---|
| first index >= 0 | 0 |
| first index > 11 | 2 |
Frequency:
2 - 0 = 2
Answer:
2
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(log k) per query | Binary searches on occurrence list |
| Space | O(n) | Every array index is stored once |
The preprocessing phase takes O(n) time because we scan the array once.
Each query performs two binary searches on the occurrence list of the queried value.
If the value appears k times, binary search takes O(log k) time.
The total space complexity is O(n) because every array index is stored exactly once across all lists.
Test Cases
from collections import defaultdict
from bisect import bisect_left, bisect_right
class RangeFreqQuery:
def __init__(self, arr):
self.positions = defaultdict(list)
for index, value in enumerate(arr):
self.positions[value].append(index)
def query(self, left, right, value):
if value not in self.positions:
return 0
indices = self.positions[value]
return (
bisect_right(indices, right)
- bisect_left(indices, left)
)
# Provided example
rfq = RangeFreqQuery([12, 33, 4, 56, 22, 2, 34, 33, 22, 12, 34, 56])
assert rfq.query(1, 2, 4) == 1 # single occurrence in small range
assert rfq.query(0, 11, 33) == 2 # multiple occurrences in full range
# Single element array
rfq = RangeFreqQuery([5])
assert rfq.query(0, 0, 5) == 1 # exact match
assert rfq.query(0, 0, 1) == 0 # missing value
# All elements identical
rfq = RangeFreqQuery([7, 7, 7, 7, 7])
assert rfq.query(0, 4, 7) == 5 # entire array
assert rfq.query(1, 3, 7) == 3 # middle subarray
# Value exists outside range but not inside
rfq = RangeFreqQuery([1, 2, 3, 1, 2, 3])
assert rfq.query(1, 2, 1) == 0 # occurrences outside interval only
# Query at boundaries
rfq = RangeFreqQuery([4, 5, 4, 5, 4])
assert rfq.query(0, 0, 4) == 1 # left boundary
assert rfq.query(4, 4, 4) == 1 # right boundary
# Large repeated blocks
rfq = RangeFreqQuery([1] * 1000 + [2] * 1000)
assert rfq.query(0, 999, 1) == 1000 # first block
assert rfq.query(1000, 1999, 2) == 1000 # second block
assert rfq.query(500, 1500, 2) == 501 # partial overlap
# Missing value entirely
rfq = RangeFreqQuery([1, 2, 3])
assert rfq.query(0, 2, 10) == 0 # value absent from array
| Test | Why |
|---|---|
| Provided example | Validates standard functionality |
| Single element array | Tests smallest valid input |
| All elements identical | Tests large frequency counts |
| Value outside range | Ensures range filtering works correctly |
| Boundary queries | Verifies inclusive index handling |
| Large repeated blocks | Tests performance-oriented scenarios |
| Missing value entirely | Confirms correct handling of absent keys |
Edge Cases
Querying a Value That Does Not Exist
A common source of bugs is attempting to binary search an index list for a value that never appears in the array.
For example:
arr = [1, 2, 3]
query(0, 2, 10)
If the implementation blindly accesses the map, it may cause errors or incorrect behavior.
The solution handles this safely by checking:
if value not in self.positions:
return 0
This guarantees correctness for absent values.
Single Element Ranges
Ranges where left == right can expose off-by-one mistakes.
Example:
arr = [5]
query(0, 0, 5)
The correct answer is 1.
Using bisect_left and bisect_right carefully preserves inclusive boundaries, ensuring that a single valid index is counted correctly.
Values Appearing Outside but Not Inside the Range
Another subtle case occurs when the value exists in the array but not inside the requested interval.
Example:
arr = [1, 2, 3, 1]
query(1, 2, 1)
The value 1 exists globally but not inside [1, 2].
The binary searches correctly isolate only indices within the requested range, producing 0 rather than counting all occurrences globally.