LeetCode 398 - Random Pick Index
The problem gives us an integer array nums that may contain duplicate values. We must design a class that supports repeatedly selecting a random index for a given target value.
Difficulty: 🟡 Medium
Topics: Hash Table, Math, Reservoir Sampling, Randomized
Solution
Problem Understanding
The problem gives us an integer array nums that may contain duplicate values. We must design a class that supports repeatedly selecting a random index for a given target value.
The important requirement is that if the target appears multiple times, every valid index must have the same probability of being returned.
For example, if:
nums = [1, 2, 3, 3, 3]
and we call:
pick(3)
then the valid indices are:
2, 3, 4
Each of these indices must be returned with probability 1/3.
The constructor receives the full array once, and then the pick method may be called many times afterward. The problem guarantees that the target always exists in the array, so we never need to handle a missing target.
The constraints are important:
nums.lengthcan be up to2 * 10^4pickcan be called up to10^4times
These limits are not extremely large, but they are large enough that repeatedly scanning the entire array for every query could become inefficient.
Another important detail is that the problem specifically lists "Reservoir Sampling" as one of the topics. That strongly suggests the intended optimal solution should use reservoir sampling, especially because this technique allows uniform random selection without storing all matching indices.
Some edge cases that can cause bugs are:
- The target appears exactly once
- Every element in the array is identical
- The target appears many times
- Negative numbers are present
- Random selection must remain uniformly distributed
The biggest logical challenge is ensuring equal probability among all matching indices.
Approaches
Brute Force Approach
The most direct solution is to preprocess the array and store all indices for every value.
We can use a hash map where:
value -> list of indices
For example:
nums = [1, 2, 3, 3, 3]
would become:
{
1: [0],
2: [1],
3: [2, 3, 4]
}
Then when pick(target) is called:
- Retrieve the list of indices for that target
- Randomly choose one index from the list
- Return it
This approach is correct because every valid index is explicitly stored, making uniform random selection straightforward.
However, it uses additional memory proportional to the size of the array. While acceptable for these constraints, the problem is specifically designed to encourage a more elegant reservoir sampling solution.
Optimal Approach, Reservoir Sampling
The key insight is that we do not actually need to store all matching indices.
Instead, while scanning the array, we can maintain a current candidate answer and update it with carefully chosen probabilities.
Suppose we encounter matching indices one by one.
- The first matching index should be selected with probability
1 - The second matching index should replace the first with probability
1/2 - The third matching index should replace the current answer with probability
1/3 - The fourth matching index should replace the current answer with probability
1/4
and so on.
This is reservoir sampling.
The remarkable property is that after processing all matching indices, every valid index has exactly the same probability of being selected.
This allows:
O(1)extra space- Uniform random selection
- Only one pass through the array
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(1) per pick after O(n) preprocessing | O(n) | Stores all indices in a hash map |
| Optimal | O(n) per pick | O(1) | Uses reservoir sampling without extra storage |
Algorithm Walkthrough
Reservoir Sampling Algorithm
- Store the original array inside the class.
We need access to the numbers during future pick calls, so we keep the array as an instance variable.
2. Initialize two variables during pick(target).
counttracks how many times we have seen the target so far.chosen_indexstores the currently selected answer.
- Scan through the array from left to right.
For every index i:
- If
nums[i] != target, skip it. - Otherwise, we found another valid candidate.
- Increment the occurrence counter.
If this is the kth occurrence of the target, then:
count = k
- Randomly decide whether to replace the current answer.
Generate a random integer in the range:
[1, count]
If the generated number equals 1, replace the current answer with the current index.
This means:
- First occurrence selected with probability
1 - Second occurrence replaces previous with probability
1/2 - Third occurrence replaces previous with probability
1/3
and so forth. 6. Continue scanning until the end.
After the scan finishes, return chosen_index.
Why it works
Reservoir sampling guarantees uniform probability.
Suppose there are m occurrences of the target.
Consider any particular occurrence.
- It has probability
1/kof being selected when first encountered. - It then survives future replacement attempts with the correct probabilities.
The math simplifies so that every occurrence ends with probability exactly:
1/m
Therefore, every valid index is equally likely to be returned.
Python Solution
import random
from typing import List
class Solution:
def __init__(self, nums: List[int]):
self.nums = nums
def pick(self, target: int) -> int:
count = 0
chosen_index = -1
for index, value in enumerate(self.nums):
if value == target:
count += 1
# Replace current choice with probability 1/count
if random.randint(1, count) == 1:
chosen_index = index
return chosen_index
# Your Solution object will be instantiated and called as such:
# obj = Solution(nums)
# param_1 = obj.pick(target)
The constructor simply stores the input array so it can be reused during future queries.
Inside pick, we iterate through the array and process only positions where the value equals the target.
The variable count tracks how many valid occurrences have been seen so far. Each time we encounter another valid index, we randomly decide whether to replace the current answer.
The expression:
random.randint(1, count) == 1
ensures the current index is chosen with probability 1/count.
By the end of the traversal, every valid index has equal probability of being stored in chosen_index.
The implementation uses constant extra space because we never store all matching indices.
Go Solution
package main
import (
"math/rand"
)
type Solution struct {
nums []int
}
func Constructor(nums []int) Solution {
return Solution{
nums: nums,
}
}
func (this *Solution) Pick(target int) int {
count := 0
chosenIndex := -1
for index, value := range this.nums {
if value == target {
count++
// Replace current choice with probability 1/count
if rand.Intn(count) == 0 {
chosenIndex = index
}
}
}
return chosenIndex
}
/**
* Your Solution object will be instantiated and called as such:
* obj := Constructor(nums);
* param_1 := obj.Pick(target);
*/
The Go implementation follows exactly the same logic as the Python version.
One small difference is the random API:
rand.Intn(count)
returns a random integer in the range:
[0, count-1]
So checking:
== 0
gives probability 1/count.
Slices in Go naturally handle dynamic array storage, so we simply store the input slice inside the struct.
Worked Examples
Example 1
nums = [1, 2, 3, 3, 3]
pick(3)
Valid indices are:
2, 3, 4
Suppose the random choices proceed as follows.
| Step | Index | Value | count | Random Decision | chosen_index |
|---|---|---|---|---|---|
| Start | - | - | 0 | - | -1 |
| 0 | 0 | 1 | 0 | Skip | -1 |
| 1 | 1 | 2 | 0 | Skip | -1 |
| 2 | 2 | 3 | 1 | Select automatically | 2 |
| 3 | 3 | 3 | 2 | Replace with probability 1/2 | 3 |
| 4 | 4 | 3 | 3 | Keep previous | 3 |
Final answer:
3
Another execution could return 2 or 4.
Each has probability 1/3.
Example 2
nums = [1, 2, 3, 3, 3]
pick(1)
Only one matching index exists.
| Step | Index | Value | count | chosen_index |
|---|---|---|---|---|
| 0 | 0 | 1 | 1 | 0 |
| 1 | 1 | 2 | 1 | 0 |
| 2 | 2 | 3 | 1 | 0 |
| 3 | 3 | 3 | 1 | 0 |
| 4 | 4 | 3 | 1 | 0 |
Final answer:
0
Since only one valid index exists, it is always returned.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each pick scans the entire array once |
| Space | O(1) | Only a few variables are used |
The algorithm processes every element during each pick call, so the runtime is linear in the size of the array.
The extra memory usage remains constant because we never build auxiliary data structures proportional to input size.
Test Cases
import random
# Basic example with duplicates
obj = Solution([1, 2, 3, 3, 3])
result = obj.pick(3)
assert result in [2, 3, 4] # target appears multiple times
# Single occurrence
obj = Solution([1, 2, 3])
assert obj.pick(2) == 1 # only one valid index
# All identical values
obj = Solution([5, 5, 5, 5])
result = obj.pick(5)
assert result in [0, 1, 2, 3] # every index valid
# Negative numbers
obj = Solution([-1, -2, -1, -3])
result = obj.pick(-1)
assert result in [0, 2] # handles negative values
# Array of length 1
obj = Solution([10])
assert obj.pick(10) == 0 # smallest possible input
# Large duplicate block
nums = [7] * 100
obj = Solution(nums)
result = obj.pick(7)
assert 0 <= result < 100 # many duplicates
# Mixed values
obj = Solution([4, 1, 4, 2, 4, 3])
result = obj.pick(4)
assert result in [0, 2, 4] # multiple scattered duplicates
# Target at the end
obj = Solution([1, 2, 3, 4])
assert obj.pick(4) == 3 # last index
# Target at the beginning
obj = Solution([9, 1, 2, 3])
assert obj.pick(9) == 0 # first index
Test Summary
| Test | Why |
|---|---|
[1,2,3,3,3] with target 3 |
Verifies multiple valid indices |
[1,2,3] with target 2 |
Verifies single occurrence |
[5,5,5,5] |
Verifies all elements identical |
| Negative numbers | Ensures signed integers handled correctly |
| Single element array | Minimum input size |
| Large duplicate block | Stress test with many duplicates |
| Scattered duplicates | Ensures non-contiguous matches work |
| Target at end | Boundary position test |
| Target at beginning | Boundary position test |
Edge Cases
Target Appears Only Once
A common bug is accidentally introducing randomness even when there is only one valid index.
For example:
nums = [1, 2, 3]
target = 2
The answer must always be index 1.
The reservoir sampling implementation handles this naturally because the first occurrence is selected with probability 1, and there are no later occurrences that could replace it.
All Elements Are the Same
Consider:
nums = [5, 5, 5, 5]
Every index must have equal probability.
A buggy implementation might accidentally bias toward earlier or later indices.
Reservoir sampling avoids this because each occurrence replaces the previous answer with exactly the correct probability.
Negative Numbers
Some implementations incorrectly assume values are non-negative or use values as array indices.
For example:
nums = [-1, -2, -1]
Our solution only compares values for equality and never uses them as indices, so negative numbers work without any special handling.