LeetCode 398 - Random Pick Index

Difficulty: 🟡 Medium
Topics: Hash Table, Math, Reservoir Sampling, Randomized

Solution

Problem Understanding

The problem gives us an integer array nums that may contain duplicate values. We must design a class that supports repeatedly selecting a random index for a given target value.

The important requirement is that if the target appears multiple times, every valid index must have the same probability of being returned.

For example, if:

nums = [1, 2, 3, 3, 3]

and we call:

pick(3)

then the valid indices are:

2, 3, 4

Each of these indices must be returned with probability 1/3.

The constructor receives the full array once, and then the pick method may be called many times afterward. The problem guarantees that the target always exists in the array, so we never need to handle a missing target.

The constraints are important:

nums.length can be up to 2 * 10^4
pick can be called up to 10^4 times

These limits are not extremely large, but they are large enough that repeatedly scanning the entire array for every query could become inefficient.

Another important detail is that the problem specifically lists "Reservoir Sampling" as one of the topics. That strongly suggests the intended optimal solution should use reservoir sampling, especially because this technique allows uniform random selection without storing all matching indices.

Some edge cases that can cause bugs are:

The target appears exactly once
Every element in the array is identical
The target appears many times
Negative numbers are present
Random selection must remain uniformly distributed

The biggest logical challenge is ensuring equal probability among all matching indices.

Approaches

Brute Force Approach

The most direct solution is to preprocess the array and store all indices for every value.

We can use a hash map where:

value -> list of indices

For example:

nums = [1, 2, 3, 3, 3]

would become:

{
    1: [0],
    2: [1],
    3: [2, 3, 4]
}

Then when pick(target) is called:

Retrieve the list of indices for that target
Randomly choose one index from the list
Return it

This approach is correct because every valid index is explicitly stored, making uniform random selection straightforward.

However, it uses additional memory proportional to the size of the array. While acceptable for these constraints, the problem is specifically designed to encourage a more elegant reservoir sampling solution.

Optimal Approach, Reservoir Sampling

The key insight is that we do not actually need to store all matching indices.

Instead, while scanning the array, we can maintain a current candidate answer and update it with carefully chosen probabilities.

Suppose we encounter matching indices one by one.

The first matching index should be selected with probability 1
The second matching index should replace the first with probability 1/2
The third matching index should replace the current answer with probability 1/3
The fourth matching index should replace the current answer with probability 1/4

and so on.

This is reservoir sampling.

The remarkable property is that after processing all matching indices, every valid index has exactly the same probability of being selected.

This allows:

O(1) extra space
Uniform random selection
Only one pass through the array

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(1) per pick after O(n) preprocessing	O(n)	Stores all indices in a hash map
Optimal	O(n) per pick	O(1)	Uses reservoir sampling without extra storage

Algorithm Walkthrough

Reservoir Sampling Algorithm

Store the original array inside the class.

We need access to the numbers during future pick calls, so we keep the array as an instance variable. 2. Initialize two variables during pick(target).

count tracks how many times we have seen the target so far.
chosen_index stores the currently selected answer.

Scan through the array from left to right.

For every index i:

If nums[i] != target, skip it.
Otherwise, we found another valid candidate.

Increment the occurrence counter.

If this is the kth occurrence of the target, then:

count = k

Randomly decide whether to replace the current answer.

Generate a random integer in the range:

[1, count]

If the generated number equals 1, replace the current answer with the current index.

This means:

First occurrence selected with probability 1
Second occurrence replaces previous with probability 1/2
Third occurrence replaces previous with probability 1/3

and so forth. 6. Continue scanning until the end.

After the scan finishes, return chosen_index.

Why it works

Reservoir sampling guarantees uniform probability.

Suppose there are m occurrences of the target.

Consider any particular occurrence.

It has probability 1/k of being selected when first encountered.
It then survives future replacement attempts with the correct probabilities.

The math simplifies so that every occurrence ends with probability exactly:

1/m

Therefore, every valid index is equally likely to be returned.

Python Solution

import random
from typing import List

class Solution:

    def __init__(self, nums: List[int]):
        self.nums = nums

    def pick(self, target: int) -> int:
        count = 0
        chosen_index = -1

        for index, value in enumerate(self.nums):
            if value == target:
                count += 1

                # Replace current choice with probability 1/count
                if random.randint(1, count) == 1:
                    chosen_index = index

        return chosen_index

# Your Solution object will be instantiated and called as such:
# obj = Solution(nums)
# param_1 = obj.pick(target)

The constructor simply stores the input array so it can be reused during future queries.

Inside pick, we iterate through the array and process only positions where the value equals the target.

The variable count tracks how many valid occurrences have been seen so far. Each time we encounter another valid index, we randomly decide whether to replace the current answer.

The expression:

random.randint(1, count) == 1

ensures the current index is chosen with probability 1/count.

By the end of the traversal, every valid index has equal probability of being stored in chosen_index.

The implementation uses constant extra space because we never store all matching indices.

Go Solution

package main

import (
	"math/rand"
)

type Solution struct {
	nums []int
}

func Constructor(nums []int) Solution {
	return Solution{
		nums: nums,
	}
}

func (this *Solution) Pick(target int) int {
	count := 0
	chosenIndex := -1

	for index, value := range this.nums {
		if value == target {
			count++

			// Replace current choice with probability 1/count
			if rand.Intn(count) == 0 {
				chosenIndex = index
			}
		}
	}

	return chosenIndex
}

/**
 * Your Solution object will be instantiated and called as such:
 * obj := Constructor(nums);
 * param_1 := obj.Pick(target);
 */

The Go implementation follows exactly the same logic as the Python version.

One small difference is the random API:

rand.Intn(count)

returns a random integer in the range:

[0, count-1]

So checking:

== 0

gives probability 1/count.

Slices in Go naturally handle dynamic array storage, so we simply store the input slice inside the struct.

Worked Examples

Example 1

nums = [1, 2, 3, 3, 3]
pick(3)

Valid indices are:

2, 3, 4

Suppose the random choices proceed as follows.

Step	Index	Value	count	Random Decision	chosen_index
Start	-	-	0	-	-1
0	0	1	0	Skip	-1
1	1	2	0	Skip	-1
2	2	3	1	Select automatically	2
3	3	3	2	Replace with probability 1/2	3
4	4	3	3	Keep previous	3

Final answer:

Another execution could return 2 or 4.

Each has probability 1/3.

Example 2

nums = [1, 2, 3, 3, 3]
pick(1)

Only one matching index exists.

Step	Index	Value	count
0	0	1	1
1	1	2	1
2	2	3	1
3	3	3	1
4	4	3	1

Final answer:

Since only one valid index exists, it is always returned.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each `pick` scans the entire array once
Space	O(1)	Only a few variables are used

The algorithm processes every element during each pick call, so the runtime is linear in the size of the array.

The extra memory usage remains constant because we never build auxiliary data structures proportional to input size.

Test Cases

import random

# Basic example with duplicates
obj = Solution([1, 2, 3, 3, 3])
result = obj.pick(3)
assert result in [2, 3, 4]  # target appears multiple times

# Single occurrence
obj = Solution([1, 2, 3])
assert obj.pick(2) == 1  # only one valid index

# All identical values
obj = Solution([5, 5, 5, 5])
result = obj.pick(5)
assert result in [0, 1, 2, 3]  # every index valid

# Negative numbers
obj = Solution([-1, -2, -1, -3])
result = obj.pick(-1)
assert result in [0, 2]  # handles negative values

# Array of length 1
obj = Solution([10])
assert obj.pick(10) == 0  # smallest possible input

# Large duplicate block
nums = [7] * 100
obj = Solution(nums)
result = obj.pick(7)
assert 0 <= result < 100  # many duplicates

# Mixed values
obj = Solution([4, 1, 4, 2, 4, 3])
result = obj.pick(4)
assert result in [0, 2, 4]  # multiple scattered duplicates

# Target at the end
obj = Solution([1, 2, 3, 4])
assert obj.pick(4) == 3  # last index

# Target at the beginning
obj = Solution([9, 1, 2, 3])
assert obj.pick(9) == 0  # first index

Test Summary

Test	Why
`[1,2,3,3,3]` with target `3`	Verifies multiple valid indices
`[1,2,3]` with target `2`	Verifies single occurrence
`[5,5,5,5]`	Verifies all elements identical
Negative numbers	Ensures signed integers handled correctly
Single element array	Minimum input size
Large duplicate block	Stress test with many duplicates
Scattered duplicates	Ensures non-contiguous matches work
Target at end	Boundary position test
Target at beginning	Boundary position test

Edge Cases

Target Appears Only Once

A common bug is accidentally introducing randomness even when there is only one valid index.

For example:

nums = [1, 2, 3]
target = 2

The answer must always be index 1.

The reservoir sampling implementation handles this naturally because the first occurrence is selected with probability 1, and there are no later occurrences that could replace it.

All Elements Are the Same

Consider:

nums = [5, 5, 5, 5]

Every index must have equal probability.

A buggy implementation might accidentally bias toward earlier or later indices.

Reservoir sampling avoids this because each occurrence replaces the previous answer with exactly the correct probability.

Negative Numbers

Some implementations incorrectly assume values are non-negative or use values as array indices.

For example:

nums = [-1, -2, -1]

Our solution only compares values for equality and never uses them as indices, so negative numbers work without any special handling.