LeetCode 381 - Insert Delete GetRandom O(1) - Duplicates allowed
The problem asks us to design a data structure called RandomizedCollection that behaves like a multiset. Unlike a normal set, duplicate values are allowed.
Difficulty: 🔴 Hard
Topics: Array, Hash Table, Math, Design, Randomized
Solution
Problem Understanding
The problem asks us to design a data structure called RandomizedCollection that behaves like a multiset. Unlike a normal set, duplicate values are allowed. The data structure must support three operations:
- Insert a value
- Remove one occurrence of a value
- Return a random element
The critical requirement is that every operation must run in average O(1) time complexity.
The random behavior is especially important. If a value appears multiple times in the collection, it must be more likely to be returned by getRandom(). For example, if the collection contains [1,1,2], then 1 should be returned with probability 2/3, while 2 should be returned with probability 1/3.
The input consists of a sequence of method calls on the RandomizedCollection object. Each operation modifies the internal state of the collection or returns a value. The output reflects the return value of each operation.
The constraints tell us several important things:
- Up to
2 * 10^5operations will be performed. - Values can be very large or very small integers.
getRandom()is only called when the collection is non-empty.
The operation count is large enough that any linear-time operation would be too slow. For example, scanning the collection during every removal would lead to worst-case quadratic behavior. This immediately suggests that we need hash-based structures and careful indexing.
Several edge cases are important:
- Inserting duplicates repeatedly
- Removing only one occurrence when duplicates exist
- Removing the last remaining occurrence of a value
- Removing values that are not present
- Updating indices correctly after swapping elements during deletion
- Ensuring
getRandom()reflects duplicate frequencies naturally
A naive implementation often fails when handling duplicate indices during removal. Correctly maintaining index mappings is the core challenge of this problem.
Approaches
Brute Force Approach
A straightforward implementation would use a dynamic array or list to store all values.
Insertion is easy because we can append the value to the end of the list. getRandom() is also simple because we can choose a random index from the array.
The difficulty appears during removal. To remove a value, we must search through the array to find an occurrence of the target value. In the worst case, this requires scanning the entire collection, which costs O(n) time.
Although this approach is logically correct, it violates the requirement that all operations run in average O(1) time.
Optimal Approach
The key insight is that arrays provide efficient random access and random selection, while hash maps provide efficient lookup.
We combine both structures:
- A dynamic array stores all elements.
- A hash map tracks all indices where each value appears.
The challenge is deletion. Removing an element from the middle of an array normally costs O(n) because elements must shift left. We avoid this by swapping the target element with the last element in the array, then removing the last position.
To support duplicates, each value maps to a set of indices rather than a single index.
This design allows:
insertinO(1)removein averageO(1)getRandominO(1)
because all operations involve only hash lookups, set updates, array appends, and removing from the end of the array.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | Insert O(1), Remove O(n), Random O(1) |
O(n) |
Uses only an array, removal requires linear scan |
| Optimal | Average O(1) for all operations |
O(n) |
Uses array plus hash map of index sets |
Algorithm Walkthrough
- Maintain a dynamic array called
values.
This array stores every element currently in the collection, including duplicates. Using an array gives two important advantages:
- Appending is
O(1) - Random index selection is
O(1)
- Maintain a hash map called
indices.
The key is a value from the collection.
The value is a set containing every index where that number appears inside values.
Example:
values = [1, 1, 2]
indices = {
1 -> {0, 1},
2 -> {2}
}
- During insertion:
- Check whether the value already exists in the map and still has indices.
- Append the new value to the end of
values. - Add the new index into the corresponding set in
indices. - Return
trueif this was the first occurrence.
- During removal:
- First check whether the value exists.
- If no occurrence exists, return
false. - Otherwise, remove one index from the value's index set.
Suppose we remove index remove_index.
5. To avoid shifting elements:
-
Take the last element from the array.
-
Move that last element into
remove_index. -
Update the moved element's index set:
-
Remove the old last index
-
Add the new index
- Remove the final array element.
Since the unwanted element is now at the end, removing it is O(1).
7. If a value no longer has any indices, remove it from the hash map.
8. During getRandom():
- Generate a random index from
0tolen(values) - 1 - Return the element at that index
Because duplicates occupy multiple array positions, they naturally receive proportionally higher probability.
Why it works
The central invariant is that every element's indices inside the array are accurately tracked in the hash map.
Whenever an element moves because of a swap during removal, we immediately update its recorded indices. Therefore, the hash map and array always remain synchronized.
Since getRandom() selects uniformly among array positions, and duplicates occupy multiple positions, the probability distribution automatically matches the required frequencies.
Python Solution
import random
from collections import defaultdict
from typing import Dict, List, Set
class RandomizedCollection:
def __init__(self):
self.values: List[int] = []
self.indices: Dict[int, Set[int]] = defaultdict(set)
def insert(self, val: int) -> bool:
is_new = len(self.indices[val]) == 0
self.values.append(val)
self.indices[val].add(len(self.values) - 1)
return is_new
def remove(self, val: int) -> bool:
if not self.indices[val]:
return False
remove_index = self.indices[val].pop()
last_value = self.values[-1]
last_index = len(self.values) - 1
self.values[remove_index] = last_value
if remove_index != last_index:
self.indices[last_value].remove(last_index)
self.indices[last_value].add(remove_index)
self.values.pop()
if not self.indices[val]:
del self.indices[val]
return True
def getRandom(self) -> int:
return random.choice(self.values)
# Your RandomizedCollection object will be instantiated and called as such:
# obj = RandomizedCollection()
# param_1 = obj.insert(val)
# param_2 = obj.remove(val)
# param_3 = obj.getRandom()
The implementation uses two core structures.
self.values stores all elements currently in the collection. Duplicates appear multiple times. This array enables constant-time append operations and efficient random selection.
self.indices maps each value to a set of positions where it appears in the array. A set is used because insertion and deletion from a set are average O(1) operations.
During insertion, the code first checks whether the value already exists by examining whether its index set is empty. The value is appended to the array, and its new index is added to the set.
The removal logic is the most important section. One index associated with the target value is removed from the set. The last array element is then moved into the vacated position. If the removed element was not already the last element, the moved value's indices must be updated carefully.
Finally, the array's last element is popped. If a value no longer exists anywhere in the collection, its entry is deleted from the hash map.
getRandom() simply delegates to random.choice, which selects uniformly from the underlying array.
Go Solution
package main
import (
"math/rand"
)
type RandomizedCollection struct {
values []int
indices map[int]map[int]struct{}
}
func Constructor() RandomizedCollection {
return RandomizedCollection{
values: []int{},
indices: make(map[int]map[int]struct{}),
}
}
func (this *RandomizedCollection) Insert(val int) bool {
_, exists := this.indices[val]
if !exists {
this.indices[val] = make(map[int]struct{})
}
this.values = append(this.values, val)
index := len(this.values) - 1
this.indices[val][index] = struct{}{}
return !exists
}
func (this *RandomizedCollection) Remove(val int) bool {
indexSet, exists := this.indices[val]
if !exists || len(indexSet) == 0 {
return false
}
var removeIndex int
for idx := range indexSet {
removeIndex = idx
break
}
delete(this.indices[val], removeIndex)
lastIndex := len(this.values) - 1
lastValue := this.values[lastIndex]
this.values[removeIndex] = lastValue
if removeIndex != lastIndex {
delete(this.indices[lastValue], lastIndex)
this.indices[lastValue][removeIndex] = struct{}{}
}
this.values = this.values[:lastIndex]
if len(this.indices[val]) == 0 {
delete(this.indices, val)
}
return true
}
func (this *RandomizedCollection) GetRandom() int {
randomIndex := rand.Intn(len(this.values))
return this.values[randomIndex]
}
/**
* Your RandomizedCollection object will be instantiated and called as such:
* obj := Constructor();
* param_1 := obj.Insert(val);
* param_2 := obj.Remove(val);
* param_3 := obj.GetRandom();
*/
The Go implementation follows the same algorithmic structure as the Python version, but there are several language-specific differences.
Go does not have a built-in set type, so sets are implemented using map[int]struct{}. The empty struct consumes zero storage and is commonly used for this pattern.
Unlike Python's set.pop(), Go maps do not support direct arbitrary removal. We iterate once through the map to retrieve any index.
Slice truncation is used to remove the last element from the array:
this.values = this.values[:lastIndex]
Random selection uses rand.Intn, which generates a random integer in the desired range.
Worked Examples
Example 1
Operations:
insert(1)
insert(1)
insert(2)
getRandom()
remove(1)
getRandom()
Initial state:
| Step | values | indices |
|---|---|---|
| Start | [] |
{} |
insert(1)
- Append
1 - Store index
0
| Step | values | indices | Return |
|---|---|---|---|
| insert(1) | [1] |
{1: {0}} |
true |
insert(1)
- Append another
1 - Store index
1
| Step | values | indices | Return |
|---|---|---|---|
| insert(1) | [1,1] |
{1: {0,1}} |
false |
insert(2)
- Append
2 - Store index
2
| Step | values | indices | Return |
|---|---|---|---|
| insert(2) | [1,1,2] |
{1: {0,1}, 2: {2}} |
true |
getRandom()
Possible outcomes:
| Value | Probability |
|---|---|
| 1 | 2/3 |
| 2 | 1/3 |
This works because 1 occupies two array positions.
remove(1)
Suppose index 0 is removed.
Current array:
[1,1,2]
Swap last element 2 into index 0.
Intermediate state:
[2,1,2]
Update indices:
1 -> {1}
2 -> {0}
Pop last element:
[2,1]
Final state:
| Step | values | indices | Return |
|---|---|---|---|
| remove(1) | [2,1] |
{1: {1}, 2: {0}} |
true |
getRandom()
Now both values appear once.
| Value | Probability |
|---|---|
| 1 | 1/2 |
| 2 | 1/2 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | Average O(1) |
Hash operations, set updates, append, pop, and random indexing are constant time on average |
| Space | O(n) |
The array and index map together store all inserted elements |
The time complexity relies on average constant-time behavior of hash tables and sets. Removal avoids expensive shifting by swapping with the last array element. Since every element and every index is stored exactly once across the data structures, total space usage is linear in the number of stored elements.
Test Cases
# Basic example from the problem statement
rc = RandomizedCollection()
assert rc.insert(1) is True # first insertion
assert rc.insert(1) is False # duplicate insertion
assert rc.insert(2) is True # new value
assert rc.remove(1) is True # remove one occurrence
# Removing non-existent value
rc = RandomizedCollection()
assert rc.remove(10) is False # value does not exist
# Insert and remove single value
rc = RandomizedCollection()
assert rc.insert(5) is True
assert rc.remove(5) is True
assert rc.remove(5) is False # already removed
# Multiple duplicates
rc = RandomizedCollection()
assert rc.insert(7) is True
assert rc.insert(7) is False
assert rc.insert(7) is False
assert rc.remove(7) is True
assert rc.remove(7) is True
assert rc.remove(7) is True
assert rc.remove(7) is False # all copies removed
# Removing element that is not at the end
rc = RandomizedCollection()
assert rc.insert(1) is True
assert rc.insert(2) is True
assert rc.insert(3) is True
assert rc.remove(2) is True # tests swap update logic
# Negative values
rc = RandomizedCollection()
assert rc.insert(-1) is True
assert rc.insert(-1) is False
assert rc.remove(-1) is True
# Large integer values
rc = RandomizedCollection()
large = 2**31 - 1
small = -2**31
assert rc.insert(large) is True
assert rc.insert(small) is True
assert rc.remove(large) is True
# Random retrieval should always return existing values
rc = RandomizedCollection()
rc.insert(1)
rc.insert(2)
rc.insert(2)
for _ in range(100):
assert rc.getRandom() in {1, 2}
# Stress duplicate handling
rc = RandomizedCollection()
for _ in range(1000):
rc.insert(1)
for _ in range(1000):
assert rc.remove(1) is True
assert rc.remove(1) is False
| Test | Why |
|---|---|
| Basic example | Verifies normal insert, remove, and random behavior |
| Removing non-existent value | Ensures failed removals return False |
| Insert and remove single value | Tests cleanup after final removal |
| Multiple duplicates | Verifies duplicate tracking correctness |
| Removing middle element | Validates swap-and-update logic |
| Negative values | Ensures signed integers work correctly |
| Large integer values | Tests boundary integer constraints |
| Random retrieval | Confirms returned values are valid |
| Stress duplicate handling | Ensures index bookkeeping remains correct under heavy duplication |
Edge Cases
One important edge case occurs when removing a value that appears multiple times. A buggy implementation may accidentally remove all occurrences instead of only one. This solution avoids that problem because each removal operates on exactly one stored index from the value's index set.
Another tricky case happens when the element being removed is already the last element in the array. In this scenario, performing the swap logic blindly can corrupt index tracking. The implementation specifically checks whether remove_index != last_index before updating moved indices, preventing unnecessary modifications.
A third important edge case involves deleting the final remaining occurrence of a value. If the hash map entry is not removed afterward, future operations may incorrectly think the value still exists. The implementation checks whether the index set becomes empty and deletes the map entry completely.
A subtle but critical issue arises when duplicates are swapped during removal. Consider removing one 1 while another 1 remains at the end of the array. Incorrect index updates can accidentally erase the remaining occurrence. The algorithm carefully updates only the moved element's indices, preserving all valid positions.
Finally, random selection must correctly account for duplicates. Some naive implementations attempt to randomize over unique values, which produces incorrect probabilities. By storing every occurrence directly in the array, duplicate frequencies are naturally reflected in the random distribution.