LeetCode 381 - Insert Delete GetRandom O(1) - Duplicates allowed

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Math, Design, Randomized

Solution

Problem Understanding

The problem asks us to design a data structure called RandomizedCollection that behaves like a multiset. Unlike a normal set, duplicate values are allowed. The data structure must support three operations:

Insert a value
Remove one occurrence of a value
Return a random element

The critical requirement is that every operation must run in average O(1) time complexity.

The random behavior is especially important. If a value appears multiple times in the collection, it must be more likely to be returned by getRandom(). For example, if the collection contains [1,1,2], then 1 should be returned with probability 2/3, while 2 should be returned with probability 1/3.

The input consists of a sequence of method calls on the RandomizedCollection object. Each operation modifies the internal state of the collection or returns a value. The output reflects the return value of each operation.

The constraints tell us several important things:

Up to 2 * 10^5 operations will be performed.
Values can be very large or very small integers.
getRandom() is only called when the collection is non-empty.

The operation count is large enough that any linear-time operation would be too slow. For example, scanning the collection during every removal would lead to worst-case quadratic behavior. This immediately suggests that we need hash-based structures and careful indexing.

Several edge cases are important:

Inserting duplicates repeatedly
Removing only one occurrence when duplicates exist
Removing the last remaining occurrence of a value
Removing values that are not present
Updating indices correctly after swapping elements during deletion
Ensuring getRandom() reflects duplicate frequencies naturally

A naive implementation often fails when handling duplicate indices during removal. Correctly maintaining index mappings is the core challenge of this problem.

Approaches

Brute Force Approach

A straightforward implementation would use a dynamic array or list to store all values.

Insertion is easy because we can append the value to the end of the list. getRandom() is also simple because we can choose a random index from the array.

The difficulty appears during removal. To remove a value, we must search through the array to find an occurrence of the target value. In the worst case, this requires scanning the entire collection, which costs O(n) time.

Although this approach is logically correct, it violates the requirement that all operations run in average O(1) time.

Optimal Approach

The key insight is that arrays provide efficient random access and random selection, while hash maps provide efficient lookup.

We combine both structures:

A dynamic array stores all elements.
A hash map tracks all indices where each value appears.

The challenge is deletion. Removing an element from the middle of an array normally costs O(n) because elements must shift left. We avoid this by swapping the target element with the last element in the array, then removing the last position.

To support duplicates, each value maps to a set of indices rather than a single index.

This design allows:

insert in O(1)
remove in average O(1)
getRandom in O(1)

because all operations involve only hash lookups, set updates, array appends, and removing from the end of the array.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	Insert `O(1)`, Remove `O(n)`, Random `O(1)`	`O(n)`	Uses only an array, removal requires linear scan
Optimal	Average `O(1)` for all operations	`O(n)`	Uses array plus hash map of index sets

Algorithm Walkthrough

Maintain a dynamic array called values.

This array stores every element currently in the collection, including duplicates. Using an array gives two important advantages:

Appending is O(1)
Random index selection is O(1)

Maintain a hash map called indices.

The key is a value from the collection.

The value is a set containing every index where that number appears inside values.

Example:

values  = [1, 1, 2]
indices = {
    1 -> {0, 1},
    2 -> {2}
}

During insertion:

Check whether the value already exists in the map and still has indices.
Append the new value to the end of values.
Add the new index into the corresponding set in indices.
Return true if this was the first occurrence.

During removal:

First check whether the value exists.
If no occurrence exists, return false.
Otherwise, remove one index from the value's index set.

Suppose we remove index remove_index. 5. To avoid shifting elements:

Take the last element from the array.
Move that last element into remove_index.
Update the moved element's index set:
Remove the old last index
Add the new index

Remove the final array element.

Since the unwanted element is now at the end, removing it is O(1). 7. If a value no longer has any indices, remove it from the hash map. 8. During getRandom():

Generate a random index from 0 to len(values) - 1
Return the element at that index

Because duplicates occupy multiple array positions, they naturally receive proportionally higher probability.

Why it works

The central invariant is that every element's indices inside the array are accurately tracked in the hash map.

Whenever an element moves because of a swap during removal, we immediately update its recorded indices. Therefore, the hash map and array always remain synchronized.

Since getRandom() selects uniformly among array positions, and duplicates occupy multiple positions, the probability distribution automatically matches the required frequencies.

Python Solution

import random
from collections import defaultdict
from typing import Dict, List, Set

class RandomizedCollection:

    def __init__(self):
        self.values: List[int] = []
        self.indices: Dict[int, Set[int]] = defaultdict(set)

    def insert(self, val: int) -> bool:
        is_new = len(self.indices[val]) == 0

        self.values.append(val)
        self.indices[val].add(len(self.values) - 1)

        return is_new

    def remove(self, val: int) -> bool:
        if not self.indices[val]:
            return False

        remove_index = self.indices[val].pop()

        last_value = self.values[-1]
        last_index = len(self.values) - 1

        self.values[remove_index] = last_value

        if remove_index != last_index:
            self.indices[last_value].remove(last_index)
            self.indices[last_value].add(remove_index)

        self.values.pop()

        if not self.indices[val]:
            del self.indices[val]

        return True

    def getRandom(self) -> int:
        return random.choice(self.values)

# Your RandomizedCollection object will be instantiated and called as such:
# obj = RandomizedCollection()
# param_1 = obj.insert(val)
# param_2 = obj.remove(val)
# param_3 = obj.getRandom()

The implementation uses two core structures.

self.values stores all elements currently in the collection. Duplicates appear multiple times. This array enables constant-time append operations and efficient random selection.

self.indices maps each value to a set of positions where it appears in the array. A set is used because insertion and deletion from a set are average O(1) operations.

During insertion, the code first checks whether the value already exists by examining whether its index set is empty. The value is appended to the array, and its new index is added to the set.

The removal logic is the most important section. One index associated with the target value is removed from the set. The last array element is then moved into the vacated position. If the removed element was not already the last element, the moved value's indices must be updated carefully.

Finally, the array's last element is popped. If a value no longer exists anywhere in the collection, its entry is deleted from the hash map.

getRandom() simply delegates to random.choice, which selects uniformly from the underlying array.

Go Solution

package main

import (
	"math/rand"
)

type RandomizedCollection struct {
	values  []int
	indices map[int]map[int]struct{}
}

func Constructor() RandomizedCollection {
	return RandomizedCollection{
		values:  []int{},
		indices: make(map[int]map[int]struct{}),
	}
}

func (this *RandomizedCollection) Insert(val int) bool {
	_, exists := this.indices[val]

	if !exists {
		this.indices[val] = make(map[int]struct{})
	}

	this.values = append(this.values, val)
	index := len(this.values) - 1
	this.indices[val][index] = struct{}{}

	return !exists
}

func (this *RandomizedCollection) Remove(val int) bool {
	indexSet, exists := this.indices[val]

	if !exists || len(indexSet) == 0 {
		return false
	}

	var removeIndex int
	for idx := range indexSet {
		removeIndex = idx
		break
	}

	delete(this.indices[val], removeIndex)

	lastIndex := len(this.values) - 1
	lastValue := this.values[lastIndex]

	this.values[removeIndex] = lastValue

	if removeIndex != lastIndex {
		delete(this.indices[lastValue], lastIndex)
		this.indices[lastValue][removeIndex] = struct{}{}
	}

	this.values = this.values[:lastIndex]

	if len(this.indices[val]) == 0 {
		delete(this.indices, val)
	}

	return true
}

func (this *RandomizedCollection) GetRandom() int {
	randomIndex := rand.Intn(len(this.values))
	return this.values[randomIndex]
}

/**
 * Your RandomizedCollection object will be instantiated and called as such:
 * obj := Constructor();
 * param_1 := obj.Insert(val);
 * param_2 := obj.Remove(val);
 * param_3 := obj.GetRandom();
 */

The Go implementation follows the same algorithmic structure as the Python version, but there are several language-specific differences.

Go does not have a built-in set type, so sets are implemented using map[int]struct{}. The empty struct consumes zero storage and is commonly used for this pattern.

Unlike Python's set.pop(), Go maps do not support direct arbitrary removal. We iterate once through the map to retrieve any index.

Slice truncation is used to remove the last element from the array:

this.values = this.values[:lastIndex]

Random selection uses rand.Intn, which generates a random integer in the desired range.

Worked Examples

Example 1

Operations:

insert(1)
insert(1)
insert(2)
getRandom()
remove(1)
getRandom()

Initial state:

Step	values	indices
Start	`[]`	`{}`

insert(1)

Append 1
Store index 0

Step	values	indices	Return
insert(1)	`[1]`	`{1: {0}}`	`true`

insert(1)

Append another 1
Store index 1

Step	values	indices	Return
insert(1)	`[1,1]`	`{1: {0,1}}`	`false`

insert(2)

Append 2
Store index 2

Step	values	indices	Return
insert(2)	`[1,1,2]`	`{1: {0,1}, 2: {2}}`	`true`

getRandom()

Possible outcomes:

Value	Probability
1	2/3
2	1/3

This works because 1 occupies two array positions.

remove(1)

Suppose index 0 is removed.

Current array:

[1,1,2]

Swap last element 2 into index 0.

Intermediate state:

[2,1,2]

Update indices:

1 -> {1}
2 -> {0}

Pop last element:

[2,1]

Final state:

Step	values	indices	Return
remove(1)	`[2,1]`	`{1: {1}, 2: {0}}`	`true`

getRandom()

Now both values appear once.

Value	Probability
1	1/2
2	1/2

Complexity Analysis

Measure	Complexity	Explanation
Time	Average `O(1)`	Hash operations, set updates, append, pop, and random indexing are constant time on average
Space	`O(n)`	The array and index map together store all inserted elements

The time complexity relies on average constant-time behavior of hash tables and sets. Removal avoids expensive shifting by swapping with the last array element. Since every element and every index is stored exactly once across the data structures, total space usage is linear in the number of stored elements.

Test Cases

# Basic example from the problem statement
rc = RandomizedCollection()
assert rc.insert(1) is True      # first insertion
assert rc.insert(1) is False     # duplicate insertion
assert rc.insert(2) is True      # new value
assert rc.remove(1) is True      # remove one occurrence

# Removing non-existent value
rc = RandomizedCollection()
assert rc.remove(10) is False    # value does not exist

# Insert and remove single value
rc = RandomizedCollection()
assert rc.insert(5) is True
assert rc.remove(5) is True
assert rc.remove(5) is False     # already removed

# Multiple duplicates
rc = RandomizedCollection()
assert rc.insert(7) is True
assert rc.insert(7) is False
assert rc.insert(7) is False
assert rc.remove(7) is True
assert rc.remove(7) is True
assert rc.remove(7) is True
assert rc.remove(7) is False     # all copies removed

# Removing element that is not at the end
rc = RandomizedCollection()
assert rc.insert(1) is True
assert rc.insert(2) is True
assert rc.insert(3) is True
assert rc.remove(2) is True      # tests swap update logic

# Negative values
rc = RandomizedCollection()
assert rc.insert(-1) is True
assert rc.insert(-1) is False
assert rc.remove(-1) is True

# Large integer values
rc = RandomizedCollection()
large = 2**31 - 1
small = -2**31
assert rc.insert(large) is True
assert rc.insert(small) is True
assert rc.remove(large) is True

# Random retrieval should always return existing values
rc = RandomizedCollection()
rc.insert(1)
rc.insert(2)
rc.insert(2)

for _ in range(100):
    assert rc.getRandom() in {1, 2}

# Stress duplicate handling
rc = RandomizedCollection()

for _ in range(1000):
    rc.insert(1)

for _ in range(1000):
    assert rc.remove(1) is True

assert rc.remove(1) is False

Test	Why
Basic example	Verifies normal insert, remove, and random behavior
Removing non-existent value	Ensures failed removals return `False`
Insert and remove single value	Tests cleanup after final removal
Multiple duplicates	Verifies duplicate tracking correctness
Removing middle element	Validates swap-and-update logic
Negative values	Ensures signed integers work correctly
Large integer values	Tests boundary integer constraints
Random retrieval	Confirms returned values are valid
Stress duplicate handling	Ensures index bookkeeping remains correct under heavy duplication

Edge Cases

One important edge case occurs when removing a value that appears multiple times. A buggy implementation may accidentally remove all occurrences instead of only one. This solution avoids that problem because each removal operates on exactly one stored index from the value's index set.

Another tricky case happens when the element being removed is already the last element in the array. In this scenario, performing the swap logic blindly can corrupt index tracking. The implementation specifically checks whether remove_index != last_index before updating moved indices, preventing unnecessary modifications.

A third important edge case involves deleting the final remaining occurrence of a value. If the hash map entry is not removed afterward, future operations may incorrectly think the value still exists. The implementation checks whether the index set becomes empty and deletes the map entry completely.

A subtle but critical issue arises when duplicates are swapped during removal. Consider removing one 1 while another 1 remains at the end of the array. Incorrect index updates can accidentally erase the remaining occurrence. The algorithm carefully updates only the moved element's indices, preserving all valid positions.

Finally, random selection must correctly account for duplicates. Some naive implementations attempt to randomize over unique values, which produces incorrect probabilities. By storing every occurrence directly in the array, duplicate frequencies are naturally reflected in the random distribution.