LeetCode 705 - Design HashSet

The problem asks us to design our own HashSet implementation without using any built in hash table libraries such as Python's set or Go's built in map type as the primary solution idea. A HashSet is a data structure that stores unique values.

LeetCode Problem 705

Difficulty: 🟢 Easy
Topics: Array, Hash Table, Linked List, Design, Hash Function

Solution

Problem Understanding

The problem asks us to design our own HashSet implementation without using any built in hash table libraries such as Python's set or Go's built in map type as the primary solution idea.

A HashSet is a data structure that stores unique values. It supports three operations:

  • add(key), insert a value into the set
  • remove(key), delete a value from the set if it exists
  • contains(key), check whether a value exists in the set

The important property of a set is uniqueness. If the same value is inserted multiple times, it should still appear only once.

The input in LeetCode is represented as a sequence of operations. Each operation is called on the MyHashSet object, and the expected output corresponds to the return value of each operation. Since add and remove do not return anything, their outputs are null. Only contains returns a boolean result.

The constraints are small enough that many solutions could technically pass:

  • 0 <= key <= 10^6
  • At most 10^4 operations

Even though the constraints are not huge, the goal of the problem is to understand how hash based data structures work internally. The intended solution is to build a custom hashing mechanism.

A few important edge cases immediately stand out:

  • Adding the same key multiple times should not create duplicates
  • Removing a key that does not exist should not cause errors
  • Keys can be 0
  • Keys can be as large as 10^6
  • Multiple keys may hash to the same bucket, so collision handling is necessary

These cases are exactly why a proper hash table design matters.

Approaches

Brute Force Approach

The simplest possible solution is to store all values in a list or array.

For add, we first scan the list to check whether the value already exists. If it does not exist, we append it.

For contains, we linearly search through the list.

For remove, we linearly search until we find the value, then delete it.

This approach is straightforward and guarantees correctness because every operation explicitly checks all stored elements. However, it is inefficient because each operation may require scanning the entire collection.

If there are n stored elements, each operation takes O(n) time in the worst case.

Optimal Approach, Hashing with Buckets

The key observation is that we can dramatically reduce lookup time by distributing values into buckets using a hash function.

Instead of storing all elements in one list, we create many smaller lists called buckets.

We compute:

bucket_index = key % bucket_count

All keys that produce the same remainder go into the same bucket.

This means:

  • add only searches inside one bucket
  • remove only searches inside one bucket
  • contains only searches inside one bucket

If the buckets are distributed evenly, each bucket stays very small, making operations close to constant time on average.

To handle collisions, where multiple keys map to the same bucket, we use separate chaining. Each bucket is simply a list storing all keys assigned to that bucket.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n) per operation O(n) Single list storing all values
Optimal O(1) average per operation O(n + buckets) Hashing with separate chaining

Algorithm Walkthrough

Optimal Algorithm

  1. Create a fixed number of buckets.

We initialize an array where each position represents a bucket. Each bucket itself is a list that stores keys hashing to that bucket. 2. Choose a hash function.

We compute the bucket index using:

key % bucket_count

This spreads values across buckets reasonably evenly. 3. Implement add(key).

First compute the correct bucket index. Then look inside that bucket to see whether the key already exists. If not, append it.

This preserves the uniqueness property of a set. 4. Implement remove(key).

Compute the bucket index and search only within that bucket. If the key exists, remove it.

If the key does not exist, do nothing. 5. Implement contains(key).

Compute the bucket index and check whether the key exists in that bucket.

Return True if found, otherwise False.

Why it works

The correctness comes from the fact that every key is always mapped deterministically to exactly one bucket using the same hash function.

Whenever we add, remove, or search for a key, we compute the same bucket index. Therefore, if a key exists in the set, it must exist inside that bucket.

Because we also prevent duplicates during insertion, the structure always behaves like a valid set.

Python Solution

class MyHashSet:

    def __init__(self):
        self.bucket_count = 1000
        self.buckets = [[] for _ in range(self.bucket_count)]

    def _hash(self, key: int) -> int:
        return key % self.bucket_count

    def add(self, key: int) -> None:
        bucket_index = self._hash(key)
        bucket = self.buckets[bucket_index]

        if key not in bucket:
            bucket.append(key)

    def remove(self, key: int) -> None:
        bucket_index = self._hash(key)
        bucket = self.buckets[bucket_index]

        if key in bucket:
            bucket.remove(key)

    def contains(self, key: int) -> bool:
        bucket_index = self._hash(key)
        bucket = self.buckets[bucket_index]

        return key in bucket

# Your MyHashSet object will be instantiated and called as such:
# obj = MyHashSet()
# obj.add(key)
# obj.remove(key)
# param_3 = obj.contains(key)

The implementation begins by creating 1000 buckets. Each bucket is an empty list.

The _hash helper method computes the bucket index using modulo arithmetic. This ensures every key maps consistently to the same bucket.

In the add method, we first locate the bucket corresponding to the key. We then check whether the key already exists before appending it. This guarantees uniqueness.

The remove method also finds the correct bucket first. If the key exists, we remove it from that bucket.

The contains method simply checks whether the key is present inside its bucket.

All operations only work inside a single bucket rather than scanning the entire dataset, which is why hashing improves efficiency.

Go Solution

type MyHashSet struct {
	buckets [][]int
	size    int
}

func Constructor() MyHashSet {
	size := 1000
	buckets := make([][]int, size)

	return MyHashSet{
		buckets: buckets,
		size:    size,
	}
}

func (this *MyHashSet) hash(key int) int {
	return key % this.size
}

func (this *MyHashSet) Add(key int) {
	index := this.hash(key)
	bucket := this.buckets[index]

	for _, value := range bucket {
		if value == key {
			return
		}
	}

	this.buckets[index] = append(this.buckets[index], key)
}

func (this *MyHashSet) Remove(key int) {
	index := this.hash(key)
	bucket := this.buckets[index]

	for i, value := range bucket {
		if value == key {
			this.buckets[index] = append(bucket[:i], bucket[i+1:]...)
			return
		}
	}
}

func (this *MyHashSet) Contains(key int) bool {
	index := this.hash(key)
	bucket := this.buckets[index]

	for _, value := range bucket {
		if value == key {
			return true
		}
	}

	return false
}

/**
 * Your MyHashSet object will be instantiated and called as such:
 * obj := Constructor();
 * obj.Add(key);
 * obj.Remove(key);
 * param_3 := obj.Contains(key);
 */

The Go implementation follows the same overall strategy as the Python version, but there are some language specific differences.

Go does not have Python style list membership checks such as key in bucket, so we manually iterate through the slice.

Removing an element from a slice requires rebuilding the slice using:

append(bucket[:i], bucket[i+1:]...)

Go slices automatically manage dynamic resizing, so we can still efficiently append elements into buckets.

Worked Examples

Example 1

Operations:

["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]

Assume:

bucket_count = 1000

Step by Step Trace

Operation Bucket Index Bucket State Result
add(1) 1 % 1000 = 1 bucket[1] = [1] null
add(2) 2 % 1000 = 2 bucket[2] = [2] null
contains(1) 1 bucket[1] = [1] true
contains(3) 3 bucket[3] = [] false
add(2) 2 bucket[2] already contains 2 null
contains(2) 2 bucket[2] = [2] true
remove(2) 2 bucket[2] = [] null
contains(2) 2 bucket[2] = [] false

Final structure:

bucket[1] = [1]
bucket[2] = []

Complexity Analysis

Measure Complexity Explanation
Time O(1) average Each operation searches only one bucket
Space O(n + buckets) Storage for keys and bucket array

The average time complexity is constant because hashing distributes elements across many buckets. In the worst case, all keys could collide into one bucket, giving O(n) operations, but with a reasonable bucket count and evenly distributed keys, average performance remains close to O(1).

The space complexity includes both the stored keys and the bucket array itself.

Test Cases

# Example case
hash_set = MyHashSet()
hash_set.add(1)
hash_set.add(2)
assert hash_set.contains(1) == True   # Existing key
assert hash_set.contains(3) == False  # Missing key
hash_set.add(2)
assert hash_set.contains(2) == True   # Duplicate add should not change behavior
hash_set.remove(2)
assert hash_set.contains(2) == False  # Key removed correctly

# Removing non existing key
hash_set = MyHashSet()
hash_set.remove(10)
assert hash_set.contains(10) == False  # Should safely do nothing

# Add duplicate values
hash_set = MyHashSet()
hash_set.add(5)
hash_set.add(5)
assert hash_set.contains(5) == True  # Still exists only once

# Boundary value: minimum key
hash_set = MyHashSet()
hash_set.add(0)
assert hash_set.contains(0) == True
hash_set.remove(0)
assert hash_set.contains(0) == False

# Boundary value: maximum key
hash_set = MyHashSet()
hash_set.add(10**6)
assert hash_set.contains(10**6) == True
hash_set.remove(10**6)
assert hash_set.contains(10**6) == False

# Collision handling
hash_set = MyHashSet()

# These collide when bucket_count = 1000
hash_set.add(1)
hash_set.add(1001)
hash_set.add(2001)

assert hash_set.contains(1) == True
assert hash_set.contains(1001) == True
assert hash_set.contains(2001) == True

hash_set.remove(1001)

assert hash_set.contains(1) == True      # Other colliding values remain
assert hash_set.contains(1001) == False
assert hash_set.contains(2001) == True

Test Case Summary

Test Why
Example from prompt Validates standard functionality
Remove missing key Ensures safe deletion
Duplicate insertion Verifies set uniqueness
Minimum key 0 Checks lower boundary
Maximum key 10^6 Checks upper boundary
Collision handling Ensures separate chaining works correctly

Edge Cases

Adding the Same Key Multiple Times

A common bug in set implementations is accidentally storing duplicates. If add(5) is called repeatedly, the set should still contain only one instance of 5.

Our implementation prevents this by checking whether the key already exists in the bucket before appending it.

Removing a Non Existing Key

If remove(key) is called for a key that does not exist, the implementation should not throw errors or modify unrelated data.

Our solution safely searches the bucket first. If the key is absent, nothing happens.

Hash Collisions

Different keys can produce the same bucket index. For example, with 1000 buckets:

1 % 1000 = 1
1001 % 1000 = 1
2001 % 1000 = 1

A buggy implementation might overwrite existing values when collisions occur.

Our solution handles collisions using separate chaining, meaning each bucket stores multiple values in a list. Even if several keys hash to the same bucket, all remain accessible independently.