LeetCode 217: Contains Duplicate
A clear explanation of detecting duplicates in an array using a hash set and sorting.
Problem Restatement
We are given an integer array nums.
We need to determine whether any value appears at least twice.
Return:
| Return value | Meaning |
|---|---|
True |
Some value appears more than once |
False |
Every value is distinct |
The official statement asks us to return true if any value appears at least twice in the array, and false if every element is distinct.
Input and Output
| Item | Meaning |
|---|---|
| Input | Integer array nums |
| Output | Boolean |
True |
Duplicate exists |
False |
All elements are unique |
Example function shape:
def containsDuplicate(nums: list[int]) -> bool:
...
Examples
Example 1:
nums = [1, 2, 3, 1]
The value 1 appears twice.
Answer:
True
Example 2:
nums = [1, 2, 3, 4]
All values are distinct.
Answer:
False
Example 3:
nums = [1, 1, 1, 3, 3, 4, 3, 2, 4, 2]
Several values repeat.
Answer:
True
First Thought
We could compare every pair of elements.
For each index i, check every later index j.
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] == nums[j]:
return True
This works, but it is slow.
Problem With Brute Force
For an array of length n, the nested loops may compare:
n * (n - 1) / 2
pairs.
That gives:
O(n^2)
time complexity.
We can do much better using a hash set.
Key Insight
A set supports fast membership checking.
While scanning the array:
- if we see a number for the first time, add it to the set
- if the number already exists in the set, we found a duplicate
This lets us detect duplicates in one pass.
Algorithm
- Create an empty set
seen. - Scan the array from left to right.
- For each number:
- If it already exists in
seen, returnTrue. - Otherwise add it to
seen.
- If it already exists in
- If the scan finishes, return
False.
Walkthrough
Use:
nums = [1, 2, 3, 1]
Start:
seen = set()
Process 1:
1 not in seen
Add it:
seen = {1}
Process 2:
seen = {1, 2}
Process 3:
seen = {1, 2, 3}
Process 1 again:
1 already exists in seen
Return:
True
Correctness
The set seen always contains exactly the distinct values encountered earlier in the array.
When processing a number:
- if the number already exists in
seen, then the current occurrence and the earlier occurrence form a duplicate pair, so returningTrueis correct - otherwise the number has not appeared before, so adding it to
seenpreserves the invariant
If the algorithm finishes scanning the array without finding a repeated value, then every element appeared exactly once, so returning False is correct.
Therefore the algorithm correctly determines whether the array contains duplicates.
Complexity
| Metric | Value | Why |
|---|---|---|
| Time | O(n) average |
Each set lookup and insertion is average O(1) |
| Space | O(n) |
The set may store all elements |
Hash Set Implementation
class Solution:
def containsDuplicate(self, nums: list[int]) -> bool:
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
Code Explanation
Create the set:
seen = set()
Scan the array:
for num in nums:
Check whether the value already appeared:
if num in seen:
return True
Otherwise store it:
seen.add(num)
If no duplicate is found:
return False
Shorter Python Version
Python sets automatically remove duplicates.
So:
len(set(nums))
gives the number of distinct elements.
If duplicates exist, the set becomes smaller than the original array.
class Solution:
def containsDuplicate(self, nums: list[int]) -> bool:
return len(nums) != len(set(nums))
This is concise and commonly used in interviews.
Sorting Solution
Another approach is sorting.
After sorting, duplicates become adjacent.
Example:
nums = [3, 1, 2, 1]
Sorted:
[1, 1, 2, 3]
Now duplicates are easy to detect.
class Solution:
def containsDuplicate(self, nums: list[int]) -> bool:
nums.sort()
for i in range(1, len(nums)):
if nums[i] == nums[i - 1]:
return True
return False
Sorting Complexity
| Metric | Value |
|---|---|
| Time | O(n log n) |
| Extra space | Depends on sorting implementation |
The hash set solution is usually preferred because it is linear time.
Testing
def run_tests():
s = Solution()
assert s.containsDuplicate([1, 2, 3, 1]) is True
assert s.containsDuplicate([1, 2, 3, 4]) is False
assert s.containsDuplicate([1, 1, 1, 3, 3, 4, 3, 2, 4, 2]) is True
assert s.containsDuplicate([]) is False
assert s.containsDuplicate([5]) is False
assert s.containsDuplicate([-1, -2, -3, -1]) is True
print("all tests passed")
run_tests()
| Test | Why |
|---|---|
[1,2,3,1] |
Simple duplicate |
[1,2,3,4] |
All unique |
| Multiple repeated values | Many duplicates |
[] |
Empty array |
[5] |
Single element |
| Negative values | Works for all integers |