LeetCode 645 - Set Mismatch
The problem describes a corrupted set of integers. Originally, the set contained every number from 1 through n exactly once. Because of an error, one number was duplicated, which means one other number disappeared.
Difficulty: 🟢 Easy
Topics: Array, Hash Table, Bit Manipulation, Sorting
Solution
Problem Understanding
The problem describes a corrupted set of integers. Originally, the set contained every number from 1 through n exactly once. Because of an error, one number was duplicated, which means one other number disappeared.
The input array nums represents this corrupted version of the set. Since one number appears twice and another is missing, the array still has length n, but it no longer contains a perfect permutation of the numbers from 1 to n.
The task is to identify two values:
- The number that appears twice
- The number that is missing
The result should be returned as an array in the form:
[duplicate, missing]
For example, if the array is:
[1,2,2,4]
the number 2 appears twice, and the number 3 is missing, so the answer is:
[2,3]
The constraints are relatively small:
2 <= nums.length <= 10^41 <= nums[i] <= 10^4
This tells us that even an O(n^2) solution could technically pass for the upper bound of 10^4, but it would be inefficient and unnecessary. Since the problem has a very regular structure, specifically numbers from 1 to n, we should aim for a linear time solution.
There are several important edge cases to keep in mind. The duplicate could appear at the beginning or end of the range. The missing value could be 1 or n. The array may be very small, such as [1,1]. A naive implementation that only checks frequencies partially may fail to detect the missing value correctly in these situations.
The problem guarantees exactly one duplicated number and exactly one missing number. This guarantee simplifies the logic considerably because we do not need to handle malformed inputs with multiple duplicates or multiple missing values.
Approaches
Brute Force Approach
A straightforward solution is to examine every number from 1 to n and count how many times it appears in the array.
For each number:
- If it appears twice, it is the duplicate
- If it appears zero times, it is the missing number
This works because the original set should contain every number exactly once. Any deviation from frequency 1 immediately identifies either the duplicated or missing value.
The main issue is efficiency. Counting occurrences for every number requires scanning the array repeatedly. If the array has length n, and we perform a full scan for each value from 1 to n, the total complexity becomes O(n^2).
Although the constraints are not extremely large, this approach is inefficient compared to what is possible.
Optimal Approach
The key observation is that we only need to know how many times each number appears.
A hash set or frequency array allows us to process the input in one pass. While iterating through nums:
- If a number has already been seen, it is the duplicate
- Otherwise, record it as seen
After processing the array, we iterate from 1 to n to find which number was never seen. That number is the missing value.
This works efficiently because lookup operations in a hash set are O(1) on average.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) | Repeatedly counts occurrences for every value |
| Optimal | O(n) | O(n) | Uses a hash set to track seen numbers efficiently |
Algorithm Walkthrough
- Create an empty set called
seen.
The set will store every number encountered while traversing the array. A set is ideal because it provides constant time lookup.
2. Initialize a variable called duplicate.
This variable will eventually hold the number that appears twice.
3. Traverse the array nums.
For each number:
- If the number is already in
seen, then this is the duplicated value. - Otherwise, insert the number into
seen.
This step identifies the repeated number in a single pass.
4. Traverse the integers from 1 through n.
Since the original set should contain every value in this range, we check which number does not appear in seen.
5. The number not found in seen is the missing value.
Store this value in a variable called missing.
6. Return [duplicate, missing].
Why it works
The array originally contained every number exactly once. Since exactly one number was duplicated and exactly one number was removed, every valid number except one appears exactly once.
The set tracks all encountered values. The first repeated insertion identifies the duplicate because sets cannot contain duplicates. After processing the array, the only number absent from the set must be the missing number.
Because the problem guarantees exactly one duplicate and one missing value, this logic always produces the correct answer.
Python Solution
from typing import List
class Solution:
def findErrorNums(self, nums: List[int]) -> List[int]:
seen = set()
duplicate = -1
for number in nums:
if number in seen:
duplicate = number
else:
seen.add(number)
missing = -1
for value in range(1, len(nums) + 1):
if value not in seen:
missing = value
break
return [duplicate, missing]
The implementation closely follows the algorithm described earlier.
The seen set stores numbers that have already appeared. During the first traversal, every new number is inserted into the set. If a number already exists in the set, it must be the duplicated value.
After identifying the duplicate, the second loop checks every integer from 1 through n. Since one value is missing from the array, exactly one value will not appear in the set. That value becomes missing.
Finally, the method returns both values in the required order.
Go Solution
func findErrorNums(nums []int) []int {
seen := make(map[int]bool)
duplicate := -1
for _, number := range nums {
if seen[number] {
duplicate = number
} else {
seen[number] = true
}
}
missing := -1
for value := 1; value <= len(nums); value++ {
if !seen[value] {
missing = value
break
}
}
return []int{duplicate, missing}
}
The Go solution uses a map[int]bool instead of a set because Go does not have a built in set type.
The logic is otherwise identical to the Python version. Numbers are tracked in the map during traversal, and the missing number is identified by scanning from 1 through n.
There are no integer overflow concerns because the values are very small relative to Go's integer range. The function always returns a slice containing exactly two integers.
Worked Examples
Example 1
Input:
nums = [1,2,2,4]
First Pass, Find Duplicate
| Current Number | Seen Set Before | Action | Duplicate |
|---|---|---|---|
| 1 | {} | Add 1 | -1 |
| 2 | {1} | Add 2 | -1 |
| 2 | {1,2} | Already seen | 2 |
| 4 | {1,2} | Add 4 | 2 |
After this pass:
seen = {1,2,4}
duplicate = 2
Second Pass, Find Missing
Check numbers from 1 to 4:
| Value | In Seen Set? | Result |
|---|---|---|
| 1 | Yes | Continue |
| 2 | Yes | Continue |
| 3 | No | Missing = 3 |
| 4 | Yes | Continue |
Final answer:
[2,3]
Example 2
Input:
nums = [1,1]
First Pass, Find Duplicate
| Current Number | Seen Set Before | Action | Duplicate |
|---|---|---|---|
| 1 | {} | Add 1 | -1 |
| 1 | {1} | Already seen | 1 |
After this pass:
seen = {1}
duplicate = 1
Second Pass, Find Missing
Check numbers from 1 to 2:
| Value | In Seen Set? | Result |
|---|---|---|
| 1 | Yes | Continue |
| 2 | No | Missing = 2 |
Final answer:
[1,2]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | One pass to detect duplicate, one pass to detect missing |
| Space | O(n) | The set stores up to n numbers |
The algorithm performs two linear scans. The first scan processes every element in the array once, and the second scan checks every value from 1 through n. Since both operations are linear, the total time complexity is O(n).
The auxiliary space complexity is O(n) because the set may contain every distinct number from the array.
Test Cases
from typing import List
class Solution:
def findErrorNums(self, nums: List[int]) -> List[int]:
seen = set()
duplicate = -1
for number in nums:
if number in seen:
duplicate = number
else:
seen.add(number)
missing = -1
for value in range(1, len(nums) + 1):
if value not in seen:
missing = value
break
return [duplicate, missing]
solution = Solution()
assert solution.findErrorNums([1,2,2,4]) == [2,3] # standard example
assert solution.findErrorNums([1,1]) == [1,2] # smallest valid input
assert solution.findErrorNums([2,2]) == [2,1] # missing smallest number
assert solution.findErrorNums([1,2,3,4,4]) == [4,5] # missing largest number
assert solution.findErrorNums([3,2,3,4,6,5]) == [3,1] # missing first value
assert solution.findErrorNums([1,5,3,2,2,6,7,8,9,10]) == [2,4] # middle missing value
assert solution.findErrorNums([1,2,3,3]) == [3,4] # duplicate near end
assert solution.findErrorNums([2,1,4,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]) == [4,3] # larger input
| Test | Why |
|---|---|
[1,2,2,4] |
Standard example from the problem |
[1,1] |
Smallest allowed array |
[2,2] |
Missing number is 1 |
[1,2,3,4,4] |
Missing number is n |
[3,2,3,4,6,5] |
Missing number occurs at the beginning |
[1,5,3,2,2,6,7,8,9,10] |
Missing value appears in the middle |
[1,2,3,3] |
Duplicate occurs near the end |
| Large mixed input | Verifies correctness on a larger array |
Edge Cases
One important edge case is the smallest possible input size, such as [1,1]. In this situation, the duplicate and missing values are immediately adjacent in the valid range. Some implementations incorrectly assume the array has more diversity or fail to initialize variables correctly. This implementation handles the case naturally because the set detects the repeated 1, and the second pass identifies that 2 is missing.
Another important edge case occurs when the missing number is 1, such as [2,2]. Some algorithms incorrectly assume the smallest value must always exist. During the second pass, this implementation explicitly checks every value starting from 1, so it correctly identifies that 1 never appeared.
A third edge case occurs when the missing number is n, such as [1,2,3,4,4]. Bugs often appear when loops stop too early or use incorrect bounds. This implementation iterates through range(1, len(nums) + 1), ensuring the upper boundary is included and allowing the missing largest value to be detected correctly.