LeetCode 645 - Set Mismatch

The problem describes a corrupted set of integers. Originally, the set contained every number from 1 through n exactly once. Because of an error, one number was duplicated, which means one other number disappeared.

LeetCode Problem 645

Difficulty: 🟢 Easy
Topics: Array, Hash Table, Bit Manipulation, Sorting

Solution

Problem Understanding

The problem describes a corrupted set of integers. Originally, the set contained every number from 1 through n exactly once. Because of an error, one number was duplicated, which means one other number disappeared.

The input array nums represents this corrupted version of the set. Since one number appears twice and another is missing, the array still has length n, but it no longer contains a perfect permutation of the numbers from 1 to n.

The task is to identify two values:

  • The number that appears twice
  • The number that is missing

The result should be returned as an array in the form:

[duplicate, missing]

For example, if the array is:

[1,2,2,4]

the number 2 appears twice, and the number 3 is missing, so the answer is:

[2,3]

The constraints are relatively small:

  • 2 <= nums.length <= 10^4
  • 1 <= nums[i] <= 10^4

This tells us that even an O(n^2) solution could technically pass for the upper bound of 10^4, but it would be inefficient and unnecessary. Since the problem has a very regular structure, specifically numbers from 1 to n, we should aim for a linear time solution.

There are several important edge cases to keep in mind. The duplicate could appear at the beginning or end of the range. The missing value could be 1 or n. The array may be very small, such as [1,1]. A naive implementation that only checks frequencies partially may fail to detect the missing value correctly in these situations.

The problem guarantees exactly one duplicated number and exactly one missing number. This guarantee simplifies the logic considerably because we do not need to handle malformed inputs with multiple duplicates or multiple missing values.

Approaches

Brute Force Approach

A straightforward solution is to examine every number from 1 to n and count how many times it appears in the array.

For each number:

  • If it appears twice, it is the duplicate
  • If it appears zero times, it is the missing number

This works because the original set should contain every number exactly once. Any deviation from frequency 1 immediately identifies either the duplicated or missing value.

The main issue is efficiency. Counting occurrences for every number requires scanning the array repeatedly. If the array has length n, and we perform a full scan for each value from 1 to n, the total complexity becomes O(n^2).

Although the constraints are not extremely large, this approach is inefficient compared to what is possible.

Optimal Approach

The key observation is that we only need to know how many times each number appears.

A hash set or frequency array allows us to process the input in one pass. While iterating through nums:

  • If a number has already been seen, it is the duplicate
  • Otherwise, record it as seen

After processing the array, we iterate from 1 to n to find which number was never seen. That number is the missing value.

This works efficiently because lookup operations in a hash set are O(1) on average.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Repeatedly counts occurrences for every value
Optimal O(n) O(n) Uses a hash set to track seen numbers efficiently

Algorithm Walkthrough

  1. Create an empty set called seen.

The set will store every number encountered while traversing the array. A set is ideal because it provides constant time lookup. 2. Initialize a variable called duplicate.

This variable will eventually hold the number that appears twice. 3. Traverse the array nums.

For each number:

  • If the number is already in seen, then this is the duplicated value.
  • Otherwise, insert the number into seen.

This step identifies the repeated number in a single pass. 4. Traverse the integers from 1 through n.

Since the original set should contain every value in this range, we check which number does not appear in seen. 5. The number not found in seen is the missing value.

Store this value in a variable called missing. 6. Return [duplicate, missing].

Why it works

The array originally contained every number exactly once. Since exactly one number was duplicated and exactly one number was removed, every valid number except one appears exactly once.

The set tracks all encountered values. The first repeated insertion identifies the duplicate because sets cannot contain duplicates. After processing the array, the only number absent from the set must be the missing number.

Because the problem guarantees exactly one duplicate and one missing value, this logic always produces the correct answer.

Python Solution

from typing import List

class Solution:
    def findErrorNums(self, nums: List[int]) -> List[int]:
        seen = set()
        duplicate = -1
        
        for number in nums:
            if number in seen:
                duplicate = number
            else:
                seen.add(number)
        
        missing = -1
        
        for value in range(1, len(nums) + 1):
            if value not in seen:
                missing = value
                break
        
        return [duplicate, missing]

The implementation closely follows the algorithm described earlier.

The seen set stores numbers that have already appeared. During the first traversal, every new number is inserted into the set. If a number already exists in the set, it must be the duplicated value.

After identifying the duplicate, the second loop checks every integer from 1 through n. Since one value is missing from the array, exactly one value will not appear in the set. That value becomes missing.

Finally, the method returns both values in the required order.

Go Solution

func findErrorNums(nums []int) []int {
    seen := make(map[int]bool)
    duplicate := -1

    for _, number := range nums {
        if seen[number] {
            duplicate = number
        } else {
            seen[number] = true
        }
    }

    missing := -1

    for value := 1; value <= len(nums); value++ {
        if !seen[value] {
            missing = value
            break
        }
    }

    return []int{duplicate, missing}
}

The Go solution uses a map[int]bool instead of a set because Go does not have a built in set type.

The logic is otherwise identical to the Python version. Numbers are tracked in the map during traversal, and the missing number is identified by scanning from 1 through n.

There are no integer overflow concerns because the values are very small relative to Go's integer range. The function always returns a slice containing exactly two integers.

Worked Examples

Example 1

Input:

nums = [1,2,2,4]

First Pass, Find Duplicate

Current Number Seen Set Before Action Duplicate
1 {} Add 1 -1
2 {1} Add 2 -1
2 {1,2} Already seen 2
4 {1,2} Add 4 2

After this pass:

seen = {1,2,4}
duplicate = 2

Second Pass, Find Missing

Check numbers from 1 to 4:

Value In Seen Set? Result
1 Yes Continue
2 Yes Continue
3 No Missing = 3
4 Yes Continue

Final answer:

[2,3]

Example 2

Input:

nums = [1,1]

First Pass, Find Duplicate

Current Number Seen Set Before Action Duplicate
1 {} Add 1 -1
1 {1} Already seen 1

After this pass:

seen = {1}
duplicate = 1

Second Pass, Find Missing

Check numbers from 1 to 2:

Value In Seen Set? Result
1 Yes Continue
2 No Missing = 2

Final answer:

[1,2]

Complexity Analysis

Measure Complexity Explanation
Time O(n) One pass to detect duplicate, one pass to detect missing
Space O(n) The set stores up to n numbers

The algorithm performs two linear scans. The first scan processes every element in the array once, and the second scan checks every value from 1 through n. Since both operations are linear, the total time complexity is O(n).

The auxiliary space complexity is O(n) because the set may contain every distinct number from the array.

Test Cases

from typing import List

class Solution:
    def findErrorNums(self, nums: List[int]) -> List[int]:
        seen = set()
        duplicate = -1
        
        for number in nums:
            if number in seen:
                duplicate = number
            else:
                seen.add(number)
        
        missing = -1
        
        for value in range(1, len(nums) + 1):
            if value not in seen:
                missing = value
                break
        
        return [duplicate, missing]

solution = Solution()

assert solution.findErrorNums([1,2,2,4]) == [2,3]  # standard example
assert solution.findErrorNums([1,1]) == [1,2]  # smallest valid input
assert solution.findErrorNums([2,2]) == [2,1]  # missing smallest number
assert solution.findErrorNums([1,2,3,4,4]) == [4,5]  # missing largest number
assert solution.findErrorNums([3,2,3,4,6,5]) == [3,1]  # missing first value
assert solution.findErrorNums([1,5,3,2,2,6,7,8,9,10]) == [2,4]  # middle missing value
assert solution.findErrorNums([1,2,3,3]) == [3,4]  # duplicate near end
assert solution.findErrorNums([2,1,4,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]) == [4,3]  # larger input
Test Why
[1,2,2,4] Standard example from the problem
[1,1] Smallest allowed array
[2,2] Missing number is 1
[1,2,3,4,4] Missing number is n
[3,2,3,4,6,5] Missing number occurs at the beginning
[1,5,3,2,2,6,7,8,9,10] Missing value appears in the middle
[1,2,3,3] Duplicate occurs near the end
Large mixed input Verifies correctness on a larger array

Edge Cases

One important edge case is the smallest possible input size, such as [1,1]. In this situation, the duplicate and missing values are immediately adjacent in the valid range. Some implementations incorrectly assume the array has more diversity or fail to initialize variables correctly. This implementation handles the case naturally because the set detects the repeated 1, and the second pass identifies that 2 is missing.

Another important edge case occurs when the missing number is 1, such as [2,2]. Some algorithms incorrectly assume the smallest value must always exist. During the second pass, this implementation explicitly checks every value starting from 1, so it correctly identifies that 1 never appeared.

A third edge case occurs when the missing number is n, such as [1,2,3,4,4]. Bugs often appear when loops stop too early or use incorrect bounds. This implementation iterates through range(1, len(nums) + 1), ensuring the upper boundary is included and allowing the missing largest value to be detected correctly.