LeetCode 287 - Find the Duplicate Number

The problem gives us an integer array nums of length n + 1, where every value is guaranteed to be in the range [1, n]. Since there are n + 1 numbers but only n possible distinct values, at least one number must appear more than once. The task is to return that duplicate value.

LeetCode Problem 287

Difficulty: 🟡 Medium
Topics: Array, Two Pointers, Binary Search, Bit Manipulation

Solution

Problem Understanding

The problem gives us an integer array nums of length n + 1, where every value is guaranteed to be in the range [1, n]. Since there are n + 1 numbers but only n possible distinct values, at least one number must appear more than once. The task is to return that duplicate value.

The important constraints are what make this problem interesting. We are not allowed to modify the input array, which eliminates solutions like sorting in place. We are also required to use only constant extra space, which means we cannot use a hash set or frequency array. Additionally, the follow up asks for a linear runtime solution, so approaches with quadratic complexity are not acceptable for large inputs.

The array is guaranteed to contain exactly one duplicated number, but that number may appear more than twice. For example, [3,3,3,3,3] is valid because the duplicated value is still only 3.

The constraints also imply something deeper about the structure of the problem. Since every number is between 1 and n, each value can be interpreted as a pointer to another index in the array. This observation allows the problem to be transformed into a cycle detection problem.

Several edge cases are important to keep in mind:

  • The duplicate may appear many times, not just twice.
  • The duplicate may be the smallest value, such as 1.
  • The duplicate may be the largest value, such as n.
  • The array may effectively form a very small cycle immediately.
  • The duplicate may appear far apart in the array.

The problem guarantees that exactly one duplicated value exists, so we never need to handle invalid inputs or arrays without duplicates.

Approaches

Brute Force Approach

A straightforward solution is to compare every element with every other element. For each index i, we scan the rest of the array and check whether nums[i] == nums[j] for some j > i. As soon as we find a match, we return the duplicate value.

This works because eventually the duplicated number will be compared against another occurrence of itself. However, this approach requires checking many pairs of elements, leading to quadratic time complexity.

For an array of size up to 10^5, an O(n^2) solution is too slow. In the worst case, it may require billions of comparisons.

Another common brute force improvement uses a hash set to track seen values. That reduces the runtime to O(n) but violates the constant space requirement because the hash set can grow to size n.

Key Insight for the Optimal Solution

The optimal solution uses Floyd's Cycle Detection Algorithm, also called the tortoise and hare algorithm.

The key insight is that the array can be interpreted as a linked structure:

  • Each index represents a node.
  • The value at that index points to the next node.

For example:

nums = [1,3,4,2,2]

index: 0 1 2 3 4
value: 1 3 4 2 2

This creates transitions:

0 -> 1
1 -> 3
3 -> 2
2 -> 4
4 -> 2

Notice that 2 -> 4 -> 2 forms a cycle.

Why does a cycle exist? Because there are more nodes than unique possible destinations. Since values are restricted to [1, n], some value must be pointed to multiple times. That repeated value becomes the entrance to a cycle.

Finding the duplicate number therefore becomes equivalent to finding the entrance of a cycle in a linked list.

Floyd's algorithm detects a cycle in linear time using only two pointers and constant extra space.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(1) Compare every pair of elements
Optimal O(n) O(1) Uses Floyd's cycle detection algorithm

Algorithm Walkthrough

Floyd's Cycle Detection Algorithm

  1. Initialize two pointers, slow and fast, both starting at nums[0].

The slow pointer moves one step at a time, while the fast pointer moves two steps at a time. Since the array structure contains a cycle, these two pointers are guaranteed to meet eventually. 2. Move the pointers until they meet.

At each iteration:

  • slow = nums[slow]
  • fast = nums[nums[fast]]

The slow pointer advances by one transition, while the fast pointer advances by two transitions. If a cycle exists, the fast pointer will eventually lap the slow pointer and they will meet inside the cycle. 3. Reset one pointer to the beginning.

After the first meeting point, set a new pointer finder to nums[0] while keeping slow at the meeting position. 4. Move both pointers one step at a time.

At each iteration:

  • finder = nums[finder]
  • slow = nums[slow]

Eventually, they meet at the entrance of the cycle. 5. Return the meeting value.

The entrance of the cycle corresponds exactly to the duplicated number.

Why it works

The duplicated number creates multiple paths leading into the same node, which guarantees a cycle in the implicit graph formed by the array. Floyd's algorithm mathematically guarantees that after the first collision inside the cycle, moving one pointer from the start and another from the collision point at equal speed will cause them to meet at the cycle entrance. Since the cycle entrance is the duplicated value, the algorithm correctly returns the duplicate.

Python Solution

from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        # Phase 1: Detect intersection point inside the cycle
        slow = nums[0]
        fast = nums[0]

        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]

            if slow == fast:
                break

        # Phase 2: Find entrance to the cycle
        finder = nums[0]

        while finder != slow:
            finder = nums[finder]
            slow = nums[slow]

        return finder

The implementation directly follows Floyd's cycle detection algorithm.

The first section initializes two pointers, slow and fast, both starting from the first value in the array. The loop advances the slow pointer by one step and the fast pointer by two steps. Because a cycle must exist, the two pointers eventually meet somewhere inside the cycle.

After detecting the intersection point, the algorithm enters the second phase. A new pointer called finder starts again from the beginning of the structure. Both finder and slow now move one step at a time. The position where they meet is the cycle entrance, which corresponds to the duplicate number.

The solution satisfies all constraints:

  • The array is never modified.
  • Only a few variables are used, so extra space is constant.
  • Every pointer movement is linear overall.

Go Solution

func findDuplicate(nums []int) int {
    // Phase 1: Detect intersection point
    slow := nums[0]
    fast := nums[0]

    for {
        slow = nums[slow]
        fast = nums[nums[fast]]

        if slow == fast {
            break
        }
    }

    // Phase 2: Find cycle entrance
    finder := nums[0]

    for finder != slow {
        finder = nums[finder]
        slow = nums[slow]
    }

    return finder
}

The Go implementation is nearly identical to the Python version because the algorithm only relies on index traversal.

Go slices already provide indexed access similar to Python lists, so the translation is straightforward. Integer overflow is not a concern because all values are constrained to [1, n] and n <= 10^5.

No special handling for nil or empty slices is needed because the problem guarantees a valid input array with at least two elements.

Worked Examples

Example 1

nums = [1,3,4,2,2]

The implicit structure is:

0 -> 1 -> 3 -> 2 -> 4
               ^    |
               |____|

Phase 1, Detect Cycle

Iteration slow fast
Start 1 1
1 nums[1] = 3 nums[nums[1]] = nums[3] = 2
2 nums[3] = 2 nums[nums[2]] = nums[4] = 2

The pointers meet at 2.

Phase 2, Find Entrance

Iteration finder slow
Start 1 2
1 nums[1] = 3 nums[2] = 4
2 nums[3] = 2 nums[4] = 2

They meet at 2.

Answer:

2

Example 2

nums = [3,1,3,4,2]

Implicit structure:

0 -> 3 -> 4 -> 2
      ^         |
      |_________|

Phase 1, Detect Cycle

Iteration slow fast
Start 3 3
1 4 2
2 2 4
3 3 3

The pointers meet at 3.

Phase 2, Find Entrance

Iteration finder slow
Start 3 3

They already meet immediately.

Answer:

3

Example 3

nums = [3,3,3,3,3]

Every transition points back to 3.

Phase 1, Detect Cycle

Iteration slow fast
Start 3 3
1 3 3

The pointers meet immediately.

Phase 2, Find Entrance

Iteration finder slow
Start 3 3

Answer:

3

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each pointer traverses at most a linear number of positions
Space O(1) Only a few pointer variables are used

The runtime is linear because Floyd's algorithm performs at most a few passes through the array. Even though there are nested pointer movements conceptually, each pointer advances through the structure only a limited number of times proportional to n.

The space complexity is constant because the algorithm stores only a fixed number of integer variables regardless of input size.

Test Cases

from typing import List

class Solution:
    def findDuplicate(self, nums: List[int]) -> int:
        slow = nums[0]
        fast = nums[0]

        while True:
            slow = nums[slow]
            fast = nums[nums[fast]]

            if slow == fast:
                break

        finder = nums[0]

        while finder != slow:
            finder = nums[finder]
            slow = nums[slow]

        return finder

sol = Solution()

assert sol.findDuplicate([1,3,4,2,2]) == 2  # standard example
assert sol.findDuplicate([3,1,3,4,2]) == 3  # duplicate near beginning
assert sol.findDuplicate([3,3,3,3,3]) == 3  # duplicate repeated many times

assert sol.findDuplicate([1,1]) == 1  # smallest valid input
assert sol.findDuplicate([1,1,2]) == 1  # duplicate is minimum value
assert sol.findDuplicate([2,1,2]) == 2  # duplicate is maximum value

assert sol.findDuplicate([1,4,6,3,2,5,6]) == 6  # duplicate at end
assert sol.findDuplicate([2,5,9,6,9,3,8,9,7,1]) == 9  # duplicate appears multiple times

assert sol.findDuplicate([4,3,1,4,2]) == 4  # cycle entrance not near start
assert sol.findDuplicate([2,2,2,2,2]) == 2  # all elements identical
Test Why
[1,3,4,2,2] Standard example with simple cycle
[3,1,3,4,2] Duplicate appears early
[3,3,3,3,3] Duplicate repeated many times
[1,1] Smallest possible valid input
[1,1,2] Duplicate is minimum value
[2,1,2] Duplicate is maximum value
[1,4,6,3,2,5,6] Duplicate near end
[2,5,9,6,9,3,8,9,7,1] Duplicate appears more than twice
[4,3,1,4,2] Longer traversal before cycle
[2,2,2,2,2] Entire structure collapses into one node

Edge Cases

One important edge case is when the duplicate appears more than twice, such as [3,3,3,3,3]. A naive implementation might assume the duplicate appears exactly twice and fail in this situation. Floyd's algorithm still works because the structure still forms a valid cycle whose entrance is the duplicated number.

Another critical edge case is the smallest valid input, [1,1]. Very small arrays can expose off by one errors or invalid index handling. In this case, both pointers immediately point back to the same value, and the implementation correctly detects the cycle without requiring any special handling.

A third important edge case occurs when the duplicate value is at the boundary of the allowed range, either 1 or n. Some incorrect solutions accidentally rely on zero based assumptions or misuse indices. Since the algorithm always treats values as next pointers within the guaranteed range [1, n], boundary duplicates are handled naturally and safely.

A final subtle edge case is when the cycle begins very close to the start of the traversal. For example, in [2,2,2,2,2], the structure immediately loops on itself. Some implementations incorrectly initialize pointers and miss this immediate cycle. By initializing both pointers to nums[0] and advancing them before comparison, the implementation correctly handles even degenerate cycles.