LeetCode 973 - K Closest Points to Origin

This problem asks us to find the k points that are closest to the origin (0,0) from a list of 2D points. Each point is given as a pair [x, y], and the distance is defined using the Euclidean formula, sqrt(x^2 + y^2).

LeetCode Problem 973

Difficulty: 🟡 Medium
Topics: Array, Math, Divide and Conquer, Geometry, Sorting, Heap (Priority Queue), Quickselect

Solution

Problem Understanding

This problem asks us to find the k points that are closest to the origin (0,0) from a list of 2D points. Each point is given as a pair [x, y], and the distance is defined using the Euclidean formula, sqrt(x^2 + y^2). The input consists of a list of points and an integer k, and the output should be a list of exactly k points that have the smallest distances from the origin. The order of points in the output does not matter.

The constraints tell us that the number of points can be up to 10,000, and the coordinates themselves can range from -10,000 to 10,000. This means that any algorithm with time complexity worse than O(n log n) could be inefficient for large inputs. We also know that the problem guarantees uniqueness except for order, so we do not have to worry about duplicate distance conflicts for correctness. Important edge cases include k = 1 or k = n, negative coordinates, and points lying exactly on the axes.

Approaches

A brute-force approach is to compute the Euclidean distance for each point, store these distances along with the point coordinates, sort all points by distance, and return the first k points. This approach works because sorting guarantees that the closest points appear at the beginning. However, sorting takes O(n log n) time, which could be slower than necessary for large n when k is much smaller than n.

A more optimal approach uses a max-heap (priority queue) to maintain the k closest points seen so far. As we iterate through the points, we push the negative distance into the heap (since Python heaps are min-heaps by default) and pop the farthest point if the heap exceeds size k. This approach is efficient because the heap operations only take O(log k) time per insertion, leading to a total time complexity of O(n log k), which is much faster when k << n. Another optimal approach uses Quickselect, which partitions points based on distance and finds the k-th smallest distance in expected O(n) time, but the heap method is simpler to implement and highly practical.

Approach Time Complexity Space Complexity Notes
Brute Force O(n log n) O(n) Compute distances, sort all points, take first k
Max-Heap O(n log k) O(k) Maintain a heap of size k to track closest points
Quickselect O(n) average O(1) extra Partition points around k-th distance, more complex

Algorithm Walkthrough

  1. Initialize an empty max-heap. This heap will store at most k points based on their negative squared distance from the origin. Using the squared distance avoids unnecessary square root calculations.
  2. Iterate through each point [x, y] in the input list. For each point, compute the squared distance d = x*x + y*y.
  3. Push a tuple (d, point) into the heap. The heap property ensures the point with the largest distance among the k closest seen so far is always at the top.
  4. If the size of the heap exceeds k, pop the top element. This ensures the heap only retains the k closest points.
  5. After processing all points, the heap contains exactly the k points closest to the origin. Extract these points and return them.

Why it works: The max-heap ensures that at any moment, we are keeping the closest k points encountered so far. Any point farther than the current farthest in the heap is automatically discarded. By the end of iteration, the heap contains the k points with the smallest distances. The squared distance preserves ordering, so the result is correct without computing square roots.

Python Solution

from typing import List
import heapq

class Solution:
    def kClosest(self, points: List[List[int]], k: int) -> List[List[int]]:
        max_heap = []
        
        for x, y in points:
            dist = -(x * x + y * y)  # Use negative for max-heap
            heapq.heappush(max_heap, (dist, [x, y]))
            if len(max_heap) > k:
                heapq.heappop(max_heap)
        
        return [point for (_, point) in max_heap]

In this Python solution, we iterate over each point, compute the squared distance, and maintain a max-heap of size k. By pushing negative distances, Python's min-heap behaves like a max-heap. After the loop, we extract points from the heap, which guarantees they are the k closest points.

Go Solution

package main

import (
    "container/heap"
    "math"
)

type PointHeap [][2]int

func (h PointHeap) Len() int { return len(h) }
func (h PointHeap) Less(i, j int) bool {
    return dist(h[i]) > dist(h[j]) // max-heap by distance
}
func (h PointHeap) Swap(i, j int) { h[i], h[j] = h[j], h[i] }

func (h *PointHeap) Push(x any) {
    *h = append(*h, x.([2]int))
}
func (h *PointHeap) Pop() any {
    n := len(*h)
    x := (*h)[n-1]
    *h = (*h)[:n-1]
    return x
}

func dist(p [2]int) int {
    return p[0]*p[0] + p[1]*p[1]
}

func kClosest(points [][]int, k int) [][]int {
    h := &PointHeap{}
    heap.Init(h)
    
    for _, p := range points {
        pt := [2]int{p[0], p[1]}
        heap.Push(h, pt)
        if h.Len() > k {
            heap.Pop(h)
        }
    }
    
    result := make([][]int, 0, k)
    for h.Len() > 0 {
        p := heap.Pop(h).([2]int)
        result = append(result, []int{p[0], p[1]})
    }
    return result
}

In Go, we define a custom heap type PointHeap and implement the required methods to maintain a max-heap of distances. Negative distances are not needed because we can define Less to invert the comparison. The rest follows the same logic as Python.

Worked Examples

Example 1: points = [[1,3],[-2,2]], k = 1

Step Heap State Notes
1 [( -10, [1,3] )] Push first point
2 [( -8, [-2,2] )] Push second point, heap > k → pop (-10,[1,3])
Final [[-2,2]] Only closest point remains

Example 2: points = [[3,3],[5,-1],[-2,4]], k = 2

Step Heap State Notes
1 [( -18, [3,3] )] Push first point
2 [( -25, [5,-1] ), (-18, [3,3] )] Push second, heap ≤ k
3 [( -20, [-2,4] ), (-18, [3,3] )] Push third, pop (-25,[5,-1])
Final [[-2,4],[3,3]] Heap contains two closest points

Complexity Analysis

Measure Complexity Explanation
Time O(n log k) Each of n points may require heap push/pop of size k
Space O(k) Heap stores at most k points

The heap approach is much more efficient than sorting all points when k is significantly smaller than n.

Test Cases

# Provided examples
assert sorted(Solution().kClosest([[1,3],[-2,2]], 1)) == sorted([[-2,2]])  # Single closest point
assert sorted(Solution().kClosest([[3,3],[5,-1],[-2,4]], 2)) == sorted([[3,3],[-2,4]])  # Two closest points

# Edge cases
assert sorted(Solution().kClosest([[0,0]], 1)) == [[0,0]]  # Only one point
assert sorted(Solution().kClosest([[1,1],[1,1],[1,1]], 2)) == sorted([[1,1],[1,1]])  # Duplicate points
assert sorted(Solution().kClosest([[10000,10000],[-10000,-10000],[0,1]], 1)) == [[0,1]]  # Very large coordinates
assert sorted(Solution().kClosest([[1,0],[0,1],[-1,0],[0,-1]], 4)) == sorted([[1,0],[0,1],[-1,0],[0,-1]])  # k = n
Test Why
[[1,3],[-2,2]], k=1 Simple example, smallest k
[[3,3],[5,-1],[-2,4]], k=2 Multiple closest points, output order irrelevant
[[