LeetCode 585 - Investments in 2016

Difficulty: 🟡 Medium
Topics: Database

Solution

Here’s a full, detailed technical solution guide for LeetCode 585 following your exact formatting rules:

Problem Understanding

The problem presents an Insurance table containing policyholder information, including a unique policy ID (pid), investment values for 2015 (tiv_2015) and 2016 (tiv_2016), and location coordinates (lat and lon). The task is to calculate the sum of tiv_2016 for policies that meet two specific conditions:

The tiv_2015 value appears for more than one policyholder, meaning the investment value in 2015 is shared with at least one other policyholder.
The location, defined by the (lat, lon) pair, is unique, so no other policyholder has the same coordinates.

The expected output is a single floating-point value rounded to two decimal places.

The input guarantees that every lat and lon is not null and that pid is unique. However, there could be multiple rows with the same tiv_2015 or the same (lat, lon). The challenge lies in combining both conditions simultaneously.

Important edge cases include: a tiv_2015 value that is repeated but the corresponding locations are not unique, all policies having unique tiv_2015, or all locations being duplicates.

Approaches

Brute Force

A naive approach would iterate over all rows, checking for each row if there exists another row with the same tiv_2015 and if the location is unique. This requires nested scans: for each row, a full pass over the table to check tiv_2015 duplicates, and another pass to check (lat, lon) uniqueness. While correct, this approach scales poorly with n rows, giving roughly O(n²) time complexity.

Optimal Approach

The key observation is that both conditions can be precomputed efficiently using grouping and counting. First, count the occurrences of each tiv_2015 value and each (lat, lon) pair. Rows that have a tiv_2015 count greater than 1 satisfy the first condition, and rows with (lat, lon) count equal to 1 satisfy the second. Joining these filters allows a single scan to compute the sum of tiv_2016.

This approach leverages SQL aggregate functions and subqueries or, in programming languages, hash maps/dictionaries for counting. This reduces the time complexity to linear in the number of rows.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(1)	Nested loops to check conditions for each row
Optimal	O(n)	O(n)	Use hash maps to count duplicates and uniques, then sum qualifying `tiv_2016`

Algorithm Walkthrough

Traverse the table once to count occurrences of tiv_2015 for each value. Store these counts in a hash map where the key is tiv_2015 and the value is the number of times it occurs.
Traverse the table once to count occurrences of (lat, lon) pairs. Store these counts in a hash map where the key is the tuple (lat, lon) and the value is the frequency of that location.
Initialize a running sum for tiv_2016.
For each row, check if the tiv_2015 count is greater than 1 and the (lat, lon) count is exactly 1. If both conditions are met, add the tiv_2016 value to the sum.
After traversing all rows, round the sum to two decimal places and return it.

Why it works: The hash maps guarantee we can efficiently check the two conditions for each row in constant time. By counting first, we avoid repeated scanning of the dataset, ensuring correctness and efficiency.

Python Solution

from typing import List
from collections import Counter

class Solution:
    def tiv2016Sum(self, insurance: List[dict]) -> float:
        # Count occurrences of tiv_2015
        tiv_2015_counts = Counter(row['tiv_2015'] for row in insurance)
        
        # Count occurrences of location (lat, lon)
        location_counts = Counter((row['lat'], row['lon']) for row in insurance)
        
        # Compute sum for rows meeting both conditions
        total = 0.0
        for row in insurance:
            if tiv_2015_counts[row['tiv_2015']] > 1 and location_counts[(row['lat'], row['lon'])] == 1:
                total += row['tiv_2016']
        
        return round(total, 2)

The Python code first uses Counter to compute the frequency of tiv_2015 values and (lat, lon) pairs. Then it iterates over all rows, checking both conditions efficiently and summing qualifying tiv_2016 values. The final result is rounded to two decimal places.

Go Solution

package main

import (
    "fmt"
)

type Insurance struct {
    Pid     int
    Tiv2015 float64
    Tiv2016 float64
    Lat     float64
    Lon     float64
}

func tiv2016Sum(insurance []Insurance) float64 {
    tivCounts := make(map[float64]int)
    locCounts := make(map[string]int)
    
    for _, row := range insurance {
        tivCounts[row.Tiv2015]++
        key := fmt.Sprintf("%f,%f", row.Lat, row.Lon)
        locCounts[key]++
    }
    
    total := 0.0
    for _, row := range insurance {
        key := fmt.Sprintf("%f,%f", row.Lat, row.Lon)
        if tivCounts[row.Tiv2015] > 1 && locCounts[key] == 1 {
            total += row.Tiv2016
        }
    }
    
    return float64(int(total*100+0.5)) / 100.0
}

The Go version uses maps to count tiv_2015 occurrences and location frequencies. String keys represent (lat, lon) pairs. The sum is rounded to two decimal places using arithmetic rounding. Go requires explicit formatting for tuples since it lacks native tuple keys in maps.

Worked Examples

Input:

pid	tiv_2015	tiv_2016	lat	lon
1	10	5	10	10
2	20	20	20	20
3	10	30	20	20
4	10	40	40	40

Step 1: Count tiv_2015:

tiv_2015	count
10	3
20	1

Step 2: Count (lat, lon):

(lat, lon)	count
(10, 10)	1
(20, 20)	2
(40, 40)	1

Step 3: Sum tiv_2016 for qualifying rows:

pid 1: tiv_2015 count 3 > 1, location count 1 → include 5
pid 2: tiv_2015 count 1 → exclude
pid 3: tiv_2015 count 3, location count 2 → exclude
pid 4: tiv_2015 count 3, location count 1 → include 40

Total = 5 + 40 = 45.00

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Two passes for counting and one for summing, each linear in the number of rows
Space	O(n)	Hash maps store counts for all `tiv_2015` values and locations

The approach is linear in time and space, making it scalable for large tables, assuming hash map operations are O(1) on average.

Test Cases

# test cases
insurance1 = [
    {"pid": 1, "tiv_2015": 10, "tiv_2016": 5, "lat": 10, "lon": 10},
    {"pid": 2, "tiv_2015": 20, "tiv_2016": 20, "lat": 20, "lon": 20},
    {"pid": 3, "tiv_2015": 10, "tiv_2016": 30, "lat": 20, "lon": 20},
    {"pid": 4, "tiv_2015": 10, "tiv_2016": 40, "lat": 40, "lon": 40},
]
sol = Solution()
assert sol.tiv2016Sum(insurance1) == 45.00  # example case

insurance2 = [
    {"pid": 1, "tiv_2015": 5, "tiv_2016": 10, "lat": 0, "lon": 0},
    {"pid": 2,