LeetCode 1421 - NPV Queries

The problem presents two database tables: NPV and Queries. The NPV table contains historical net present value (NPV) dat

LeetCode Problem 1421

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem presents two database tables: NPV and Queries. The NPV table contains historical net present value (NPV) data for different inventory items identified by an id and a year. The Queries table contains requests for the NPV of specific inventory items at specific years. Each row in Queries represents a single query.

The task is to return a result table where each query from Queries is mapped to its corresponding npv value from the NPV table. If a queried (id, year) pair does not exist in the NPV table, the npv should default to 0. There is no restriction on the order of the output rows.

Key points include the primary key constraints: (id, year) is unique in both tables. This ensures that for any (id, year) pair, there is at most one matching NPV record. Edge cases involve queries for (id, year) pairs missing in the NPV table, which should return zero.

Approaches

A naive or brute-force approach would involve iterating over every query in the Queries table and performing a search through the entire NPV table to find a matching (id, year) pair. While this guarantees correctness, it is inefficient because the time complexity is O(Q * N), where Q is the number of queries and N is the number of rows in the NPV table. For large tables, this becomes computationally expensive.

The optimal approach leverages a hash-based join. By creating a dictionary (hash map) from the NPV table where the key is a tuple (id, year) and the value is the npv, we can answer each query in constant time O(1). This reduces the overall time complexity to O(N + Q), which is far more efficient.

Approach Time Complexity Space Complexity Notes
Brute Force O(Q * N) O(1) Iterate over every query and search NPV for a match
Optimal O(N + Q) O(N) Use a hash map to store NPV values and lookup queries efficiently

Algorithm Walkthrough

  1. Initialize an empty dictionary npv_map that will map (id, year) tuples to their corresponding npv values from the NPV table. This allows constant-time lookups.
  2. Iterate over each row in the NPV table. For each row, insert an entry into npv_map with the key (row.id, row.year) and the value row.npv.
  3. Initialize an empty list result to store the final output.
  4. Iterate over each row in the Queries table. For each query (id, year), attempt to fetch the corresponding npv from npv_map. If the key does not exist, default to 0.
  5. Append a dictionary representing the query and its NPV to the result list.
  6. Return the result list.

Why it works: The dictionary guarantees that every (id, year) pair from NPV is mapped correctly. Lookup for each query is O(1). If a query is missing in NPV, using a default of 0 ensures correctness. Since every query is handled exactly once, the result is correct for all queries.

Python Solution

import pandas as pd

def npv_queries(NPV: pd.DataFrame, Queries: pd.DataFrame) -> pd.DataFrame:
    # Step 1: Build the mapping from (id, year) -> npv
    npv_map = {(row.id, row.year): row.npv for row in NPV.itertuples(index=False)}
    
    # Step 2: Prepare the result list
    result = []
    for row in Queries.itertuples(index=False):
        npv_value = npv_map.get((row.id, row.year), 0)
        result.append({'id': row.id, 'year': row.year, 'npv': npv_value})
    
    # Step 3: Convert to DataFrame
    return pd.DataFrame(result)

Explanation: The dictionary comprehension constructs the hash map for fast lookups. Iterating through Queries allows us to fetch each NPV efficiently, defaulting to zero for missing entries. Finally, the result list is converted back to a DataFrame for submission.

Go Solution

package main

type NPVRecord struct {
    Id   int
    Year int
    Npv  int
}

type Query struct {
    Id   int
    Year int
}

type Result struct {
    Id   int
    Year int
    Npv  int
}

func NPVQueries(npvTable []NPVRecord, queries []Query) []Result {
    npvMap := make(map[[2]int]int)
    
    // Build the map
    for _, row := range npvTable {
        key := [2]int{row.Id, row.Year}
        npvMap[key] = row.Npv
    }
    
    // Prepare the result
    results := make([]Result, 0, len(queries))
    for _, q := range queries {
        key := [2]int{q.Id, q.Year}
        npvValue, exists := npvMap[key]
        if !exists {
            npvValue = 0
        }
        results = append(results, Result{Id: q.Id, Year: q.Year, Npv: npvValue})
    }
    
    return results
}

Explanation: Go uses a map with [2]int as the key to store (id, year) combinations. Lookup for each query uses exists to handle missing keys, defaulting to zero. The result is appended to a slice of Result.

Worked Examples

For the example in the problem statement:

  1. Build npv_map from NPV:
Key Value
(1,2018) 100
(7,2020) 30
(13,2019) 40
(1,2019) 113
(2,2008) 121
(3,2009) 12
(11,2020) 99
(7,2019) 0
  1. Iterate through Queries:
Query npv lookup Result
(1,2019) 113 113
(2,2008) 121 121
(3,2009) 12 12
(7,2018) missing 0
(7,2019) 0 0
(7,2020) 30 30
(13,2019) 40 40

Final output matches the problem example.

Complexity Analysis

Measure Complexity Explanation
Time O(N + Q) Building the map is O(N), and querying all Q entries is O(Q)
Space O(N) The map stores all NPV rows

The complexity is efficient because lookups are constant-time and we iterate through each table only once.

Test Cases

import pandas as pd

# Example test case
NPV = pd.DataFrame({
    'id': [1,7,13,1,2,3,11,7],
    'year': [2018,2020,2019,2019,2008,2009,2020,2019],
    'npv': [100,30,40,113,121,12,99,0]
})
Queries = pd.DataFrame({
    'id': [1,2,3,7,7,7,13],
    'year': [2019,2008,2009,2018,2019,2020,2019]
})
result = npv_queries(NPV, Queries)
assert result[result['id']==1]['npv'].iloc[0] == 113
assert result[result['id']==7][result['year']==2018]['npv'].iloc[0] == 0  # missing npv

# Edge case: empty NPV table
NPV_empty = pd.DataFrame(columns=['id','year','npv'])
Queries_small = pd.DataFrame({'id':[1],'year':[2000]})
assert npv_queries(NPV_empty, Queries_small)['npv'].iloc[0] == 0

# Edge case: empty Queries table
NPV_full = pd.DataFrame({'id':[1],'year':[2000],'npv':[50]})
Queries_empty = pd.DataFrame(columns=['id','year'])
assert npv_queries(NPV_full, Queries_empty).empty
Test Why
Provided example Validates normal mapping and default zero handling
Empty NPV table Ensures default zero for all queries
Empty Queries table Confirms handling of no queries gracefully

Edge Cases

1. Query not in NPV: When a query references an (id, year) not present in the NPV table, a naive solution may fail or return null. By using dict.get(key, 0), the implementation correctly returns zero.

2. Empty tables: If either `NP