LeetCode 579 - Find Cumulative Salary of an Employee

This problem asks us to compute a cumulative salary summary for each employee based on their monthly salaries over the year 2020. The input is a table Employee where each row contains an employee id, the month (1 through 12), and the salary for that month.

LeetCode Problem 579

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to compute a cumulative salary summary for each employee based on their monthly salaries over the year 2020. The input is a table Employee where each row contains an employee id, the month (1 through 12), and the salary for that month. Each (id, month) combination is unique.

The cumulative salary summary is calculated as a 3-month rolling sum: for a given month m, the sum of salaries from month m, m-1, and m-2. If an employee did not work in the previous months, those months are treated as having salary 0. Importantly, the summary excludes the most recent month worked for each employee and does not include months in which the employee did not work. The output must be ordered by id ascending, and in case of ties, by month descending.

Edge cases include employees with only one month of data, months with gaps, and months where the employee starts late in the year (so prior months are effectively zero). The input guarantees that months are integers from 1 to 12 and (id, month) is unique, simplifying the indexing.

Approaches

The brute-force approach iterates through each employee and each month, summing the salaries of that month and the previous two months. For each employee, this requires checking up to 12 months and looking back up to 2 months each time, leading to repeated computations. While correct, this approach is inefficient, especially for a large number of employees and months, because it repeatedly scans salary records.

The optimal approach leverages window functions in SQL or cumulative computation in code. The insight is that for each employee, sorting by month allows maintaining a rolling sum over the last three months. We can efficiently calculate the 3-month sum without repeated scans and exclude the most recent month by first identifying it per employee.

Approach Time Complexity Space Complexity Notes
Brute Force O(E * M^2) O(E * M) Sum last 3 months repeatedly for each month for each employee
Optimal O(E * M) O(E * M) Sort by employee and month, compute rolling sum with window/array, exclude last month

Here, E is the number of employees and M is the maximum number of months per employee (up to 12).

Algorithm Walkthrough

  1. Retrieve all salary records from the Employee table and group them by employee id. Within each employee group, sort the records by month ascending.
  2. For each employee, initialize a rolling sum array of size 3 to store salaries of the last three months. Also, identify the most recent month for that employee.
  3. Iterate over each month for the employee. For the current month, calculate the 3-month cumulative sum by adding the current month's salary to the salaries of the previous two months stored in the rolling array. This ensures that missing months are treated as zero.
  4. Skip the calculation for the most recent month, as specified.
  5. Store the resulting (id, month, 3-month sum) in a results list.
  6. Once all employees are processed, sort the final result by id ascending and month descending.
  7. Return the result.

Why it works: Sorting by month ensures that for each employee, the rolling sum captures consecutive months. Using a rolling array of length 3 guarantees efficient computation while automatically treating missing months as zero. Excluding the most recent month after processing ensures correctness as per the problem constraints.

Python Solution

from typing import List
import pandas as pd

def findCumulativeSalary(employee: pd.DataFrame) -> pd.DataFrame:
    # Sort by employee id and month ascending
    employee = employee.sort_values(['id', 'month'])
    
    result = []
    
    for emp_id, group in employee.groupby('id'):
        months = group['month'].tolist()
        salaries = group['salary'].tolist()
        last_month = max(months)
        
        # Rolling sums for previous two months
        rolling = [0, 0]  # previous two months
        for i, month in enumerate(months):
            if month == last_month:
                continue  # skip the most recent month
            # Calculate 3-month sum
            sum_last_3 = salaries[i] + rolling[-1] + rolling[-2] if len(rolling) >= 2 else salaries[i]
            result.append((emp_id, month, sum_last_3))
            # Update rolling
            rolling.append(salaries[i])
            if len(rolling) > 2:
                rolling.pop(0)
    
    # Convert to DataFrame and sort
    result_df = pd.DataFrame(result, columns=['id', 'month', 'Salary'])
    result_df = result_df.sort_values(['id', 'month'], ascending=[True, False])
    return result_df

In this Python solution, we use pandas to group and sort data efficiently. The rolling sum array stores salaries of the last two months, and the current month salary is added to compute the 3-month sum. We skip the most recent month and finally sort the DataFrame as required.

Go Solution

package main

import (
    "sort"
)

type Employee struct {
    Id     int
    Month  int
    Salary int
}

type Result struct {
    Id     int
    Month  int
    Salary int
}

func FindCumulativeSalary(employees []Employee) []Result {
    // Group by employee id
    empMap := make(map[int][]Employee)
    for _, e := range employees {
        empMap[e.Id] = append(empMap[e.Id], e)
    }

    var results []Result

    for id, records := range empMap {
        // Sort records by month ascending
        sort.Slice(records, func(i, j int) bool { return records[i].Month < records[j].Month })
        lastMonth := records[len(records)-1].Month
        rolling := []int{0, 0}
        
        for _, r := range records {
            if r.Month == lastMonth {
                continue
            }
            sum := r.Salary
            if len(rolling) >= 2 {
                sum += rolling[0] + rolling[1]
            }
            results = append(results, Result{Id: id, Month: r.Month, Salary: sum})
            rolling = append(rolling, r.Salary)
            if len(rolling) > 2 {
                rolling = rolling[1:]
            }
        }
    }

    // Sort results by id ascending, month descending
    sort.Slice(results, func(i, j int) bool {
        if results[i].Id == results[j].Id {
            return results[i].Month > results[j].Month
        }
        return results[i].Id < results[j].Id
    })

    return results
}

In Go, we handle grouping manually using a map and sorting slices. The rolling array logic is identical to Python. Go requires explicit slice management to mimic the rolling window.

Worked Examples

For employee 1 with months [1,2,3,4,7,8] and salaries [20,30,40,60,90,90]:

Month Rolling Last 2 3-Month Sum Notes
1 [0,0] 20 Only current month salary, no previous months
2 [0,20] 50 30+20+0
3 [20,30] 90 40+30+20
4 [30,40] 130 60+40+30
7 [40,60] 90 90+0+0 (months 5,6 missing)
8 skip - Most recent month

The result is ordered by month descending per employee, producing the expected output.

Complexity Analysis

Measure Complexity Explanation
Time O(E * M) Sorting per employee and iterating over months, with constant-time rolling sum
Space O(E * M) Storage for grouped employees and rolling arrays

The time complexity is linear in the number of salary records once sorted. Space is proportional to input size because we maintain all records and a small rolling array per employee.

Test Cases

# Provided example
df = pd.DataFrame([
    (1,1,20),(2,1,20),(1,2,30),(2,2,30),(3,2,40),
    (1,3,40),(3,3,60),(1,4,60),(3,4,70),(1,7,90),(1,8,90)
], columns=['id','month','salary'])

result = findCumulativeSalary(df)
assert result.iloc[0].tolist() == [1,7,90]  # Employee 1, month 7
assert result.iloc[-1].tolist() == [3,2,40]  # Employee 3, month 2

# Edge case: single month employee
df2 = pd.DataFrame([(4,5,100)], columns=['id','month','salary'])
res2 = findCumulativeSalary(df2)
assert res2.empty  # Only one month, which is also most recent

# Edge case: missing months in between
df3 = pd.DataFrame([(5,1,10