LeetCode 615 - Average Salary: Departments VS Company

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This problem asks us to compare the average salary of each department against the company's overall average salary for each month. We are given two tables: Salary and Employee. The Salary table records the monthly salary of each employee along with a date, while the Employee table maps employees to their department.

The goal is to produce a table where each row shows a month (pay_month), a department (department_id), and a comparison value indicating whether the department’s average salary in that month is "higher", "lower", or the "same" as the company's average salary in that same month.

Important points include:

pay_date is a full date, but we only care about the month part for aggregation.
The average should be computed over all salaries in that month.
Edge cases include months where a department has only one employee, months where all salaries are equal, or months where some employees may have multiple salary entries (e.g., bonuses or multiple pay entries per month).
The output can be in any order.

The problem guarantees that every employee belongs to exactly one department and that the tables are well-formed.

Edge cases to consider upfront:

A department has only one employee for a month.
All employees have the same salary in a month.
Salaries with multiple entries in the same month.
A department with no employees in a given month should not appear in the result.

Approaches

Brute Force

The brute-force approach would be to iterate through each month, calculate the company’s average salary for that month, then for each department calculate its average salary for that month, and finally compare them. While correct, this requires multiple nested aggregations and can become expensive because it effectively scans the entire Salary table multiple times, which is inefficient for large datasets.

Optimal

The optimal approach leverages SQL’s ability to perform joins and aggregations. First, we join Salary with Employee to associate each salary with its department. Then, we extract the month from pay_date and compute the average salary grouped by month and department. Similarly, we compute the overall company average grouped by month. Finally, we join these two aggregates on the month and compare the averages using a CASE statement to produce "higher", "lower", or "same". This approach requires a single pass through the data for aggregation and a single join for comparison, making it much more efficient.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n * d)	O(n)	Nested iteration for each month and department; inefficient for large tables
Optimal	O(n)	O(n)	Single aggregation of salaries with join; much faster and leverages SQL optimizations

Algorithm Walkthrough

Join the Salary table with Employee on employee_id to associate salaries with their department.
Extract pay_month from pay_date using SQL date functions.
Compute the monthly department average salary by grouping by pay_month and department_id.
Compute the monthly company average salary by grouping only by pay_month.
Join the department average with the company average on pay_month.
Use a CASE expression to compare dept_avg and company_avg and output "higher", "lower", or "same".
Return pay_month, department_id, and the comparison.

Why it works: By aggregating first and then joining, we ensure that each department’s monthly average is accurately compared against the corresponding company monthly average. The CASE ensures the correct textual comparison.

Python Solution

# This problem is SQL-specific, but for reference, a Python pandas implementation could look like this
import pandas as pd

def department_vs_company_comparison(salary: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:
    salary['pay_month'] = pd.to_datetime(salary['pay_date']).dt.to_period('M').astype(str)
    merged = salary.merge(employee, on='employee_id')
    
    dept_avg = merged.groupby(['pay_month', 'department_id'])['amount'].mean().reset_index(name='dept_avg')
    company_avg = merged.groupby('pay_month')['amount'].mean().reset_index(name='company_avg')
    
    result = dept_avg.merge(company_avg, on='pay_month')
    
    def compare(row):
        if row['dept_avg'] > row['company_avg']:
            return 'higher'
        elif row['dept_avg'] < row['company_avg']:
            return 'lower'
        else:
            return 'same'
    
    result['comparison'] = result.apply(compare, axis=1)
    return result[['pay_month', 'department_id', 'comparison']]

Explanation: The code converts pay_date to a YYYY-MM string, joins Salary with Employee, calculates department and company averages using groupby, merges them, and then determines the comparison with a lambda function.

Go Solution

// This problem is SQL-specific; a Go implementation would typically use a SQL query through a database driver.

Go note: In practice, Go would execute the SQL query directly using database/sql or an ORM. No native Go slice manipulation is needed since SQL handles aggregation efficiently.

Worked Examples

Example 1

pay_date	employee_id	amount	department_id
2017-03-31	1	9000	1
2017-03-31	2	6000	2
2017-03-31	3	10000	2
2017-02-28	1	7000	1
2017-02-28	2	6000	2
2017-02-28	3	8000	2

Step 1: Compute department averages per month:

pay_month	department_id	dept_avg
2017-02	1	7000
2017-02	2	7000
2017-03	1	9000
2017-03	2	8000

Step 2: Compute company averages per month:

pay_month	company_avg
2017-02	7000
2017-03	8333.33

Step 3: Compare:

pay_month	department_id	comparison
2017-02	1	same
2017-02	2	same
2017-03	1	higher
2017-03	2	lower

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Single pass through Salary table for aggregation, plus merge
Space	O(n)	Storage for aggregated tables and intermediate joins

The complexity is dominated by the size of the Salary table, as each row is visited once for aggregation.

Test Cases

import pandas as pd

salary = pd.DataFrame({
    'id': [1,2,3,4,5,6],
    'employee_id': [1,2,3,1,2,3],
    'amount': [9000,6000,10000,7000,6000,8000],
    'pay_date': ['2017/03/31','2017/03/31','2017/03/31','2017/02/28','2017/02/28','2017/02/28']
})

employee = pd.DataFrame({
    'employee_id': [1,2,3],
    'department_id': [1,2,2]
})

result = department_vs_company_comparison(salary, employee)

# Provided example
assert result[(result.pay_month=='2017-02') & (result.department_id==1)]['comparison'].values[0] == 'same'
assert result[(result.pay_month=='2017-03') & (result.department_id==1)]['comparison'].values[0] == 'higher'
assert result[(result.pay_month=='2017-02') & (result.department_id==2)]['comparison'].values[0] == 'same'
assert result[(result.pay_month=='2017-03') & (result.department_id==2)]['comparison'].values[0] == 'lower'

# Edge: All salaries same
salary_edge = pd.DataFrame({'id':[1,2],'employee_id':[1,2],'amount':[5000,5000],'pay_date':['2022/01/31','2022/01/31']})
employee_edge = pd.DataFrame({'employee_id':[1,2],'department_id':[1,1]})
res_edge = department_vs_company_comparison(salary_edge, employee_edge)
assert res_edge['comparison'].values[0] == 'same'

Test	Why
Example 1	Validates main functionality and comparison logic
All same salary	Checks edge case where department and company averages are equal

Edge Cases

Single employee in a department: The department average equals the employee’s salary. The algorithm handles this by averaging over a single value, which works correctly.
Multiple salaries per month for same employee: The algorithm correctly averages all entries per department,