LeetCode 615 - Average Salary: Departments VS Company
This problem asks us to compare the average salary of each department against the company's overall average salary for each month. We are given two tables: Salary and Employee.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This problem asks us to compare the average salary of each department against the company's overall average salary for each month. We are given two tables: Salary and Employee. The Salary table records the monthly salary of each employee along with a date, while the Employee table maps employees to their department.
The goal is to produce a table where each row shows a month (pay_month), a department (department_id), and a comparison value indicating whether the department’s average salary in that month is "higher", "lower", or the "same" as the company's average salary in that same month.
Important points include:
pay_dateis a full date, but we only care about the month part for aggregation.- The average should be computed over all salaries in that month.
- Edge cases include months where a department has only one employee, months where all salaries are equal, or months where some employees may have multiple salary entries (e.g., bonuses or multiple pay entries per month).
- The output can be in any order.
The problem guarantees that every employee belongs to exactly one department and that the tables are well-formed.
Edge cases to consider upfront:
- A department has only one employee for a month.
- All employees have the same salary in a month.
- Salaries with multiple entries in the same month.
- A department with no employees in a given month should not appear in the result.
Approaches
Brute Force
The brute-force approach would be to iterate through each month, calculate the company’s average salary for that month, then for each department calculate its average salary for that month, and finally compare them. While correct, this requires multiple nested aggregations and can become expensive because it effectively scans the entire Salary table multiple times, which is inefficient for large datasets.
Optimal
The optimal approach leverages SQL’s ability to perform joins and aggregations. First, we join Salary with Employee to associate each salary with its department. Then, we extract the month from pay_date and compute the average salary grouped by month and department. Similarly, we compute the overall company average grouped by month. Finally, we join these two aggregates on the month and compare the averages using a CASE statement to produce "higher", "lower", or "same". This approach requires a single pass through the data for aggregation and a single join for comparison, making it much more efficient.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n * d) | O(n) | Nested iteration for each month and department; inefficient for large tables |
| Optimal | O(n) | O(n) | Single aggregation of salaries with join; much faster and leverages SQL optimizations |
Algorithm Walkthrough
- Join the
Salarytable withEmployeeonemployee_idto associate salaries with their department. - Extract
pay_monthfrompay_dateusing SQL date functions. - Compute the monthly department average salary by grouping by
pay_monthanddepartment_id. - Compute the monthly company average salary by grouping only by
pay_month. - Join the department average with the company average on
pay_month. - Use a
CASEexpression to comparedept_avgandcompany_avgand output "higher", "lower", or "same". - Return
pay_month,department_id, and the comparison.
Why it works: By aggregating first and then joining, we ensure that each department’s monthly average is accurately compared against the corresponding company monthly average. The CASE ensures the correct textual comparison.
Python Solution
# This problem is SQL-specific, but for reference, a Python pandas implementation could look like this
import pandas as pd
def department_vs_company_comparison(salary: pd.DataFrame, employee: pd.DataFrame) -> pd.DataFrame:
salary['pay_month'] = pd.to_datetime(salary['pay_date']).dt.to_period('M').astype(str)
merged = salary.merge(employee, on='employee_id')
dept_avg = merged.groupby(['pay_month', 'department_id'])['amount'].mean().reset_index(name='dept_avg')
company_avg = merged.groupby('pay_month')['amount'].mean().reset_index(name='company_avg')
result = dept_avg.merge(company_avg, on='pay_month')
def compare(row):
if row['dept_avg'] > row['company_avg']:
return 'higher'
elif row['dept_avg'] < row['company_avg']:
return 'lower'
else:
return 'same'
result['comparison'] = result.apply(compare, axis=1)
return result[['pay_month', 'department_id', 'comparison']]
Explanation: The code converts pay_date to a YYYY-MM string, joins Salary with Employee, calculates department and company averages using groupby, merges them, and then determines the comparison with a lambda function.
Go Solution
// This problem is SQL-specific; a Go implementation would typically use a SQL query through a database driver.
Go note: In practice, Go would execute the SQL query directly using database/sql or an ORM. No native Go slice manipulation is needed since SQL handles aggregation efficiently.
Worked Examples
Example 1
| pay_date | employee_id | amount | department_id |
|---|---|---|---|
| 2017-03-31 | 1 | 9000 | 1 |
| 2017-03-31 | 2 | 6000 | 2 |
| 2017-03-31 | 3 | 10000 | 2 |
| 2017-02-28 | 1 | 7000 | 1 |
| 2017-02-28 | 2 | 6000 | 2 |
| 2017-02-28 | 3 | 8000 | 2 |
Step 1: Compute department averages per month:
| pay_month | department_id | dept_avg |
|---|---|---|
| 2017-02 | 1 | 7000 |
| 2017-02 | 2 | 7000 |
| 2017-03 | 1 | 9000 |
| 2017-03 | 2 | 8000 |
Step 2: Compute company averages per month:
| pay_month | company_avg |
|---|---|
| 2017-02 | 7000 |
| 2017-03 | 8333.33 |
Step 3: Compare:
| pay_month | department_id | comparison |
|---|---|---|
| 2017-02 | 1 | same |
| 2017-02 | 2 | same |
| 2017-03 | 1 | higher |
| 2017-03 | 2 | lower |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Single pass through Salary table for aggregation, plus merge |
| Space | O(n) | Storage for aggregated tables and intermediate joins |
The complexity is dominated by the size of the Salary table, as each row is visited once for aggregation.
Test Cases
import pandas as pd
salary = pd.DataFrame({
'id': [1,2,3,4,5,6],
'employee_id': [1,2,3,1,2,3],
'amount': [9000,6000,10000,7000,6000,8000],
'pay_date': ['2017/03/31','2017/03/31','2017/03/31','2017/02/28','2017/02/28','2017/02/28']
})
employee = pd.DataFrame({
'employee_id': [1,2,3],
'department_id': [1,2,2]
})
result = department_vs_company_comparison(salary, employee)
# Provided example
assert result[(result.pay_month=='2017-02') & (result.department_id==1)]['comparison'].values[0] == 'same'
assert result[(result.pay_month=='2017-03') & (result.department_id==1)]['comparison'].values[0] == 'higher'
assert result[(result.pay_month=='2017-02') & (result.department_id==2)]['comparison'].values[0] == 'same'
assert result[(result.pay_month=='2017-03') & (result.department_id==2)]['comparison'].values[0] == 'lower'
# Edge: All salaries same
salary_edge = pd.DataFrame({'id':[1,2],'employee_id':[1,2],'amount':[5000,5000],'pay_date':['2022/01/31','2022/01/31']})
employee_edge = pd.DataFrame({'employee_id':[1,2],'department_id':[1,1]})
res_edge = department_vs_company_comparison(salary_edge, employee_edge)
assert res_edge['comparison'].values[0] == 'same'
| Test | Why |
|---|---|
| Example 1 | Validates main functionality and comparison logic |
| All same salary | Checks edge case where department and company averages are equal |
Edge Cases
- Single employee in a department: The department average equals the employee’s salary. The algorithm handles this by averaging over a single value, which works correctly.
- Multiple salaries per month for same employee: The algorithm correctly averages all entries per department,