LeetCode 569 - Median Employee Salary
The problem gives us an Employee table containing three columns: Column Meaning --- --- id Unique employee identifier company Company name salary Employee salary We need to return the row or rows that correspond to the median salary for each company.
Difficulty: 🔴 Hard
Topics: Database
Solution
LeetCode 569, Median Employee Salary
Problem Understanding
The problem gives us an Employee table containing three columns:
| Column | Meaning |
|---|---|
id |
Unique employee identifier |
company |
Company name |
salary |
Employee salary |
We need to return the row or rows that correspond to the median salary for each company.
The important detail is that salaries must first be sorted in ascending order. If two employees have the same salary, the ordering is determined by id. This tie breaking rule matters because the median depends on the exact ordering.
The problem behaves slightly differently depending on whether a company has an odd or even number of employees.
If a company has an odd number of employees, there is exactly one median row.
If a company has an even number of employees, there are two middle rows, and both must be returned.
For example, if a company has six employees after sorting, positions 3 and 4 are the medians. If a company has five employees, position 3 is the median.
The output should contain the original rows from the Employee table corresponding to those median positions.
The constraints are not explicitly listed, but this is a database problem intended to test SQL ranking and grouping logic. The major challenge is correctly identifying median positions inside each company partition while preserving deterministic ordering using both salary and id.
Several edge cases are important:
A company may contain duplicate salaries. In this situation, sorting only by salary is insufficient because the order becomes ambiguous. The problem explicitly tells us to break ties using id.
A company may contain only one employee. In that case, that employee is trivially the median.
A company may contain an even number of employees, requiring two rows instead of one.
A naive implementation may incorrectly compute the median using only salary values instead of employee positions. The problem asks for median rows after sorting employees, not the mathematical median salary value.
Approaches
Brute Force Approach
The brute force solution processes each company independently.
For every company:
- Extract all employees belonging to that company.
- Sort them by
(salary, id). - Compute the total number of employees.
- Identify the median index or indices.
- Return the corresponding rows.
This approach is straightforward and guarantees correctness because it directly simulates the definition given in the problem statement.
However, it becomes inefficient if implemented manually without ranking support because we repeatedly sort partitions and scan lists. In SQL, this often leads to correlated subqueries or self joins with high computational cost.
Key Insight for the Optimal Approach
The important observation is that the median depends entirely on the rank of each employee within their company after sorting by (salary, id).
If we can assign each employee:
- Their position in sorted order
- The total number of employees in the company
then determining the median becomes easy.
For a company with n employees:
- If
nis odd, median position is(n + 1) / 2 - If
nis even, median positions aren / 2andn / 2 + 1
Using SQL window functions, we can compute:
ROW_NUMBER()for the sorted positionCOUNT()for company size
Then we simply filter rows whose rank matches the median positions.
This transforms the problem into a ranking problem instead of repeated sorting and scanning.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) | O(n) | Sort each company independently and manually locate medians |
| Optimal | O(n log n) | O(n) | Use window functions to rank rows and filter median positions efficiently |
Algorithm Walkthrough
- Partition employees by company.
We treat each company independently because median salaries are computed separately for every company. 2. Sort employees within each company.
Employees are ordered by:
salary ASCid ASC
The secondary ordering by id ensures deterministic ranking when salaries are equal.
3. Assign row numbers.
We use ROW_NUMBER() to assign each employee a unique rank inside their company.
Example:
| company | salary | id | row_number |
|---|---|---|---|
| A | 15 | 3 | 1 |
| A | 341 | 2 | 2 |
| A | 451 | 5 | 3 |
| 4. Compute company size. |
Using COUNT(*) OVER (PARTITION BY company), we determine how many employees each company contains.
5. Identify median positions.
For company size n:
- Left median position =
(n + 1) / 2 - Right median position =
(n + 2) / 2
These formulas work for both odd and even counts.
Examples:
| n | Left | Right |
|---|---|---|
| 5 | 3 | 3 |
| 6 | 3 | 4 |
| 6. Filter rows whose rank matches median positions. |
Employees whose row number lies between the two median boundaries are returned.
Why it works
The algorithm works because ROW_NUMBER() produces the exact ordering required by the problem statement. Once employees are ranked, the median becomes purely a positional property. The formulas for the left and right median positions correctly handle both odd and even company sizes, guaranteeing that all required median rows are returned.
Python Solution
# This problem is a SQL problem, so the "Python solution"
# represents the SQL query as a multiline string.
class Solution:
def medianEmployeeSalary(self):
return """
WITH RankedEmployees AS (
SELECT
id,
company,
salary,
ROW_NUMBER() OVER (
PARTITION BY company
ORDER BY salary, id
) AS rn,
COUNT(*) OVER (
PARTITION BY company
) AS total_count
FROM Employee
)
SELECT
id,
company,
salary
FROM RankedEmployees
WHERE rn BETWEEN (total_count + 1) / 2
AND (total_count + 2) / 2;
"""
The solution first creates a common table expression called RankedEmployees.
Inside the CTE, every employee receives two computed values:
rn, the employee's rank within the companytotal_count, the total number of employees in that company
The ranking uses:
ORDER BY salary, id
which exactly matches the problem requirement for tie breaking.
After ranking, the outer query filters rows whose rank lies between the two median positions.
For odd counts, both formulas evaluate to the same position, producing one row.
For even counts, the formulas produce two consecutive positions, producing two rows.
This elegantly handles both cases with a single condition.
Go Solution
package main
func medianEmployeeSalaryQuery() string {
return `
WITH RankedEmployees AS (
SELECT
id,
company,
salary,
ROW_NUMBER() OVER (
PARTITION BY company
ORDER BY salary, id
) AS rn,
COUNT(*) OVER (
PARTITION BY company
) AS total_count
FROM Employee
)
SELECT
id,
company,
salary
FROM RankedEmployees
WHERE rn BETWEEN (total_count + 1) / 2
AND (total_count + 2) / 2;
`
}
Because this is fundamentally a SQL database problem, the Go version simply returns the SQL query string.
One implementation detail worth noting is integer division behavior in SQL. The formulas:
(total_count + 1) / 2
(total_count + 2) / 2
rely on integer division semantics to correctly compute median indices.
The logic remains identical to the Python version.
Worked Examples
Example 1, Company A
Input rows:
| id | salary |
|---|---|
| 1 | 2341 |
| 2 | 341 |
| 3 | 15 |
| 4 | 15314 |
| 5 | 451 |
| 6 | 513 |
After sorting by (salary, id):
| Row Number | id | salary |
|---|---|---|
| 1 | 3 | 15 |
| 2 | 2 | 341 |
| 3 | 5 | 451 |
| 4 | 6 | 513 |
| 5 | 1 | 2341 |
| 6 | 4 | 15314 |
Total employees:
n = 6
Median boundaries:
left = (6 + 1) / 2 = 3
right = (6 + 2) / 2 = 4
Rows with ranks 3 and 4 are returned:
| id | company | salary |
|---|---|---|
| 5 | A | 451 |
| 6 | A | 513 |
Example 1, Company B
Sorted employees:
| Row Number | id | salary |
|---|---|---|
| 1 | 8 | 13 |
| 2 | 7 | 15 |
| 3 | 12 | 234 |
| 4 | 9 | 1154 |
| 5 | 11 | 1221 |
| 6 | 10 | 1345 |
Total employees:
n = 6
Median boundaries:
left = 3
right = 4
Returned rows:
| id | company | salary |
|---|---|---|
| 12 | B | 234 |
| 9 | B | 1154 |
Example 1, Company C
Sorted employees:
| Row Number | id | salary |
|---|---|---|
| 1 | 17 | 65 |
| 2 | 13 | 2345 |
| 3 | 14 | 2645 |
| 4 | 15 | 2645 |
| 5 | 16 | 2652 |
Total employees:
n = 5
Median boundaries:
left = 3
right = 3
Only row 3 is returned:
| id | company | salary |
|---|---|---|
| 14 | C | 2645 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Window ranking requires sorting employees within each company |
| Space | O(n) | Intermediate ranking structures are maintained by the database engine |
The dominant cost comes from sorting rows inside each company partition. Window functions internally require ordered processing, which typically results in sorting complexity proportional to O(n log n).
The extra space complexity depends on the database execution engine, but ranking operations generally require temporary storage proportional to the number of rows.
Test Cases
# Single employee company
assert get_medians([
(1, "A", 100)
]) == [
(1, "A", 100)
] # Single employee is median
# Odd number of employees
assert get_medians([
(1, "A", 100),
(2, "A", 200),
(3, "A", 300)
]) == [
(2, "A", 200)
] # Middle element
# Even number of employees
assert get_medians([
(1, "A", 100),
(2, "A", 200),
(3, "A", 300),
(4, "A", 400)
]) == [
(2, "A", 200),
(3, "A", 300)
] # Two middle elements
# Duplicate salaries with id tie breaking
assert get_medians([
(1, "A", 100),
(2, "A", 100),
(3, "A", 100),
(4, "A", 100)
]) == [
(2, "A", 100),
(3, "A", 100)
] # Ordered by id
# Multiple companies
assert get_medians([
(1, "A", 100),
(2, "A", 200),
(3, "B", 300),
(4, "B", 400),
(5, "B", 500)
]) == [
(1, "A", 100),
(2, "A", 200),
(4, "B", 400)
] # Independent company processing
# Non sorted input
assert get_medians([
(3, "A", 300),
(1, "A", 100),
(2, "A", 200)
]) == [
(2, "A", 200)
] # Sorting required
# Duplicate salaries with mixed ids
assert get_medians([
(5, "A", 500),
(1, "A", 100),
(2, "A", 500),
(3, "A", 500),
(4, "A", 600)
]) == [
(2, "A", 500)
] # Tie broken by id
| Test | Why |
|---|---|
| Single employee | Validates smallest possible partition |
| Odd employee count | Ensures single median selection |
| Even employee count | Ensures two medians are returned |
| Duplicate salaries | Verifies tie breaking by id |
| Multiple companies | Ensures partition isolation |
| Non sorted input | Confirms algorithm performs sorting |
| Mixed duplicate salaries | Tests deterministic ordering |
Edge Cases
A very important edge case occurs when multiple employees share the same salary. If we sort only by salary, the ordering becomes nondeterministic, which can produce inconsistent medians across executions. The implementation avoids this by explicitly sorting with (salary, id) so every employee receives a unique and stable rank.
Another important edge case is when a company has an even number of employees. Many incorrect solutions return only one middle row by computing a single median value. This problem requires both middle rows. The implementation correctly computes two median boundaries and returns every row whose rank falls between them.
A company with only one employee is another subtle case. Some implementations may incorrectly compute invalid boundaries or assume at least two employees exist. Here, both boundary formulas evaluate to 1, so the lone employee is correctly returned as the median.
A final edge case involves unsorted input data. Database rows are not guaranteed to appear in salary order. The algorithm explicitly sorts employees inside each company partition before assigning row numbers, ensuring correctness regardless of input order.