LeetCode 569 - Median Employee Salary

The problem gives us an Employee table containing three columns: Column Meaning --- --- id Unique employee identifier company Company name salary Employee salary We need to return the row or rows that correspond to the median salary for each company.

LeetCode Problem 569

Difficulty: 🔴 Hard
Topics: Database

Solution

LeetCode 569, Median Employee Salary

Problem Understanding

The problem gives us an Employee table containing three columns:

Column Meaning
id Unique employee identifier
company Company name
salary Employee salary

We need to return the row or rows that correspond to the median salary for each company.

The important detail is that salaries must first be sorted in ascending order. If two employees have the same salary, the ordering is determined by id. This tie breaking rule matters because the median depends on the exact ordering.

The problem behaves slightly differently depending on whether a company has an odd or even number of employees.

If a company has an odd number of employees, there is exactly one median row.

If a company has an even number of employees, there are two middle rows, and both must be returned.

For example, if a company has six employees after sorting, positions 3 and 4 are the medians. If a company has five employees, position 3 is the median.

The output should contain the original rows from the Employee table corresponding to those median positions.

The constraints are not explicitly listed, but this is a database problem intended to test SQL ranking and grouping logic. The major challenge is correctly identifying median positions inside each company partition while preserving deterministic ordering using both salary and id.

Several edge cases are important:

A company may contain duplicate salaries. In this situation, sorting only by salary is insufficient because the order becomes ambiguous. The problem explicitly tells us to break ties using id.

A company may contain only one employee. In that case, that employee is trivially the median.

A company may contain an even number of employees, requiring two rows instead of one.

A naive implementation may incorrectly compute the median using only salary values instead of employee positions. The problem asks for median rows after sorting employees, not the mathematical median salary value.

Approaches

Brute Force Approach

The brute force solution processes each company independently.

For every company:

  1. Extract all employees belonging to that company.
  2. Sort them by (salary, id).
  3. Compute the total number of employees.
  4. Identify the median index or indices.
  5. Return the corresponding rows.

This approach is straightforward and guarantees correctness because it directly simulates the definition given in the problem statement.

However, it becomes inefficient if implemented manually without ranking support because we repeatedly sort partitions and scan lists. In SQL, this often leads to correlated subqueries or self joins with high computational cost.

Key Insight for the Optimal Approach

The important observation is that the median depends entirely on the rank of each employee within their company after sorting by (salary, id).

If we can assign each employee:

  • Their position in sorted order
  • The total number of employees in the company

then determining the median becomes easy.

For a company with n employees:

  • If n is odd, median position is (n + 1) / 2
  • If n is even, median positions are n / 2 and n / 2 + 1

Using SQL window functions, we can compute:

  • ROW_NUMBER() for the sorted position
  • COUNT() for company size

Then we simply filter rows whose rank matches the median positions.

This transforms the problem into a ranking problem instead of repeated sorting and scanning.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n log n) O(n) Sort each company independently and manually locate medians
Optimal O(n log n) O(n) Use window functions to rank rows and filter median positions efficiently

Algorithm Walkthrough

  1. Partition employees by company.

We treat each company independently because median salaries are computed separately for every company. 2. Sort employees within each company.

Employees are ordered by:

  • salary ASC
  • id ASC

The secondary ordering by id ensures deterministic ranking when salaries are equal. 3. Assign row numbers.

We use ROW_NUMBER() to assign each employee a unique rank inside their company.

Example:

company salary id row_number
A 15 3 1
A 341 2 2
A 451 5 3
4. Compute company size.

Using COUNT(*) OVER (PARTITION BY company), we determine how many employees each company contains. 5. Identify median positions.

For company size n:

  • Left median position = (n + 1) / 2
  • Right median position = (n + 2) / 2

These formulas work for both odd and even counts.

Examples:

n Left Right
5 3 3
6 3 4
6. Filter rows whose rank matches median positions.

Employees whose row number lies between the two median boundaries are returned.

Why it works

The algorithm works because ROW_NUMBER() produces the exact ordering required by the problem statement. Once employees are ranked, the median becomes purely a positional property. The formulas for the left and right median positions correctly handle both odd and even company sizes, guaranteeing that all required median rows are returned.

Python Solution

# This problem is a SQL problem, so the "Python solution"
# represents the SQL query as a multiline string.

class Solution:
    def medianEmployeeSalary(self):
        return """
        WITH RankedEmployees AS (
            SELECT
                id,
                company,
                salary,
                ROW_NUMBER() OVER (
                    PARTITION BY company
                    ORDER BY salary, id
                ) AS rn,
                COUNT(*) OVER (
                    PARTITION BY company
                ) AS total_count
            FROM Employee
        )
        SELECT
            id,
            company,
            salary
        FROM RankedEmployees
        WHERE rn BETWEEN (total_count + 1) / 2
                     AND (total_count + 2) / 2;
        """

The solution first creates a common table expression called RankedEmployees.

Inside the CTE, every employee receives two computed values:

  • rn, the employee's rank within the company
  • total_count, the total number of employees in that company

The ranking uses:

ORDER BY salary, id

which exactly matches the problem requirement for tie breaking.

After ranking, the outer query filters rows whose rank lies between the two median positions.

For odd counts, both formulas evaluate to the same position, producing one row.

For even counts, the formulas produce two consecutive positions, producing two rows.

This elegantly handles both cases with a single condition.

Go Solution

package main

func medianEmployeeSalaryQuery() string {
	return `
WITH RankedEmployees AS (
    SELECT
        id,
        company,
        salary,
        ROW_NUMBER() OVER (
            PARTITION BY company
            ORDER BY salary, id
        ) AS rn,
        COUNT(*) OVER (
            PARTITION BY company
        ) AS total_count
    FROM Employee
)
SELECT
    id,
    company,
    salary
FROM RankedEmployees
WHERE rn BETWEEN (total_count + 1) / 2
             AND (total_count + 2) / 2;
`
}

Because this is fundamentally a SQL database problem, the Go version simply returns the SQL query string.

One implementation detail worth noting is integer division behavior in SQL. The formulas:

(total_count + 1) / 2
(total_count + 2) / 2

rely on integer division semantics to correctly compute median indices.

The logic remains identical to the Python version.

Worked Examples

Example 1, Company A

Input rows:

id salary
1 2341
2 341
3 15
4 15314
5 451
6 513

After sorting by (salary, id):

Row Number id salary
1 3 15
2 2 341
3 5 451
4 6 513
5 1 2341
6 4 15314

Total employees:

n = 6

Median boundaries:

left = (6 + 1) / 2 = 3
right = (6 + 2) / 2 = 4

Rows with ranks 3 and 4 are returned:

id company salary
5 A 451
6 A 513

Example 1, Company B

Sorted employees:

Row Number id salary
1 8 13
2 7 15
3 12 234
4 9 1154
5 11 1221
6 10 1345

Total employees:

n = 6

Median boundaries:

left = 3
right = 4

Returned rows:

id company salary
12 B 234
9 B 1154

Example 1, Company C

Sorted employees:

Row Number id salary
1 17 65
2 13 2345
3 14 2645
4 15 2645
5 16 2652

Total employees:

n = 5

Median boundaries:

left = 3
right = 3

Only row 3 is returned:

id company salary
14 C 2645

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Window ranking requires sorting employees within each company
Space O(n) Intermediate ranking structures are maintained by the database engine

The dominant cost comes from sorting rows inside each company partition. Window functions internally require ordered processing, which typically results in sorting complexity proportional to O(n log n).

The extra space complexity depends on the database execution engine, but ranking operations generally require temporary storage proportional to the number of rows.

Test Cases

# Single employee company
assert get_medians([
    (1, "A", 100)
]) == [
    (1, "A", 100)
]  # Single employee is median

# Odd number of employees
assert get_medians([
    (1, "A", 100),
    (2, "A", 200),
    (3, "A", 300)
]) == [
    (2, "A", 200)
]  # Middle element

# Even number of employees
assert get_medians([
    (1, "A", 100),
    (2, "A", 200),
    (3, "A", 300),
    (4, "A", 400)
]) == [
    (2, "A", 200),
    (3, "A", 300)
]  # Two middle elements

# Duplicate salaries with id tie breaking
assert get_medians([
    (1, "A", 100),
    (2, "A", 100),
    (3, "A", 100),
    (4, "A", 100)
]) == [
    (2, "A", 100),
    (3, "A", 100)
]  # Ordered by id

# Multiple companies
assert get_medians([
    (1, "A", 100),
    (2, "A", 200),
    (3, "B", 300),
    (4, "B", 400),
    (5, "B", 500)
]) == [
    (1, "A", 100),
    (2, "A", 200),
    (4, "B", 400)
]  # Independent company processing

# Non sorted input
assert get_medians([
    (3, "A", 300),
    (1, "A", 100),
    (2, "A", 200)
]) == [
    (2, "A", 200)
]  # Sorting required

# Duplicate salaries with mixed ids
assert get_medians([
    (5, "A", 500),
    (1, "A", 100),
    (2, "A", 500),
    (3, "A", 500),
    (4, "A", 600)
]) == [
    (2, "A", 500)
]  # Tie broken by id
Test Why
Single employee Validates smallest possible partition
Odd employee count Ensures single median selection
Even employee count Ensures two medians are returned
Duplicate salaries Verifies tie breaking by id
Multiple companies Ensures partition isolation
Non sorted input Confirms algorithm performs sorting
Mixed duplicate salaries Tests deterministic ordering

Edge Cases

A very important edge case occurs when multiple employees share the same salary. If we sort only by salary, the ordering becomes nondeterministic, which can produce inconsistent medians across executions. The implementation avoids this by explicitly sorting with (salary, id) so every employee receives a unique and stable rank.

Another important edge case is when a company has an even number of employees. Many incorrect solutions return only one middle row by computing a single median value. This problem requires both middle rows. The implementation correctly computes two median boundaries and returns every row whose rank falls between them.

A company with only one employee is another subtle case. Some implementations may incorrectly compute invalid boundaries or assume at least two employees exist. Here, both boundary formulas evaluate to 1, so the lone employee is correctly returned as the median.

A final edge case involves unsorted input data. Database rows are not guaranteed to appear in salary order. The algorithm explicitly sorts employees inside each company partition before assigning row numbers, ensuring correctness regardless of input order.