LeetCode 1907 - Count Salary Categories

This problem asks us to classify bank accounts into three salary categories based on their monthly income, then count how many accounts belong to each category. The input is a database table named Accounts.

LeetCode Problem 1907

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This problem asks us to classify bank accounts into three salary categories based on their monthly income, then count how many accounts belong to each category.

The input is a database table named Accounts. Each row contains two columns:

  • account_id, which uniquely identifies a bank account
  • income, which represents the monthly income associated with that account

The task is to produce a result table with exactly three rows, one for each salary category:

  • "Low Salary" for incomes strictly less than 20000
  • "Average Salary" for incomes between 20000 and 50000, inclusive
  • "High Salary" for incomes strictly greater than 50000

The output must contain all three categories, even if no accounts fall into one of them. In such cases, the count for that category must be 0.

A key detail is that the category boundaries are not all symmetric. The "Low Salary" category excludes 20000, while the "Average Salary" category includes both 20000 and 50000. Similarly, "High Salary" excludes 50000. Incorrect handling of these boundary conditions is one of the most common sources of bugs in this problem.

The problem is categorized as a database problem, so the expected solution is SQL rather than a traditional algorithmic implementation. The challenge is not computational complexity, but rather correctly grouping and counting records while ensuring all three categories appear in the result.

An important edge case occurs when the Accounts table is empty. Even then, the result must still include all three salary categories with counts of 0. Another important edge case is when all accounts fall into a single category. The implementation must still return rows for the other two categories.

Approaches

Brute Force Approach

A straightforward approach is to run three separate counting queries against the Accounts table.

The first query counts all rows where income < 20000. The second query counts rows where income BETWEEN 20000 AND 50000. The third query counts rows where income > 50000.

After computing the three counts independently, we combine them together using UNION statements to produce the final result table.

This approach is easy to understand and guarantees correctness because each category is explicitly computed according to its definition. However, it scans the Accounts table three separate times. While this is acceptable for small datasets, it becomes inefficient for larger tables because the same data is repeatedly processed.

Optimal Approach

A better approach is to scan the table only once.

The key observation is that every account belongs to exactly one salary category. Instead of running three independent queries, we can classify each row using a CASE expression, group by the generated category, and count the number of rows in each group.

However, a plain GROUP BY solution introduces another problem. If a category has no matching accounts, that category disappears entirely from the result set. Since the problem explicitly requires all three categories to appear, we need a mechanism to guarantee their presence.

The standard solution is to create a derived table containing all three categories, then perform a LEFT JOIN against the aggregated counts. This ensures every category appears exactly once, even if its count is zero.

Approach Time Complexity Space Complexity Notes
Brute Force O(3N) O(1) Scans the table three separate times
Optimal O(N) O(1) Single pass aggregation with category mapping

Algorithm Walkthrough

  1. Create a virtual table containing the three required salary categories. This guarantees that the final result always contains all categories, even if some have zero matching accounts.
  2. Traverse the Accounts table once and classify each account using a CASE expression:
  • If income < 20000, label it "Low Salary"
  • If income <= 50000, label it "Average Salary"
  • Otherwise, label it "High Salary"
  1. Group the classified rows by category and count how many accounts belong to each group.
  2. Perform a LEFT JOIN between the predefined category table and the aggregated counts. This ensures categories with no matches still appear in the result.
  3. Use IFNULL or COALESCE to replace missing counts with 0.

Why it works

The algorithm works because every account satisfies exactly one of the three mutually exclusive salary conditions. The CASE expression maps each account into its correct category, and the aggregation correctly counts the number of accounts per category. The predefined category table guarantees that all required categories exist in the final output, even when no rows match a category.

Python Solution

Since this is a database problem, the Python section demonstrates the SQL query as it would be submitted on LeetCode.

# Write your MySQL query statement below

WITH categories AS (
    SELECT 'Low Salary' AS category
    UNION ALL
    SELECT 'Average Salary'
    UNION ALL
    SELECT 'High Salary'
),
salary_counts AS (
    SELECT
        CASE
            WHEN income < 20000 THEN 'Low Salary'
            WHEN income <= 50000 THEN 'Average Salary'
            ELSE 'High Salary'
        END AS category,
        COUNT(*) AS accounts_count
    FROM Accounts
    GROUP BY category
)

SELECT
    c.category,
    COALESCE(s.accounts_count, 0) AS accounts_count
FROM categories c
LEFT JOIN salary_counts s
ON c.category = s.category;

The query begins by defining a common table expression named categories. This acts as a fixed lookup table containing the three required salary labels. Using this table ensures the final result always includes all categories.

The second common table expression, salary_counts, processes the Accounts table. The CASE expression classifies each income into exactly one category. After classification, the query groups rows by category and computes the count for each group.

The final query joins the predefined categories with the aggregated counts. A LEFT JOIN is essential because categories without matching accounts would otherwise disappear. COALESCE converts NULL values into 0, satisfying the problem requirement.

Go Solution

LeetCode database problems do not require Go code submissions because the expected solution language is SQL. However, the following Go implementation demonstrates the same logic programmatically.

package main

import "fmt"

type Account struct {
	AccountID int
	Income    int
}

type Result struct {
	Category     string
	AccountsCount int
}

func countSalaryCategories(accounts []Account) []Result {
	counts := map[string]int{
		"Low Salary":     0,
		"Average Salary": 0,
		"High Salary":    0,
	}

	for _, account := range accounts {
		if account.Income < 20000 {
			counts["Low Salary"]++
		} else if account.Income <= 50000 {
			counts["Average Salary"]++
		} else {
			counts["High Salary"]++
		}
	}

	return []Result{
		{"Low Salary", counts["Low Salary"]},
		{"Average Salary", counts["Average Salary"]},
		{"High Salary", counts["High Salary"]},
	}
}

func main() {
	accounts := []Account{
		{3, 108939},
		{2, 12747},
		{8, 87709},
		{6, 91796},
	}

	results := countSalaryCategories(accounts)

	for _, result := range results {
		fmt.Println(result)
	}
}

The Go implementation mirrors the SQL logic closely. A map stores the count for each salary category, initialized to zero so that all categories always exist. Each account is classified using conditional statements, then the appropriate counter is incremented.

One Go-specific detail is explicit initialization of all categories in the map. Unlike SQL joins with COALESCE, Go requires us to ensure keys exist before counting. Integer overflow is not a concern here because the constraints are small relative to Go's integer range.

Worked Examples

Example 1

Input table:

account_id income
3 108939
2 12747
8 87709
6 91796

Initial counts:

Category Count
Low Salary 0
Average Salary 0
High Salary 0

Process each account one by one.

Account Income Category Updated Counts
3 108939 High Salary Low: 0, Average: 0, High: 1
2 12747 Low Salary Low: 1, Average: 0, High: 1
8 87709 High Salary Low: 1, Average: 0, High: 2
6 91796 High Salary Low: 1, Average: 0, High: 3

Final result:

category accounts_count
Low Salary 1
Average Salary 0
High Salary 3

Complexity Analysis

Measure Complexity Explanation
Time O(N) Each account is processed exactly once
Space O(1) Only three category counters are stored

The algorithm performs a single linear scan through the Accounts table. Every row is classified once and contributes to exactly one counter. The amount of extra memory used remains constant because there are always exactly three categories regardless of input size.

Test Cases

def count_salary_categories(accounts):
    counts = {
        "Low Salary": 0,
        "Average Salary": 0,
        "High Salary": 0
    }

    for income in accounts:
        if income < 20000:
            counts["Low Salary"] += 1
        elif income <= 50000:
            counts["Average Salary"] += 1
        else:
            counts["High Salary"] += 1

    return counts

# Provided example
assert count_salary_categories([108939, 12747, 87709, 91796]) == {
    "Low Salary": 1,
    "Average Salary": 0,
    "High Salary": 3
}  # Example from problem statement

# Empty input
assert count_salary_categories([]) == {
    "Low Salary": 0,
    "Average Salary": 0,
    "High Salary": 0
}  # No accounts at all

# All low salaries
assert count_salary_categories([1000, 5000, 19999]) == {
    "Low Salary": 3,
    "Average Salary": 0,
    "High Salary": 0
}  # Every income below 20000

# All average salaries
assert count_salary_categories([20000, 30000, 50000]) == {
    "Low Salary": 0,
    "Average Salary": 3,
    "High Salary": 0
}  # Boundary-inclusive middle range

# All high salaries
assert count_salary_categories([50001, 100000]) == {
    "Low Salary": 0,
    "Average Salary": 0,
    "High Salary": 2
}  # All incomes above 50000

# Boundary values
assert count_salary_categories([19999, 20000, 50000, 50001]) == {
    "Low Salary": 1,
    "Average Salary": 2,
    "High Salary": 1
}  # Tests inclusive and exclusive boundaries

# Mixed categories
assert count_salary_categories([15000, 25000, 75000]) == {
    "Low Salary": 1,
    "Average Salary": 1,
    "High Salary": 1
}  # One account in each category
Test Why
Example input Verifies correctness against official example
Empty input Ensures zero counts still appear
All low salaries Tests exclusive upper boundary below 20000
All average salaries Tests inclusive boundaries at 20000 and 50000
All high salaries Tests values strictly above 50000
Boundary values Validates category edge handling
Mixed categories Confirms normal classification behavior

Edge Cases

One important edge case is an empty Accounts table. A naive GROUP BY query would return no rows at all because there is no data to aggregate. However, the problem explicitly requires all three categories to appear with counts of zero. The implementation handles this correctly by creating a predefined category table and using a LEFT JOIN.

Another critical edge case involves boundary values such as 20000 and 50000. These values belong to "Average Salary" because the range is inclusive. Incorrect use of comparison operators like < instead of <= can easily misclassify these records. The implementation carefully uses:

  • income < 20000
  • income <= 50000
  • ELSE

This guarantees correct boundary handling.

A third edge case occurs when all accounts belong to the same category. For example, if every income is greater than 50000, a plain grouped query would return only "High Salary". The implementation avoids this issue by joining against a fixed category list, ensuring the other categories still appear with count 0.

Another subtle edge case is duplicate income values. Multiple accounts may share the same income, but each account must still be counted independently. Since the query uses COUNT(*), every row contributes separately, which guarantees correct behavior regardless of repeated income values.