LeetCode 1907 - Count Salary Categories
This problem asks us to classify bank accounts into three salary categories based on their monthly income, then count how many accounts belong to each category. The input is a database table named Accounts.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This problem asks us to classify bank accounts into three salary categories based on their monthly income, then count how many accounts belong to each category.
The input is a database table named Accounts. Each row contains two columns:
account_id, which uniquely identifies a bank accountincome, which represents the monthly income associated with that account
The task is to produce a result table with exactly three rows, one for each salary category:
"Low Salary"for incomes strictly less than20000"Average Salary"for incomes between20000and50000, inclusive"High Salary"for incomes strictly greater than50000
The output must contain all three categories, even if no accounts fall into one of them. In such cases, the count for that category must be 0.
A key detail is that the category boundaries are not all symmetric. The "Low Salary" category excludes 20000, while the "Average Salary" category includes both 20000 and 50000. Similarly, "High Salary" excludes 50000. Incorrect handling of these boundary conditions is one of the most common sources of bugs in this problem.
The problem is categorized as a database problem, so the expected solution is SQL rather than a traditional algorithmic implementation. The challenge is not computational complexity, but rather correctly grouping and counting records while ensuring all three categories appear in the result.
An important edge case occurs when the Accounts table is empty. Even then, the result must still include all three salary categories with counts of 0. Another important edge case is when all accounts fall into a single category. The implementation must still return rows for the other two categories.
Approaches
Brute Force Approach
A straightforward approach is to run three separate counting queries against the Accounts table.
The first query counts all rows where income < 20000. The second query counts rows where income BETWEEN 20000 AND 50000. The third query counts rows where income > 50000.
After computing the three counts independently, we combine them together using UNION statements to produce the final result table.
This approach is easy to understand and guarantees correctness because each category is explicitly computed according to its definition. However, it scans the Accounts table three separate times. While this is acceptable for small datasets, it becomes inefficient for larger tables because the same data is repeatedly processed.
Optimal Approach
A better approach is to scan the table only once.
The key observation is that every account belongs to exactly one salary category. Instead of running three independent queries, we can classify each row using a CASE expression, group by the generated category, and count the number of rows in each group.
However, a plain GROUP BY solution introduces another problem. If a category has no matching accounts, that category disappears entirely from the result set. Since the problem explicitly requires all three categories to appear, we need a mechanism to guarantee their presence.
The standard solution is to create a derived table containing all three categories, then perform a LEFT JOIN against the aggregated counts. This ensures every category appears exactly once, even if its count is zero.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(3N) | O(1) | Scans the table three separate times |
| Optimal | O(N) | O(1) | Single pass aggregation with category mapping |
Algorithm Walkthrough
- Create a virtual table containing the three required salary categories. This guarantees that the final result always contains all categories, even if some have zero matching accounts.
- Traverse the
Accountstable once and classify each account using aCASEexpression:
- If
income < 20000, label it"Low Salary" - If
income <= 50000, label it"Average Salary" - Otherwise, label it
"High Salary"
- Group the classified rows by category and count how many accounts belong to each group.
- Perform a
LEFT JOINbetween the predefined category table and the aggregated counts. This ensures categories with no matches still appear in the result. - Use
IFNULLorCOALESCEto replace missing counts with0.
Why it works
The algorithm works because every account satisfies exactly one of the three mutually exclusive salary conditions. The CASE expression maps each account into its correct category, and the aggregation correctly counts the number of accounts per category. The predefined category table guarantees that all required categories exist in the final output, even when no rows match a category.
Python Solution
Since this is a database problem, the Python section demonstrates the SQL query as it would be submitted on LeetCode.
# Write your MySQL query statement below
WITH categories AS (
SELECT 'Low Salary' AS category
UNION ALL
SELECT 'Average Salary'
UNION ALL
SELECT 'High Salary'
),
salary_counts AS (
SELECT
CASE
WHEN income < 20000 THEN 'Low Salary'
WHEN income <= 50000 THEN 'Average Salary'
ELSE 'High Salary'
END AS category,
COUNT(*) AS accounts_count
FROM Accounts
GROUP BY category
)
SELECT
c.category,
COALESCE(s.accounts_count, 0) AS accounts_count
FROM categories c
LEFT JOIN salary_counts s
ON c.category = s.category;
The query begins by defining a common table expression named categories. This acts as a fixed lookup table containing the three required salary labels. Using this table ensures the final result always includes all categories.
The second common table expression, salary_counts, processes the Accounts table. The CASE expression classifies each income into exactly one category. After classification, the query groups rows by category and computes the count for each group.
The final query joins the predefined categories with the aggregated counts. A LEFT JOIN is essential because categories without matching accounts would otherwise disappear. COALESCE converts NULL values into 0, satisfying the problem requirement.
Go Solution
LeetCode database problems do not require Go code submissions because the expected solution language is SQL. However, the following Go implementation demonstrates the same logic programmatically.
package main
import "fmt"
type Account struct {
AccountID int
Income int
}
type Result struct {
Category string
AccountsCount int
}
func countSalaryCategories(accounts []Account) []Result {
counts := map[string]int{
"Low Salary": 0,
"Average Salary": 0,
"High Salary": 0,
}
for _, account := range accounts {
if account.Income < 20000 {
counts["Low Salary"]++
} else if account.Income <= 50000 {
counts["Average Salary"]++
} else {
counts["High Salary"]++
}
}
return []Result{
{"Low Salary", counts["Low Salary"]},
{"Average Salary", counts["Average Salary"]},
{"High Salary", counts["High Salary"]},
}
}
func main() {
accounts := []Account{
{3, 108939},
{2, 12747},
{8, 87709},
{6, 91796},
}
results := countSalaryCategories(accounts)
for _, result := range results {
fmt.Println(result)
}
}
The Go implementation mirrors the SQL logic closely. A map stores the count for each salary category, initialized to zero so that all categories always exist. Each account is classified using conditional statements, then the appropriate counter is incremented.
One Go-specific detail is explicit initialization of all categories in the map. Unlike SQL joins with COALESCE, Go requires us to ensure keys exist before counting. Integer overflow is not a concern here because the constraints are small relative to Go's integer range.
Worked Examples
Example 1
Input table:
| account_id | income |
|---|---|
| 3 | 108939 |
| 2 | 12747 |
| 8 | 87709 |
| 6 | 91796 |
Initial counts:
| Category | Count |
|---|---|
| Low Salary | 0 |
| Average Salary | 0 |
| High Salary | 0 |
Process each account one by one.
| Account | Income | Category | Updated Counts |
|---|---|---|---|
| 3 | 108939 | High Salary | Low: 0, Average: 0, High: 1 |
| 2 | 12747 | Low Salary | Low: 1, Average: 0, High: 1 |
| 8 | 87709 | High Salary | Low: 1, Average: 0, High: 2 |
| 6 | 91796 | High Salary | Low: 1, Average: 0, High: 3 |
Final result:
| category | accounts_count |
|---|---|
| Low Salary | 1 |
| Average Salary | 0 |
| High Salary | 3 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N) | Each account is processed exactly once |
| Space | O(1) | Only three category counters are stored |
The algorithm performs a single linear scan through the Accounts table. Every row is classified once and contributes to exactly one counter. The amount of extra memory used remains constant because there are always exactly three categories regardless of input size.
Test Cases
def count_salary_categories(accounts):
counts = {
"Low Salary": 0,
"Average Salary": 0,
"High Salary": 0
}
for income in accounts:
if income < 20000:
counts["Low Salary"] += 1
elif income <= 50000:
counts["Average Salary"] += 1
else:
counts["High Salary"] += 1
return counts
# Provided example
assert count_salary_categories([108939, 12747, 87709, 91796]) == {
"Low Salary": 1,
"Average Salary": 0,
"High Salary": 3
} # Example from problem statement
# Empty input
assert count_salary_categories([]) == {
"Low Salary": 0,
"Average Salary": 0,
"High Salary": 0
} # No accounts at all
# All low salaries
assert count_salary_categories([1000, 5000, 19999]) == {
"Low Salary": 3,
"Average Salary": 0,
"High Salary": 0
} # Every income below 20000
# All average salaries
assert count_salary_categories([20000, 30000, 50000]) == {
"Low Salary": 0,
"Average Salary": 3,
"High Salary": 0
} # Boundary-inclusive middle range
# All high salaries
assert count_salary_categories([50001, 100000]) == {
"Low Salary": 0,
"Average Salary": 0,
"High Salary": 2
} # All incomes above 50000
# Boundary values
assert count_salary_categories([19999, 20000, 50000, 50001]) == {
"Low Salary": 1,
"Average Salary": 2,
"High Salary": 1
} # Tests inclusive and exclusive boundaries
# Mixed categories
assert count_salary_categories([15000, 25000, 75000]) == {
"Low Salary": 1,
"Average Salary": 1,
"High Salary": 1
} # One account in each category
| Test | Why |
|---|---|
| Example input | Verifies correctness against official example |
| Empty input | Ensures zero counts still appear |
| All low salaries | Tests exclusive upper boundary below 20000 |
| All average salaries | Tests inclusive boundaries at 20000 and 50000 |
| All high salaries | Tests values strictly above 50000 |
| Boundary values | Validates category edge handling |
| Mixed categories | Confirms normal classification behavior |
Edge Cases
One important edge case is an empty Accounts table. A naive GROUP BY query would return no rows at all because there is no data to aggregate. However, the problem explicitly requires all three categories to appear with counts of zero. The implementation handles this correctly by creating a predefined category table and using a LEFT JOIN.
Another critical edge case involves boundary values such as 20000 and 50000. These values belong to "Average Salary" because the range is inclusive. Incorrect use of comparison operators like < instead of <= can easily misclassify these records. The implementation carefully uses:
income < 20000income <= 50000ELSE
This guarantees correct boundary handling.
A third edge case occurs when all accounts belong to the same category. For example, if every income is greater than 50000, a plain grouped query would return only "High Salary". The implementation avoids this issue by joining against a fixed category list, ensuring the other categories still appear with count 0.
Another subtle edge case is duplicate income values. Multiple accounts may share the same income, but each account must still be counted independently. Since the query uses COUNT(*), every row contributes separately, which guarantees correct behavior regardless of repeated income values.