LeetCode 2987 - Find Expensive Cities

This problem provides a database table named Listings that contains information about home listings. Each row represents a single home listing and contains three fields: - listingid, a unique identifier for the listing - city, the city where the home is located - price, the…

LeetCode Problem 2987

Difficulty: 🟢 Easy
Topics: Database

Solution

LeetCode 2987 - Find Expensive Cities

Problem Understanding

This problem provides a database table named Listings that contains information about home listings. Each row represents a single home listing and contains three fields:

  • listing_id, a unique identifier for the listing
  • city, the city where the home is located
  • price, the listed price of the home

The task is to find all cities whose average home price is greater than the national average home price.

The national average is calculated using all listings in the table, regardless of city. Once that value is known, we must calculate the average price for each individual city and return only those cities whose city average exceeds the national average.

The result should contain a single column:

city
city name

The output must be sorted alphabetically in ascending order by city name.

The example demonstrates this process clearly. First, the average of all listing prices is computed. Then, each city's average is calculated. Cities whose averages are greater than the national average are included in the final result.

Since this is a database problem, the primary challenge is expressing the aggregation logic efficiently using SQL. The table guarantees that each listing is unique because listing_id is unique, so there is no need to worry about duplicate listings affecting the averages.

Important edge cases include situations where:

  • Only one city exists in the table.
  • A city's average is exactly equal to the national average, which should not be included because the requirement is to exceed the national average.
  • Cities may have different numbers of listings, so averages must be computed using proper aggregation rather than totals.

Approaches

Brute Force Approach

A straightforward approach would be to calculate the national average first and then, for every city, repeatedly scan the table to compute that city's average.

Conceptually, the algorithm would:

  1. Compute the national average.
  2. Find every distinct city.
  3. For each city, scan all listings again to calculate its average.
  4. Compare the city average against the national average.

This approach is correct because every city's average is computed accurately before comparison. However, it performs unnecessary repeated scans of the same data.

Optimal Approach

The key observation is that SQL aggregation functions can compute averages efficiently using GROUP BY.

We can:

  1. Compute the national average once using AVG(price).
  2. Group rows by city.
  3. Compute each city's average with AVG(price).
  4. Use a HAVING clause to keep only cities whose average exceeds the national average.
  5. Sort the result alphabetically.

This approach avoids repeatedly scanning the table for every city and leverages database aggregation operations directly.

Approach Time Complexity Space Complexity Notes
Brute Force O(C × N) O(C) Recomputes city averages by repeatedly scanning all listings
Optimal O(N) O(C) Single grouping operation with aggregate comparison

Here:

  • N = number of listings
  • C = number of distinct cities

Algorithm Walkthrough

Optimal SQL Algorithm

  1. Compute the national average price using AVG(price) over the entire Listings table. This value serves as the comparison threshold.
  2. Group all listings by city using GROUP BY city. This creates one group for every city.
  3. For each city group, calculate its average home price using AVG(price).
  4. Use a HAVING clause to keep only those groups whose average price is greater than the national average computed in Step 1.
  5. Sort the remaining cities alphabetically using ORDER BY city.
  6. Return the resulting city names.

Why it works

The correctness follows directly from the definition of the problem. The national average is computed across all listings, and each city average is computed across all listings belonging to that city. The HAVING clause performs exactly the required comparison, ensuring that every returned city has an average price strictly greater than the national average, while every excluded city does not satisfy the condition.

Python Solution

For LeetCode Database problems, the solution is SQL rather than executable Python code. The corresponding SQL query is:

# SQL solution represented as a string

query = """
SELECT city
FROM Listings
GROUP BY city
HAVING AVG(price) > (
    SELECT AVG(price)
    FROM Listings
)
ORDER BY city;
"""

The query first computes city-level groups using GROUP BY city. For each group, AVG(price) calculates the city's average home price. The subquery computes the national average across all listings. The HAVING clause filters out cities whose averages do not exceed that national average. Finally, ORDER BY city ensures the output is sorted alphabetically.

Go Solution

Since this is a database problem, there is no Go implementation required for submission. The accepted LeetCode solution is SQL:

/*
SELECT city
FROM Listings
GROUP BY city
HAVING AVG(price) > (
    SELECT AVG(price)
    FROM Listings
)
ORDER BY city;
*/

Unlike algorithmic problems where Go code manipulates data structures directly, database problems delegate the computation to the SQL engine. The aggregation, filtering, and sorting are all handled by SQL operations.

Worked Examples

Example 1

Input:

listing_id city price
113 LosAngeles 7560386
136 SanFrancisco 2380268
92 Chicago 9833209
60 Chicago 5147582
8 Chicago 5274441
79 SanFrancisco 8372065
37 Chicago 7939595
53 LosAngeles 4965123
178 SanFrancisco 999207
51 NewYork 5951718
121 NewYork 2893760

Step 1: Compute National Average

Total sum:

City Prices
LosAngeles 7560386 + 4965123
Chicago 9833209 + 5147582 + 5274441 + 7939595
SanFrancisco 2380268 + 8372065 + 999207
NewYork 5951718 + 2893760

Overall:

Metric Value
Total Listings 11
Total Price Sum 67,342,654
National Average 6,122,059.45

Step 2: Compute City Averages

City Average Price
Chicago 7,048,706.75
LosAngeles 6,262,754.50
SanFrancisco 3,917,180.00
NewYork 4,422,739.00

Step 3: Compare Against National Average

City City Average Above National Average?
Chicago 7,048,706.75 Yes
LosAngeles 6,262,754.50 Yes
SanFrancisco 3,917,180.00 No
NewYork 4,422,739.00 No

Step 4: Sort Output

city
Chicago
LosAngeles

This matches the expected output.

Complexity Analysis

Measure Complexity Explanation
Time O(N) Each row participates in aggregation operations
Space O(C) Database stores aggregate information per city group

The database engine scans the table to compute averages and maintains one aggregate record per city. Therefore the work is proportional to the number of listings, while additional memory depends on the number of distinct cities.

Test Cases

# Conceptual test cases for the SQL logic

# Example from the problem statement
assert ["Chicago", "LosAngeles"] == ["Chicago", "LosAngeles"]  # basic example

# Single city, city average equals national average
assert [] == []  # should not be included

# Two cities, one clearly above average
assert ["A"] == ["A"]  # one qualifying city

# Two cities, both above national average impossible simultaneously
assert True  # validates understanding of averages

# City average exactly equals national average
assert [] == []  # strict greater-than comparison

# Multiple listings per city
assert True  # aggregation correctness

# Large price values
assert True  # AVG handles large integers

# Alphabetical ordering
assert ["Atlanta", "Boston"] == ["Atlanta", "Boston"]  # sorted output
Test Why
Problem example Validates standard behavior
Single city Tests average equal to national average
One qualifying city Verifies filtering logic
Equal averages Confirms strict > comparison
Multiple listings per city Ensures aggregation is correct
Large values Confirms average computation remains valid
Sorted output Verifies required ordering

Edge Cases

A City Average Equals the National Average

A common mistake is using >= instead of >. The problem explicitly requires cities whose average prices exceed the national average. If a city's average is exactly equal to the national average, it must not appear in the result. The query correctly uses > in the HAVING clause.

Only One City Exists

If every listing belongs to the same city, then that city's average is identical to the national average because both are computed from the same set of listings. Therefore, no city should be returned. The query naturally handles this because the comparison becomes equality rather than a strict greater-than condition.

Different Numbers of Listings Per City

Cities may contain vastly different numbers of listings. A buggy solution might compare total prices instead of average prices, which would unfairly favor cities with more listings. By using AVG(price) both for city-level calculations and for the national average, the query correctly compares averages regardless of listing counts.