LeetCode 1501 - Countries You Can Safely Invest In

This is a SQL database problem where we need to identify countries whose average call duration is strictly greater than

LeetCode Problem 1501

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

This is a SQL database problem where we need to identify countries whose average call duration is strictly greater than the global average call duration.

The database consists of three tables:

  • The Person table stores information about people, including their unique id and phone_number.
  • The Country table maps a country_code to a country name.
  • The Calls table stores phone calls, where each row contains the caller, callee, and call duration.

The tricky part of the problem is understanding how a call contributes to a country’s statistics.

A call belongs to a country if either the caller or callee is from that country. In other words, each call contributes to the average duration of both participating countries. For example, if a Moroccan user calls an Israeli user, the duration contributes once to Morocco and once to Israel.

The phone number format gives us the country code:

xxx-yyyyyyy

The first three digits (xxx) represent the country code. We must match this prefix with Country.country_code to determine the country of a person.

The expected output is a table containing one column:

country

This column should include every country whose average call duration is strictly greater than the global average duration across all calls.

An important detail is that the global average is computed across all call records in the Calls table, while each country's average includes every call involving people from that country.

Because this is a database problem, the real challenge is expressing the logic cleanly in SQL rather than optimizing for large algorithmic constraints. We want to avoid unnecessarily complex joins and instead structure the query so that country level aggregation becomes straightforward.

Several edge cases are important:

  • A country may have no calls at all. Such countries should not appear because there is no country average to compare.
  • Multiple people may belong to the same country, and their calls must all be aggregated together.
  • Duplicate rows can exist in Calls, meaning we must treat every row independently rather than deduplicating.
  • Calls count toward both participants’ countries, so we cannot only consider the caller side or callee side.

Approaches

Brute Force Approach

A brute force solution would manually determine the country for every person and then iterate through all calls. For each call, we would identify both participants' countries, append the duration into per-country collections, and later compute averages.

Conceptually, the process would work like this:

  1. Extract each person's country code from their phone number.
  2. Match the code with the Country table.
  3. Iterate through every call.
  4. Add the call duration to both the caller’s and callee’s country.
  5. Compute each country average.
  6. Compare each country average with the global average.

This approach is correct because every call is processed exactly once and contributes to all relevant countries. However, implementing this procedurally in SQL would be awkward and inefficient. SQL is designed for set-based aggregation, so manually simulating loops is unnecessary.

Key Insight for the Optimal Approach

The key observation is that every call should be represented twice, once for the caller's country and once for the callee's country.

Instead of processing calls manually, we can transform the data into a unified table of:

(country, duration)

We can do this by:

  • Joining Calls.caller_id with Person.id
  • Joining Calls.callee_id with Person.id
  • Extracting the first three digits of the phone number to determine the country
  • Combining both caller and callee contributions using UNION ALL

Once we have a flattened dataset, computing country averages becomes a simple GROUP BY.

The global average can be computed independently using:

AVG(duration)

Then we simply filter countries whose average exceeds the global average.

Approach Time Complexity Space Complexity Notes
Brute Force O(P + C) O(P + C) Simulates mapping and aggregation manually
Optimal O(P + C) O(C) Uses SQL joins and aggregation efficiently

Here:

  • P = number of people
  • C = number of call records

Algorithm Walkthrough

  1. First, compute a mapping from each person to their country.

We extract the first three characters of phone_number using LEFT(phone_number, 3) and match them with Country.country_code. 2. Next, create a flattened call dataset.

For every call, create one record for the caller’s country and one for the callee’s country. This ensures that every country participating in a call gets credit for the duration. 3. Combine caller and callee contributions using UNION ALL.

We use UNION ALL instead of UNION because duplicate calls must be preserved. The problem explicitly states that duplicate rows may exist. 4. Group the flattened dataset by country.

Compute:

AVG(duration)

for every country. 5. Compute the global average.

This is simply:

SELECT AVG(duration)
FROM Calls
  1. Filter countries.

Keep only countries where:

country_avg > global_avg
  1. Return the resulting country names.

Why it works

The algorithm works because every call is counted exactly once for every participating country. By duplicating each call into caller and callee contributions, we ensure that country level averages match the problem definition. Since the global average is computed directly from the original calls table, the comparison is mathematically accurate.

Python Solution

LeetCode Database problems expect SQL queries rather than Python code. Below is the complete MySQL solution.

# This problem is SQL-only on LeetCode.
# Python implementation is not applicable.

The problem belongs to the Database category, meaning LeetCode expects a SQL query instead of a Python function. The logic is implemented using joins, aggregation, and filtering.

MySQL Solution

SELECT
    country
FROM (
    SELECT
        c.name AS country,
        AVG(t.duration) AS avg_duration
    FROM (
        SELECT
            caller_id AS person_id,
            duration
        FROM Calls

        UNION ALL

        SELECT
            callee_id AS person_id,
            duration
        FROM Calls
    ) t
    JOIN Person p
        ON t.person_id = p.id
    JOIN Country c
        ON LEFT(p.phone_number, 3) = c.country_code
    GROUP BY c.name
) country_avg
WHERE avg_duration > (
    SELECT AVG(duration)
    FROM Calls
);

The query begins by flattening the calls table into a participant based representation. Every call is represented twice, once for the caller and once for the callee, using UNION ALL.

Afterward, we join each participant with the Person table to obtain the phone number and extract the country code. That country code is matched against the Country table.

Once we have (country, duration) pairs, grouping by country allows us to compute country specific average call durations.

Finally, a subquery computes the global average duration, and we filter countries whose averages are strictly greater.

Go Solution

Like the Python version, Go is not applicable because this is a SQL database problem on LeetCode.

// This problem is SQL-only on LeetCode.

There are no Go specific implementation details because LeetCode expects a SQL query submission for Database problems.

Worked Examples

Example 1

We first map each person to a country.

Person ID Phone Number Country Code Country
3 051-1234567 051 Peru
12 051-7654321 051 Peru
1 212-1234567 212 Morocco
2 212-6523651 212 Morocco
7 972-1234567 972 Israel
9 972-0011100 972 Israel

We then expand every call into caller and callee contributions.

Call Duration Countries Receiving Credit
1 → 9 33 Morocco, Israel
2 → 9 4 Morocco, Israel
1 → 2 59 Morocco, Morocco
3 → 12 102 Peru, Peru
3 → 12 330 Peru, Peru
12 → 3 5 Peru, Peru
7 → 9 13 Israel, Israel
7 → 1 3 Israel, Morocco
9 → 7 1 Israel, Israel
1 → 7 7 Morocco, Israel

Country averages become:

Country Durations Average
Peru 102, 102, 330, 330, 5, 5 145.67
Israel 33, 4, 13, 13, 3, 1, 1, 7 9.38
Morocco 33, 4, 59, 59, 3, 7 27.50

The global average:

(33 + 4 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7) / 10
= 55.7

Comparing:

Country Country Avg Greater Than Global?
Peru 145.67 Yes
Israel 9.38 No
Morocco 27.50 No

Final output:

Peru

Complexity Analysis

Measure Complexity Explanation
Time O(P + C) We process persons and calls through joins and aggregation
Space O(C) The flattened participant table stores up to twice the calls

The dominant work comes from processing call records and joining them with person and country information. Since each call is expanded into two rows, the intermediate dataset contains at most 2C rows, which still simplifies to O(C) space.

Test Cases

Since this is a SQL problem, test cases are best expressed as database scenarios.

# Example 1
# Expected: ["Peru"]

# Single country above average
# Country A average = 100
# Global average = 50
# Expected: ["Country A"]

# No country above global average
# All countries share equal average
# Expected: []

# Country with no calls
# Should not appear in result

# Duplicate call rows
# Duplicates must count independently

# Same-country calls
# Call contributes twice to the same country
Test Why
Problem example Validates correctness against expected output
Single country above average Ensures comparison logic works
Equal averages Verifies strict inequality
Country with no calls Confirms empty countries are ignored
Duplicate rows Ensures UNION ALL behavior is correct
Same-country calls Verifies double contribution to one country

Edge Cases

Countries With No Calls

Some countries in the Country table may never participate in any call. A naive solution might accidentally include them with a NULL average or divide by zero. Our implementation naturally excludes them because only countries appearing in the joined call dataset are grouped.

Duplicate Call Records

The Calls table may contain duplicate rows. Using UNION instead of UNION ALL would incorrectly remove duplicates and distort averages. The implementation explicitly uses UNION ALL, ensuring every row contributes independently.

Calls Within the Same Country

If both caller and callee belong to the same country, the duration must count twice for that country because both participants are from that country. A common bug is counting it only once. Since we separately expand callers and callees, the query correctly contributes both entries.

Strictly Greater Comparison

The requirement says the country average must be strictly greater than the global average. Using >= would incorrectly include countries whose averages exactly match the global average. The implementation correctly uses the > operator.