LeetCode 1501 - Countries You Can Safely Invest In
This is a SQL database problem where we need to identify countries whose average call duration is strictly greater than
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
This is a SQL database problem where we need to identify countries whose average call duration is strictly greater than the global average call duration.
The database consists of three tables:
- The
Persontable stores information about people, including their uniqueidandphone_number. - The
Countrytable maps acountry_codeto a country name. - The
Callstable stores phone calls, where each row contains the caller, callee, and call duration.
The tricky part of the problem is understanding how a call contributes to a country’s statistics.
A call belongs to a country if either the caller or callee is from that country. In other words, each call contributes to the average duration of both participating countries. For example, if a Moroccan user calls an Israeli user, the duration contributes once to Morocco and once to Israel.
The phone number format gives us the country code:
xxx-yyyyyyy
The first three digits (xxx) represent the country code. We must match this prefix with Country.country_code to determine the country of a person.
The expected output is a table containing one column:
country
This column should include every country whose average call duration is strictly greater than the global average duration across all calls.
An important detail is that the global average is computed across all call records in the Calls table, while each country's average includes every call involving people from that country.
Because this is a database problem, the real challenge is expressing the logic cleanly in SQL rather than optimizing for large algorithmic constraints. We want to avoid unnecessarily complex joins and instead structure the query so that country level aggregation becomes straightforward.
Several edge cases are important:
- A country may have no calls at all. Such countries should not appear because there is no country average to compare.
- Multiple people may belong to the same country, and their calls must all be aggregated together.
- Duplicate rows can exist in
Calls, meaning we must treat every row independently rather than deduplicating. - Calls count toward both participants’ countries, so we cannot only consider the caller side or callee side.
Approaches
Brute Force Approach
A brute force solution would manually determine the country for every person and then iterate through all calls. For each call, we would identify both participants' countries, append the duration into per-country collections, and later compute averages.
Conceptually, the process would work like this:
- Extract each person's country code from their phone number.
- Match the code with the
Countrytable. - Iterate through every call.
- Add the call duration to both the caller’s and callee’s country.
- Compute each country average.
- Compare each country average with the global average.
This approach is correct because every call is processed exactly once and contributes to all relevant countries. However, implementing this procedurally in SQL would be awkward and inefficient. SQL is designed for set-based aggregation, so manually simulating loops is unnecessary.
Key Insight for the Optimal Approach
The key observation is that every call should be represented twice, once for the caller's country and once for the callee's country.
Instead of processing calls manually, we can transform the data into a unified table of:
(country, duration)
We can do this by:
- Joining
Calls.caller_idwithPerson.id - Joining
Calls.callee_idwithPerson.id - Extracting the first three digits of the phone number to determine the country
- Combining both caller and callee contributions using
UNION ALL
Once we have a flattened dataset, computing country averages becomes a simple GROUP BY.
The global average can be computed independently using:
AVG(duration)
Then we simply filter countries whose average exceeds the global average.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(P + C) | O(P + C) | Simulates mapping and aggregation manually |
| Optimal | O(P + C) | O(C) | Uses SQL joins and aggregation efficiently |
Here:
P= number of peopleC= number of call records
Algorithm Walkthrough
- First, compute a mapping from each person to their country.
We extract the first three characters of phone_number using LEFT(phone_number, 3) and match them with Country.country_code.
2. Next, create a flattened call dataset.
For every call, create one record for the caller’s country and one for the callee’s country. This ensures that every country participating in a call gets credit for the duration.
3. Combine caller and callee contributions using UNION ALL.
We use UNION ALL instead of UNION because duplicate calls must be preserved. The problem explicitly states that duplicate rows may exist.
4. Group the flattened dataset by country.
Compute:
AVG(duration)
for every country. 5. Compute the global average.
This is simply:
SELECT AVG(duration)
FROM Calls
- Filter countries.
Keep only countries where:
country_avg > global_avg
- Return the resulting country names.
Why it works
The algorithm works because every call is counted exactly once for every participating country. By duplicating each call into caller and callee contributions, we ensure that country level averages match the problem definition. Since the global average is computed directly from the original calls table, the comparison is mathematically accurate.
Python Solution
LeetCode Database problems expect SQL queries rather than Python code. Below is the complete MySQL solution.
# This problem is SQL-only on LeetCode.
# Python implementation is not applicable.
The problem belongs to the Database category, meaning LeetCode expects a SQL query instead of a Python function. The logic is implemented using joins, aggregation, and filtering.
MySQL Solution
SELECT
country
FROM (
SELECT
c.name AS country,
AVG(t.duration) AS avg_duration
FROM (
SELECT
caller_id AS person_id,
duration
FROM Calls
UNION ALL
SELECT
callee_id AS person_id,
duration
FROM Calls
) t
JOIN Person p
ON t.person_id = p.id
JOIN Country c
ON LEFT(p.phone_number, 3) = c.country_code
GROUP BY c.name
) country_avg
WHERE avg_duration > (
SELECT AVG(duration)
FROM Calls
);
The query begins by flattening the calls table into a participant based representation. Every call is represented twice, once for the caller and once for the callee, using UNION ALL.
Afterward, we join each participant with the Person table to obtain the phone number and extract the country code. That country code is matched against the Country table.
Once we have (country, duration) pairs, grouping by country allows us to compute country specific average call durations.
Finally, a subquery computes the global average duration, and we filter countries whose averages are strictly greater.
Go Solution
Like the Python version, Go is not applicable because this is a SQL database problem on LeetCode.
// This problem is SQL-only on LeetCode.
There are no Go specific implementation details because LeetCode expects a SQL query submission for Database problems.
Worked Examples
Example 1
We first map each person to a country.
| Person ID | Phone Number | Country Code | Country |
|---|---|---|---|
| 3 | 051-1234567 | 051 | Peru |
| 12 | 051-7654321 | 051 | Peru |
| 1 | 212-1234567 | 212 | Morocco |
| 2 | 212-6523651 | 212 | Morocco |
| 7 | 972-1234567 | 972 | Israel |
| 9 | 972-0011100 | 972 | Israel |
We then expand every call into caller and callee contributions.
| Call | Duration | Countries Receiving Credit |
|---|---|---|
| 1 → 9 | 33 | Morocco, Israel |
| 2 → 9 | 4 | Morocco, Israel |
| 1 → 2 | 59 | Morocco, Morocco |
| 3 → 12 | 102 | Peru, Peru |
| 3 → 12 | 330 | Peru, Peru |
| 12 → 3 | 5 | Peru, Peru |
| 7 → 9 | 13 | Israel, Israel |
| 7 → 1 | 3 | Israel, Morocco |
| 9 → 7 | 1 | Israel, Israel |
| 1 → 7 | 7 | Morocco, Israel |
Country averages become:
| Country | Durations | Average |
|---|---|---|
| Peru | 102, 102, 330, 330, 5, 5 | 145.67 |
| Israel | 33, 4, 13, 13, 3, 1, 1, 7 | 9.38 |
| Morocco | 33, 4, 59, 59, 3, 7 | 27.50 |
The global average:
(33 + 4 + 59 + 102 + 330 + 5 + 13 + 3 + 1 + 7) / 10
= 55.7
Comparing:
| Country | Country Avg | Greater Than Global? |
|---|---|---|
| Peru | 145.67 | Yes |
| Israel | 9.38 | No |
| Morocco | 27.50 | No |
Final output:
Peru
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(P + C) | We process persons and calls through joins and aggregation |
| Space | O(C) | The flattened participant table stores up to twice the calls |
The dominant work comes from processing call records and joining them with person and country information. Since each call is expanded into two rows, the intermediate dataset contains at most 2C rows, which still simplifies to O(C) space.
Test Cases
Since this is a SQL problem, test cases are best expressed as database scenarios.
# Example 1
# Expected: ["Peru"]
# Single country above average
# Country A average = 100
# Global average = 50
# Expected: ["Country A"]
# No country above global average
# All countries share equal average
# Expected: []
# Country with no calls
# Should not appear in result
# Duplicate call rows
# Duplicates must count independently
# Same-country calls
# Call contributes twice to the same country
| Test | Why |
|---|---|
| Problem example | Validates correctness against expected output |
| Single country above average | Ensures comparison logic works |
| Equal averages | Verifies strict inequality |
| Country with no calls | Confirms empty countries are ignored |
| Duplicate rows | Ensures UNION ALL behavior is correct |
| Same-country calls | Verifies double contribution to one country |
Edge Cases
Countries With No Calls
Some countries in the Country table may never participate in any call. A naive solution might accidentally include them with a NULL average or divide by zero. Our implementation naturally excludes them because only countries appearing in the joined call dataset are grouped.
Duplicate Call Records
The Calls table may contain duplicate rows. Using UNION instead of UNION ALL would incorrectly remove duplicates and distort averages. The implementation explicitly uses UNION ALL, ensuring every row contributes independently.
Calls Within the Same Country
If both caller and callee belong to the same country, the duration must count twice for that country because both participants are from that country. A common bug is counting it only once. Since we separately expand callers and callees, the query correctly contributes both entries.
Strictly Greater Comparison
The requirement says the country average must be strictly greater than the global average. Using >= would incorrectly include countries whose averages exactly match the global average. The implementation correctly uses the > operator.