LeetCode 2991 - Top Three Wineries
The problem asks us to determine the top three wineries in each country based on the total points accumulated by each winery. We are given a table called Wineries that contains columns id, country, points, and winery.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
The problem asks us to determine the top three wineries in each country based on the total points accumulated by each winery. We are given a table called Wineries that contains columns id, country, points, and winery. Each row represents a wine evaluation from a specific winery in a given country.
Our task is to first aggregate points by winery within each country because a single winery may appear multiple times with different points. After computing the total points for each winery, we need to rank the wineries in descending order of their total points. If two wineries have the same total points, the one with the lexicographically smaller name should come first. We then output the top three wineries for each country, but if a country has fewer than three wineries, we must output placeholder strings "No second winery" or "No third winery" as needed. Finally, the output table must be sorted by country in ascending order.
Important edge cases include countries with fewer than three wineries, wineries that tie in total points, and handling multiple rows for the same winery correctly by summing points. The problem guarantees unique IDs, but it does not guarantee unique winery names or country names.
Approaches
A brute-force approach would involve manually iterating over each country, summing points for each winery, sorting them, and then manually selecting the top three. While correct, this method is tedious and inefficient for large datasets.
The optimal approach leverages SQL window functions or grouping and aggregation features. By first summing the points for each winery per country, we can rank them using ROW_NUMBER or RANK to identify the top three. Conditional aggregation or a CASE statement allows us to pivot the top three into separate columns with appropriate fallbacks for missing entries. This method scales well and is concise.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) per country | O(n) | Manually group and sort each country, aggregate points, then select top 3 |
| Optimal | O(n log n) | O(n) | Use SQL aggregation, ranking, and pivoting for efficiency |
Algorithm Walkthrough
- Aggregate Points by Winery: Group the
Wineriestable bycountryandwinery, summing thepointsto get total points for each winery. - Rank Wineries per Country: Within each country, rank the wineries in descending order of total points, breaking ties with lexicographic order of winery name. Use a ranking function like
ROW_NUMBERto assign positions 1, 2, 3, etc. - Pivot Top 3 to Columns: For each country, select the top 3 ranked wineries and transform their names and total points into separate columns:
top_winery,second_winery,third_winery. Use placeholders if fewer than three wineries exist. - Sort by Country: Ensure the final output is sorted by
countryascending.
Why it works: Aggregating first ensures correct point totals per winery. Ranking ensures proper order even in case of ties. Pivoting and placeholders satisfy the output format requirement. Sorting ensures the final presentation aligns with problem specifications.
Python Solution
import pandas as pd
def top_three_wineries(wineries: pd.DataFrame) -> pd.DataFrame:
# Step 1: Aggregate total points per winery per country
agg = wineries.groupby(['country', 'winery'], as_index=False)['points'].sum()
# Step 2: Sort within each country by points desc, winery asc
agg = agg.sort_values(by=['country', 'points', 'winery'], ascending=[True, False, True])
# Step 3: Assign rank per country
agg['rank'] = agg.groupby('country').cumcount() + 1
# Step 4: Pivot top three into separate columns
def pivot_rows(df):
result = {'country': df['country'].iloc[0]}
top_wineries = df.set_index('rank').to_dict('index')
result['top_winery'] = f"{top_wineries[1]['winery']} ({top_wineries[1]['points']})" if 1 in top_wineries else "No top winery"
result['second_winery'] = f"{top_wineries[2]['winery']} ({top_wineries[2]['points']})" if 2 in top_wineries else "No second winery"
result['third_winery'] = f"{top_wineries[3]['winery']} ({top_wineries[3]['points']})" if 3 in top_wineries else "No third winery"
return pd.DataFrame([result])
final = pd.concat([pivot_rows(g) for _, g in agg.groupby('country')], ignore_index=True)
final = final.sort_values('country').reset_index(drop=True)
return final
This implementation first groups by country and winery to sum points, then sorts each group by total points and winery name. The cumcount() provides the ranking. The pivoting step constructs the final top-three columns, filling placeholders as needed. Finally, sorting by country ensures the correct output order.
Go Solution
package main
import (
"fmt"
"sort"
)
type Winery struct {
ID int
Country string
Points int
Name string
}
type TopWineries struct {
Country string
TopWinery string
SecondWinery string
ThirdWinery string
}
func TopThreeWineries(wineries []Winery) []TopWineries {
type aggWinery struct {
Name string
Points int
}
countryMap := make(map[string]map[string]int)
for _, w := range wineries {
if _, ok := countryMap[w.Country]; !ok {
countryMap[w.Country] = make(map[string]int)
}
countryMap[w.Country][w.Name] += w.Points
}
var result []TopWineries
for country, wm := range countryMap {
agg := []aggWinery{}
for name, points := range wm {
agg = append(agg, aggWinery{Name: name, Points: points})
}
sort.Slice(agg, func(i, j int) bool {
if agg[i].Points == agg[j].Points {
return agg[i].Name < agg[j].Name
}
return agg[i].Points > agg[j].Points
})
top, second, third := "No top winery", "No second winery", "No third winery"
if len(agg) > 0 {
top = fmt.Sprintf("%s (%d)", agg[0].Name, agg[0].Points)
}
if len(agg) > 1 {
second = fmt.Sprintf("%s (%d)", agg[1].Name, agg[1].Points)
}
if len(agg) > 2 {
third = fmt.Sprintf("%s (%d)", agg[2].Name, agg[2].Points)
}
result = append(result, TopWineries{
Country: country,
TopWinery: top,
SecondWinery: second,
ThirdWinery: third,
})
}
sort.Slice(result, func(i, j int) bool {
return result[i].Country < result[j].Country
})
return result
}
In Go, we aggregate winery points in nested maps, sort wineries by points and name, and then populate the top three results with placeholders if needed. Go requires explicit handling of slices and string formatting compared to Python.
Worked Examples
For the Australia case:
- Aggregate points: HarmonyHill 100, GrapesGalore 85, WhisperingPines 84
- Sort descending: HarmonyHill, GrapesGalore, WhisperingPines
- Assign rank: 1, 2, 3
- Pivot: top_winery = HarmonyHill (100), second_winery = GrapesGalore (85), third_winery = WhisperingPines (84)
For Hungary:
- Aggregate: MoonlitCellars 60
- Only one winery exists
- Pivot: top_winery = MoonlitCellars (60), second_winery = No second winery, third_winery = No third winery
For USA:
- Aggregate: RoyalVines 47+39=86, Eagle'sNest 45, PacificCrest 9
- Sort: RoyalVines, Eagle'sNest, PacificCrest
- Pivot: top_winery = RoyalVines (86), second_winery = Eagle'sNest (45), third_winery = PacificCrest (9)
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Aggregation is O(n), sorting per country is O(k log k), overall dominated by sort |
| Space | O(n) | Storing aggregated winery points and intermediate ranking data |
The complexity reasoning is straightforward: we process each row once for aggregation and sort wineries per country. Space is linear in the number of unique winery-country pairs.
Test Cases
import pandas as pd
wineries = pd.DataFrame([
[103, 'Australia', 84, 'WhisperingPines'],
[737, 'Australia', 85, 'GrapesGalore'],
[848, 'Australia', 100, 'HarmonyHill'],
[222, 'Hungary', 60, 'Moonlit