LeetCode 2991 - Top Three Wineries

The problem asks us to determine the top three wineries in each country based on the total points accumulated by each winery. We are given a table called Wineries that contains columns id, country, points, and winery.

LeetCode Problem 2991

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem asks us to determine the top three wineries in each country based on the total points accumulated by each winery. We are given a table called Wineries that contains columns id, country, points, and winery. Each row represents a wine evaluation from a specific winery in a given country.

Our task is to first aggregate points by winery within each country because a single winery may appear multiple times with different points. After computing the total points for each winery, we need to rank the wineries in descending order of their total points. If two wineries have the same total points, the one with the lexicographically smaller name should come first. We then output the top three wineries for each country, but if a country has fewer than three wineries, we must output placeholder strings "No second winery" or "No third winery" as needed. Finally, the output table must be sorted by country in ascending order.

Important edge cases include countries with fewer than three wineries, wineries that tie in total points, and handling multiple rows for the same winery correctly by summing points. The problem guarantees unique IDs, but it does not guarantee unique winery names or country names.

Approaches

A brute-force approach would involve manually iterating over each country, summing points for each winery, sorting them, and then manually selecting the top three. While correct, this method is tedious and inefficient for large datasets.

The optimal approach leverages SQL window functions or grouping and aggregation features. By first summing the points for each winery per country, we can rank them using ROW_NUMBER or RANK to identify the top three. Conditional aggregation or a CASE statement allows us to pivot the top three into separate columns with appropriate fallbacks for missing entries. This method scales well and is concise.

Approach Time Complexity Space Complexity Notes
Brute Force O(n log n) per country O(n) Manually group and sort each country, aggregate points, then select top 3
Optimal O(n log n) O(n) Use SQL aggregation, ranking, and pivoting for efficiency

Algorithm Walkthrough

  1. Aggregate Points by Winery: Group the Wineries table by country and winery, summing the points to get total points for each winery.
  2. Rank Wineries per Country: Within each country, rank the wineries in descending order of total points, breaking ties with lexicographic order of winery name. Use a ranking function like ROW_NUMBER to assign positions 1, 2, 3, etc.
  3. Pivot Top 3 to Columns: For each country, select the top 3 ranked wineries and transform their names and total points into separate columns: top_winery, second_winery, third_winery. Use placeholders if fewer than three wineries exist.
  4. Sort by Country: Ensure the final output is sorted by country ascending.

Why it works: Aggregating first ensures correct point totals per winery. Ranking ensures proper order even in case of ties. Pivoting and placeholders satisfy the output format requirement. Sorting ensures the final presentation aligns with problem specifications.

Python Solution

import pandas as pd

def top_three_wineries(wineries: pd.DataFrame) -> pd.DataFrame:
    # Step 1: Aggregate total points per winery per country
    agg = wineries.groupby(['country', 'winery'], as_index=False)['points'].sum()
    
    # Step 2: Sort within each country by points desc, winery asc
    agg = agg.sort_values(by=['country', 'points', 'winery'], ascending=[True, False, True])
    
    # Step 3: Assign rank per country
    agg['rank'] = agg.groupby('country').cumcount() + 1
    
    # Step 4: Pivot top three into separate columns
    def pivot_rows(df):
        result = {'country': df['country'].iloc[0]}
        top_wineries = df.set_index('rank').to_dict('index')
        result['top_winery'] = f"{top_wineries[1]['winery']} ({top_wineries[1]['points']})" if 1 in top_wineries else "No top winery"
        result['second_winery'] = f"{top_wineries[2]['winery']} ({top_wineries[2]['points']})" if 2 in top_wineries else "No second winery"
        result['third_winery'] = f"{top_wineries[3]['winery']} ({top_wineries[3]['points']})" if 3 in top_wineries else "No third winery"
        return pd.DataFrame([result])
    
    final = pd.concat([pivot_rows(g) for _, g in agg.groupby('country')], ignore_index=True)
    final = final.sort_values('country').reset_index(drop=True)
    return final

This implementation first groups by country and winery to sum points, then sorts each group by total points and winery name. The cumcount() provides the ranking. The pivoting step constructs the final top-three columns, filling placeholders as needed. Finally, sorting by country ensures the correct output order.

Go Solution

package main

import (
    "fmt"
    "sort"
)

type Winery struct {
    ID      int
    Country string
    Points  int
    Name    string
}

type TopWineries struct {
    Country       string
    TopWinery     string
    SecondWinery  string
    ThirdWinery   string
}

func TopThreeWineries(wineries []Winery) []TopWineries {
    type aggWinery struct {
        Name   string
        Points int
    }
    countryMap := make(map[string]map[string]int)
    
    for _, w := range wineries {
        if _, ok := countryMap[w.Country]; !ok {
            countryMap[w.Country] = make(map[string]int)
        }
        countryMap[w.Country][w.Name] += w.Points
    }
    
    var result []TopWineries
    for country, wm := range countryMap {
        agg := []aggWinery{}
        for name, points := range wm {
            agg = append(agg, aggWinery{Name: name, Points: points})
        }
        sort.Slice(agg, func(i, j int) bool {
            if agg[i].Points == agg[j].Points {
                return agg[i].Name < agg[j].Name
            }
            return agg[i].Points > agg[j].Points
        })
        
        top, second, third := "No top winery", "No second winery", "No third winery"
        if len(agg) > 0 {
            top = fmt.Sprintf("%s (%d)", agg[0].Name, agg[0].Points)
        }
        if len(agg) > 1 {
            second = fmt.Sprintf("%s (%d)", agg[1].Name, agg[1].Points)
        }
        if len(agg) > 2 {
            third = fmt.Sprintf("%s (%d)", agg[2].Name, agg[2].Points)
        }
        
        result = append(result, TopWineries{
            Country:      country,
            TopWinery:    top,
            SecondWinery: second,
            ThirdWinery:  third,
        })
    }
    
    sort.Slice(result, func(i, j int) bool {
        return result[i].Country < result[j].Country
    })
    
    return result
}

In Go, we aggregate winery points in nested maps, sort wineries by points and name, and then populate the top three results with placeholders if needed. Go requires explicit handling of slices and string formatting compared to Python.

Worked Examples

For the Australia case:

  1. Aggregate points: HarmonyHill 100, GrapesGalore 85, WhisperingPines 84
  2. Sort descending: HarmonyHill, GrapesGalore, WhisperingPines
  3. Assign rank: 1, 2, 3
  4. Pivot: top_winery = HarmonyHill (100), second_winery = GrapesGalore (85), third_winery = WhisperingPines (84)

For Hungary:

  1. Aggregate: MoonlitCellars 60
  2. Only one winery exists
  3. Pivot: top_winery = MoonlitCellars (60), second_winery = No second winery, third_winery = No third winery

For USA:

  1. Aggregate: RoyalVines 47+39=86, Eagle'sNest 45, PacificCrest 9
  2. Sort: RoyalVines, Eagle'sNest, PacificCrest
  3. Pivot: top_winery = RoyalVines (86), second_winery = Eagle'sNest (45), third_winery = PacificCrest (9)

Complexity Analysis

Measure Complexity Explanation
Time O(n log n) Aggregation is O(n), sorting per country is O(k log k), overall dominated by sort
Space O(n) Storing aggregated winery points and intermediate ranking data

The complexity reasoning is straightforward: we process each row once for aggregation and sort wineries per country. Space is linear in the number of unique winery-country pairs.

Test Cases

import pandas as pd

wineries = pd.DataFrame([
    [103, 'Australia', 84, 'WhisperingPines'],
    [737, 'Australia', 85, 'GrapesGalore'],
    [848, 'Australia', 100, 'HarmonyHill'],
    [222, 'Hungary', 60, 'Moonlit