LeetCode 618 - Students Report By Geography
The problem asks us to transform a table of student names and their continents into a pivoted report, where each continent becomes a column and the student names appear alphabetically under their respective continents.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
The problem asks us to transform a table of student names and their continents into a pivoted report, where each continent becomes a column and the student names appear alphabetically under their respective continents. The input is a Student table containing two columns: name and continent. This table may include duplicate entries, which should not affect the pivot result except that each row represents a student.
The expected output is a table with three columns: America, Asia, and Europe. Under each column, the names of students from that continent are listed in alphabetical order. Rows are aligned such that if one continent has more students than another, the shorter columns have NULL values for missing rows.
Constraints tell us that America will have at least as many students as the other continents, which simplifies row alignment, but the follow-up asks for a more general solution where continent counts are unknown. Edge cases include empty continents, duplicates, and varying numbers of students per continent.
Approaches
The brute-force approach is to manually query each continent, sort the students alphabetically, and then join the results row by row. While this works logically, it requires multiple self-joins or manual indexing to align the rows, which is cumbersome and difficult to scale when continent counts are unknown.
The optimal solution uses window functions with ROW_NUMBER() to assign a sequential index to each student per continent, sorted alphabetically. After indexing, a full outer join (or left joins if one continent is guaranteed to have the most students) aligns students by their row numbers. This guarantees correct alphabetical order and row alignment, while easily extending to unknown student counts for each continent.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) per continent + O(n) joins | O(n) | Sort and manually align, cumbersome with unknown continent sizes |
| Optimal | O(n log n) | O(n) | Use ROW_NUMBER() and pivot join for dynamic alignment |
Algorithm Walkthrough
- Use a
ROW_NUMBER()window function to assign an index to each student within their continent group, sorted alphabetically. This gives a unique row number for each student per continent. - Separate the students into three derived tables (or CTEs) by continent. Each table will have the student's name and their row index.
- Perform a full outer join on the row index across all three continent tables. This aligns students in the same row index together.
- Select the
namecolumns from each continent table and alias them asAmerica,Asia, andEurope. - Order the final result by the row index to maintain alignment.
Why it works: The ROW_NUMBER() guarantees that students are sorted alphabetically within their continent. Joining on the row index ensures that the first student in each continent aligns in the first row, the second student in the second row, and so on. Missing values are automatically filled with NULL during the join.
Python Solution
class Solution:
def pivotStudents(self, conn) -> None:
"""
Execute SQL on the given connection to pivot students by continent.
"""
query = """
WITH ranked AS (
SELECT
name,
continent,
ROW_NUMBER() OVER (PARTITION BY continent ORDER BY name) AS rn
FROM Student
)
SELECT
a.name AS America,
b.name AS Asia,
c.name AS Europe
FROM
(SELECT name, rn FROM ranked WHERE continent = 'America') a
LEFT JOIN
(SELECT name, rn FROM ranked WHERE continent = 'Asia') b
ON a.rn = b.rn
LEFT JOIN
(SELECT name, rn FROM ranked WHERE continent = 'Europe') c
ON a.rn = c.rn
ORDER BY a.rn;
"""
with conn.cursor() as cursor:
cursor.execute(query)
result = cursor.fetchall()
for row in result:
print(row)
In this Python implementation, a Common Table Expression (CTE) ranked is used to assign row numbers. Separate subqueries for each continent are left-joined on the row number. The ORDER BY a.rn ensures that the rows are correctly aligned.
Go Solution
package main
import (
"database/sql"
"fmt"
_ "github.com/go-sql-driver/mysql"
)
func PivotStudents(db *sql.DB) error {
query := `
WITH ranked AS (
SELECT
name,
continent,
ROW_NUMBER() OVER (PARTITION BY continent ORDER BY name) AS rn
FROM Student
)
SELECT
a.name AS America,
b.name AS Asia,
c.name AS Europe
FROM
(SELECT name, rn FROM ranked WHERE continent = 'America') a
LEFT JOIN
(SELECT name, rn FROM ranked WHERE continent = 'Asia') b
ON a.rn = b.rn
LEFT JOIN
(SELECT name, rn FROM ranked WHERE continent = 'Europe') c
ON a.rn = c.rn
ORDER BY a.rn;
`
rows, err := db.Query(query)
if err != nil {
return err
}
defer rows.Close()
for rows.Next() {
var america, asia, europe sql.NullString
if err := rows.Scan(&america, &asia, &europe); err != nil {
return err
}
fmt.Println(america.String, asia.String, europe.String)
}
return rows.Err()
}
In Go, sql.NullString is used to safely handle NULL values. The logic mirrors Python, and the WITH CTE combined with LEFT JOIN aligns rows correctly.
Worked Examples
Input Table
| name | continent |
|---|---|
| Jane | America |
| Pascal | Europe |
| Xi | Asia |
| Jack | America |
Step 1: Assign row numbers
| name | continent | rn |
|---|---|---|
| Jack | America | 1 |
| Jane | America | 2 |
| Xi | Asia | 1 |
| Pascal | Europe | 1 |
Step 2: Separate by continent
America: (Jack, 1), (Jane, 2)
Asia: (Xi, 1)
Europe: (Pascal, 1)
Step 3: Join on rn
| rn | America | Asia | Europe |
|---|---|---|---|
| 1 | Jack | Xi | Pascal |
| 2 | Jane | NULL | NULL |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n) | Sorting students alphabetically per continent dominates |
| Space | O(n) | Storing row numbers and intermediate tables per continent |
The CTE with ROW_NUMBER() and joins requires memory proportional to the number of students, and the sort operation within each partition is O(n log n).
Test Cases
# Provided example
assert pivotStudents([("Jane","America"),("Pascal","Europe"),("Xi","Asia"),("Jack","America")]) == [("Jack","Xi","Pascal"),("Jane",None,None)]
# Empty table
assert pivotStudents([]) == []
# One student per continent
assert pivotStudents([("Alice","America"),("Bob","Asia"),("Charlie","Europe")]) == [("Alice","Bob","Charlie")]
# Multiple students, duplicates
assert pivotStudents([("Tom","America"),("Tom","America"),("Jerry","Asia")]) == [("Tom","Jerry",None),("Tom",None,None)]
| Test | Why |
|---|---|
| Provided example | Validates basic pivot with unequal lengths |
| Empty table | Ensures code handles no data |
| One student per continent | Validates minimal non-empty case |
| Multiple students, duplicates | Ensures duplicates are handled correctly |
Edge Cases
One edge case is when a continent has no students. Using LEFT JOIN or FULL OUTER JOIN ensures missing rows are returned as NULL without error.
Another edge case is duplicate student names. Since ROW_NUMBER() operates on the combination of continent and alphabetical order, duplicates are assigned distinct row numbers, preventing collisions.
A third edge case is when America does not have the most students, as allowed in the follow-up. The optimal solution can extend to a dynamic pivot using FULL OUTER JOIN and a row number max over all continents, ensuring correct alignment regardless of the largest continent.