LeetCode 2880 - Select Data

Difficulty: 🟢 Easy
Topics: —

Solution

Problem Understanding

This problem provides a pandas DataFrame named students with three columns:

Column	Description
`student_id`	Unique identifier for a student
`name`	Student name
`age`	Student age

The task is to return only the name and age columns for the student whose student_id is equal to 101.

In other words, we need to filter the table so that only rows matching student_id == 101 remain, then select only the relevant columns from those rows.

The input is a pandas DataFrame, not a traditional array or list. That means the intended solution is based on pandas operations such as filtering and column selection.

The output must also be a DataFrame, not a scalar value or dictionary. Even if there is only one matching row, the result should preserve the tabular structure.

The problem guarantees that the table follows the specified schema. As an Easy pandas problem, it is meant to be solved directly with built-in DataFrame operations.

One important detail is that the result must contain only the name and age columns. Returning all columns would produce the wrong output format.

Another important edge case is the possibility of multiple rows with student_id = 101. Although the examples imply uniqueness, the filtering operation naturally handles multiple matches correctly by returning all matching rows.

If no student with ID 101 exists, pandas filtering would simply return an empty DataFrame with the selected columns.

Approaches

Brute Force Approach

A brute force solution would manually iterate through every row in the DataFrame, check whether the student_id equals 101, and then collect the corresponding name and age values into a new structure.

Conceptually, the algorithm would work like this:

1. Traverse every row one by one.
2. Check whether student_id == 101.
3. If it matches, store the name and age.
4. Construct a new DataFrame from the collected results.

This approach is correct because every row is examined, so no valid match can be missed.

However, this solution is unnecessarily verbose for a pandas problem. Both approaches examine every row, so the asymptotic cost is the same, but iterating row by row in Python bypasses pandas' vectorized operations, which run in compiled code and are typically much faster on large tables.
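For comparison, the row-by-row idea described above might look roughly like the sketch below. The function name select_data_brute_force is hypothetical and chosen only for this illustration; the sample data is the table from the problem's example.

```python
import pandas as pd

def select_data_brute_force(students: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row filter; a hypothetical helper used only for illustration."""
    rows = []
    # itertuples yields one namedtuple per row, with columns as attributes.
    for row in students.itertuples(index=False):
        if row.student_id == 101:
            rows.append({"name": row.name, "age": row.age})
    # Passing columns= keeps both headers even when nothing matched.
    return pd.DataFrame(rows, columns=["name", "age"])

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
brute = select_data_brute_force(students)
print(brute)
```

The loop is easy to follow, but it does in several lines of Python what a single vectorized expression can do.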

Optimal Approach

The optimal solution uses pandas boolean indexing.

The key insight is that pandas allows us to filter rows directly using a boolean condition:

mask = students["student_id"] == 101

This produces a boolean mask where matching rows are marked as True.

We can then use this mask to filter the DataFrame and select only the desired columns:

students.loc[mask, ["name", "age"]]

This approach is concise, efficient, and idiomatic pandas code.
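The two fragments above can be combined into a small runnable sketch. The variable names mask and selected are illustrative, and the data is the example table from the problem.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

# Comparing a column with a scalar yields a boolean Series, one entry per row.
mask = students["student_id"] == 101
print(mask.tolist())  # [True, False, False, False]

# .loc takes the row selector first and the column selector second.
selected = students.loc[mask, ["name", "age"]]
print(selected)
```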

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n)	O(k)	Iterates manually through every row
Optimal	O(n)	O(k)	Uses vectorized pandas filtering

Here, n is the number of rows in the DataFrame, and k is the number of matching rows.

Algorithm Walkthrough

1. Start with the students DataFrame containing all student records.
2. Create a boolean condition that checks whether each row has student_id == 101.

mask = students["student_id"] == 101

This produces a boolean Series such as:

[True, False, False, False]

3. Use .loc[] to filter the DataFrame using the boolean mask.

students.loc[mask]

This keeps only rows where the condition is True.

4. Select only the name and age columns.

students.loc[mask, ["name", "age"]]

5. Return the resulting DataFrame.

Why it works

The algorithm works because boolean indexing in pandas preserves exactly those rows where the condition evaluates to True. Since the condition specifically checks for student_id == 101, every returned row must belong to the target student. Selecting the name and age columns afterward guarantees the output format matches the problem requirements.
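One way to convince yourself of this argument is to split the table by the mask and check both halves. The sketch below does that on the example data; the variable names are illustrative only.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

mask = students["student_id"] == 101
kept = students.loc[mask]      # rows where the mask is True
dropped = students.loc[~mask]  # rows where the mask is False

# Every kept row has the target ID, and no dropped row does.
all_kept_match = bool((kept["student_id"] == 101).all())
none_dropped_match = bool((dropped["student_id"] != 101).all())

# Together the two parts account for every input row.
rows_accounted_for = len(kept) + len(dropped) == len(students)
print(all_kept_match, none_dropped_match, rows_accounted_for)
```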

Python Solution

import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

The implementation is intentionally concise because pandas provides powerful built-in filtering operations.

The expression:

students["student_id"] == 101

creates a boolean mask that identifies all matching rows.

The .loc[] accessor is then used for two operations simultaneously:

Filtering rows using the boolean mask.
Selecting only the name and age columns.

The returned value is still a DataFrame, which matches the expected LeetCode output format.
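To illustrate that .loc[] performs both operations at once, the sketch below compares it with an equivalent two-step version that filters rows first and selects columns second. The names one_step and two_step are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
mask = students["student_id"] == 101

# Rows and columns selected in a single .loc call.
one_step = students.loc[mask, ["name", "age"]]

# The same result in two steps: filter rows, then pick columns.
two_step = students[mask][["name", "age"]]

same = one_step.equals(two_step)
print(same)
```

Both forms return the same DataFrame here; the single .loc call simply avoids building the intermediate filtered table.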

Go Solution

LeetCode problem 2880 is a pandas-specific problem, so Go is not officially supported on the platform for this question. However, the equivalent logic in Go can be demonstrated using structs and slices.

package main

import "fmt"

type Student struct {
	StudentID int
	Name      string
	Age       int
}

type Result struct {
	Name string
	Age  int
}

func selectData(students []Student) []Result {
	var result []Result

	for _, student := range students {
		if student.StudentID == 101 {
			result = append(result, Result{
				Name: student.Name,
				Age:  student.Age,
			})
		}
	}

	return result
}

func main() {
	students := []Student{
		{101, "Ulysses", 13},
		{53, "William", 10},
		{128, "Henry", 6},
		{3, "Henry", 11},
	}

	fmt.Println(selectData(students))
}

The Go version manually iterates through the slice because Go does not provide built-in DataFrame operations like pandas.

Instead of boolean indexing, we use a standard loop and conditional filtering. The logic remains identical: check whether StudentID == 101, then collect the corresponding Name and Age.

Unlike pandas, Go requires explicit struct definitions for both the input and output data.

Worked Examples

Example 1

Input:

student_id	name	age
101	Ulysses	13
53	William	10
128	Henry	6
3	Henry	11

Step 1: Create Boolean Mask

We evaluate:

students["student_id"] == 101

Result:

Row	student_id	Matches?
0	101	True
1	53	False
2	128	False
3	3	False

Boolean mask:

[True, False, False, False]

Step 2: Filter Rows

Applying the mask keeps only rows where the value is True.

Intermediate result:

student_id	name	age
101	Ulysses	13

Step 3: Select Columns

We keep only name and age.

Final output:

name	age
Ulysses	13

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Every row is checked once against the condition
Space	O(k)	Output stores only matching rows

The algorithm scans the student_id column once to build the boolean mask, so the runtime is linear in the number of rows.

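The additional space depends on how many rows match the condition. If k rows match, the resulting DataFrame stores those k rows. The boolean mask is a temporary Series with one entry per input row, so it adds O(n) transient memory that the O(k) figure does not count.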
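The difference between n and k can be seen on a larger synthetic table. The sketch below builds 1,000 made-up rows (the names and ages are generated placeholders, not problem data) with exactly one row whose ID is 101.

```python
import pandas as pd

n = 1_000
students = pd.DataFrame({
    "student_id": range(n),  # IDs 0..999, so exactly one row has ID 101
    "name": [f"student_{i}" for i in range(n)],
    "age": [10 + i % 10 for i in range(n)],
})

mask = students["student_id"] == 101
result = students.loc[mask, ["name", "age"]]

mask_length = len(mask)    # one boolean per input row
match_count = len(result)  # only the matching rows are kept
print(mask_length, match_count)
```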

Test Cases

import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

# Example case
students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11]
})

result = selectData(students)

expected = pd.DataFrame({
    "name": ["Ulysses"],
    "age": [13]
})

assert result.reset_index(drop=True).equals(expected)  # Standard example

# Multiple matching students
students = pd.DataFrame({
    "student_id": [101, 101],
    "name": ["Alice", "Bob"],
    "age": [15, 16]
})

result = selectData(students)

expected = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [15, 16]
})

assert result.reset_index(drop=True).equals(expected)  # Multiple matches

# No matching student
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "age": [10, 11, 12]
})

result = selectData(students)

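# pd.DataFrame(columns=[...]) builds object-dtype columns, so .equals() would
# fail on dtype alone (age stays integer); check emptiness and labels instead.
assert result.empty and list(result.columns) == ["name", "age"]  # Empty result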

# Single row table with matching student
students = pd.DataFrame({
    "student_id": [101],
    "name": ["Solo"],
    "age": [20]
})

result = selectData(students)

expected = pd.DataFrame({
    "name": ["Solo"],
    "age": [20]
})

assert result.reset_index(drop=True).equals(expected)  # Single matching row

# Single row table without matching student
students = pd.DataFrame({
    "student_id": [999],
    "name": ["Nobody"],
    "age": [30]
})

result = selectData(students)

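# As above, compare emptiness and column labels rather than using .equals(),
# which would also compare dtypes against an object-dtype empty frame.
assert result.empty and list(result.columns) == ["name", "age"]  # Single non-matching row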

Test	Why
Standard example	Verifies basic filtering functionality
Multiple matching students	Ensures all matches are returned
No matching student	Confirms empty DataFrame handling
Single matching row	Tests smallest valid matching input
Single non-matching row	Tests smallest non-matching input

Edge Cases

No Student With ID 101

A common bug is assuming that a matching row always exists. If no row has student_id = 101, some implementations may raise errors or return incorrect structures.

This implementation handles the case naturally through pandas filtering. The boolean mask simply contains all False values, and the result becomes an empty DataFrame with the correct columns.
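A quick check of the no-match case, using a small made-up table, might look like the following sketch. It also shows that the empty result keeps the column dtypes of the input, which matters when comparing it against a hand-built empty DataFrame.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "age": [10, 11, 12],
})

result = students.loc[students["student_id"] == 101, ["name", "age"]]

is_empty = result.empty
column_labels = list(result.columns)
# The empty selection keeps the dtype of the source column.
age_dtype_kept = result["age"].dtype == students["age"].dtype
print(is_empty, column_labels, age_dtype_kept)
```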

Multiple Students With ID 101

Although the example suggests IDs may be unique, the problem statement does not explicitly guarantee uniqueness.

A naive implementation might stop after finding the first match. However, the pandas filtering approach automatically returns every matching row, making the solution more robust.
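The sketch below uses a made-up table with two rows sharing ID 101 to show that both are returned. It also shows that the result keeps the original row labels, which is presumably why the test cases call reset_index(drop=True) before comparing.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [7, 101, 42, 101],
    "name": ["Ann", "Alice", "Cid", "Bob"],
    "age": [14, 15, 13, 16],
})

result = students.loc[students["student_id"] == 101, ["name", "age"]]

# The matching rows keep their labels from the input table.
original_labels = result.index.tolist()
print(original_labels)  # [1, 3]

# Renumbering from zero makes the result comparable to a fresh DataFrame.
renumbered = result.reset_index(drop=True)
print(renumbered)
```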

Preserving DataFrame Structure

Another subtle issue is accidentally returning a Series instead of a DataFrame.

For example:

students[students["student_id"] == 101]["name"]

would return only a Series.

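The implementation avoids this by passing a list of column labels as the column selector in .loc[]:

students.loc[students["student_id"] == 101, ["name", "age"]]

Because the selector is a list rather than a single label, the return type remains a DataFrame.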
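The type difference is easy to verify directly. In the sketch below (variable names are illustrative), a single label yields a Series, while a list of labels yields a DataFrame even when the list has only one entry.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
mask = students["student_id"] == 101

as_series = students[mask]["name"]              # single label: Series
as_frame = students.loc[mask, ["name", "age"]]  # list of labels: DataFrame
single_col_frame = students.loc[mask, ["name"]] # one-item list: still a DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(type(single_col_frame).__name__)
```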