LeetCode 2880 - Select Data
Difficulty: 🟢 Easy
Solution
Problem Understanding
This problem provides a pandas DataFrame named students with three columns:
| Column | Description |
|---|---|
| student_id | Unique identifier for a student |
| name | Student name |
| age | Student age |
The task is to return only the name and age columns for the student whose student_id is equal to 101.
In other words, we need to filter the table so that only rows matching student_id == 101 remain, then select only the relevant columns from those rows.
The input is a pandas DataFrame, not a traditional array or list. That means the intended solution is based on pandas operations such as filtering and column selection.
The output must also be a DataFrame, not a scalar value or dictionary. Even if there is only one matching row, the result should preserve the tabular structure.
The problem guarantees that the table follows the specified schema. Since this is an Easy pandas problem, the expected solution is a short one built on pandas' built-in DataFrame operations.
One important detail is that the result must contain only the name and age columns. Returning all columns would produce the wrong output format.
Another important edge case is the possibility of multiple rows with student_id = 101. Although the examples imply uniqueness, the filtering operation naturally handles multiple matches correctly by returning all matching rows.
If no student with ID 101 exists, pandas filtering would simply return an empty DataFrame with the selected columns.
Approaches
Brute Force Approach
A brute force solution would manually iterate through every row in the DataFrame, check whether the student_id equals 101, and then collect the corresponding name and age values into a new structure.
Conceptually, the algorithm would work like this:
- Traverse every row one by one.
- Check whether student_id == 101.
- If it matches, store the name and age.
- Construct a new DataFrame from the collected results.
This approach is correct because every row is examined, so no valid match can be missed.
However, this solution is unnecessarily verbose and inefficient for pandas problems. Iterating row by row defeats the purpose of vectorized DataFrame operations, which run in optimized compiled code and are typically much faster and cleaner to write.
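For comparison, the row-by-row steps above could be sketched roughly as follows. The helper name select_data_brute_force is hypothetical and not part of the problem; the sample data is taken from the problem's example.

```python
import pandas as pd


def select_data_brute_force(students: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row version of the filter (hypothetical helper, for comparison only)."""
    rows = []
    # itertuples yields one namedtuple per row; fields follow the column names
    for row in students.itertuples(index=False):
        if row.student_id == 101:
            rows.append({"name": row.name, "age": row.age})
    # Passing columns explicitly keeps both columns even when nothing matched
    return pd.DataFrame(rows, columns=["name", "age"])


students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
print(select_data_brute_force(students))
```

The loop visits each row in Python, which is the main reason this version is usually slower than a vectorized filter on large tables.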
Optimal Approach
The optimal solution uses pandas boolean indexing.
The key insight is that pandas allows us to filter rows directly using a boolean condition:
students["student_id"] == 101
This produces a boolean mask where matching rows are marked as True.
We can then use this mask to filter the DataFrame and select only the desired columns:
students.loc[mask, ["name", "age"]]
This approach is concise, efficient, and idiomatic pandas code.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(k) | Iterates manually through every row |
| Optimal | O(n) | O(k) | Uses vectorized pandas filtering |
Here, n is the number of rows in the DataFrame, and k is the number of matching rows.
Algorithm Walkthrough
1. Start with the students DataFrame containing all student records.
2. Create a boolean condition that checks whether each row has student_id == 101, and store it as mask.
mask = students["student_id"] == 101
This produces a boolean Series such as:
[True, False, False, False]
3. Use .loc[] to filter the DataFrame using the boolean mask.
students.loc[mask]
This keeps only rows where the condition is True.
4. Select only the name and age columns.
students.loc[mask, ["name", "age"]]
5. Return the resulting DataFrame.
Why it works
The algorithm works because boolean indexing in pandas preserves exactly those rows where the condition evaluates to True. Since the condition specifically checks for student_id == 101, every returned row must belong to the target student. Selecting the name and age columns afterward guarantees the output format matches the problem requirements.
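The walkthrough can be traced in a short script. This is a minimal sketch using the example data from the problem; the variable names mask, matching_rows, and result are chosen here for illustration.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

# Element-wise comparison yields a boolean Series aligned with the rows
mask = students["student_id"] == 101
print(mask.tolist())  # [True, False, False, False]

# The mask alone keeps the matching rows with all three columns
matching_rows = students.loc[mask]

# Adding a list of column labels narrows the result to name and age
result = students.loc[mask, ["name", "age"]]
print(result)
```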
Python Solution
import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]
The implementation is intentionally concise because pandas provides powerful built-in filtering operations.
The expression:
students["student_id"] == 101
creates a boolean mask that identifies all matching rows.
The .loc[] accessor is then used for two operations simultaneously:
- Filtering rows using the boolean mask.
- Selecting only the name and age columns.
The returned value is still a DataFrame, which matches the expected LeetCode output format.
Go Solution
LeetCode problem 2880 is a pandas-specific problem, so Go is not officially supported on the platform for this question. However, the equivalent logic in Go can be demonstrated using structs and slices.
package main

import "fmt"

type Student struct {
    StudentID int
    Name      string
    Age       int
}

type Result struct {
    Name string
    Age  int
}

func selectData(students []Student) []Result {
    var result []Result
    for _, student := range students {
        if student.StudentID == 101 {
            result = append(result, Result{
                Name: student.Name,
                Age:  student.Age,
            })
        }
    }
    return result
}

func main() {
    students := []Student{
        {101, "Ulysses", 13},
        {53, "William", 10},
        {128, "Henry", 6},
        {3, "Henry", 11},
    }
    fmt.Println(selectData(students))
}
The Go version manually iterates through the slice because Go does not provide built-in DataFrame operations like pandas.
Instead of boolean indexing, we use a standard loop and conditional filtering. The logic remains identical: check whether StudentID == 101, then collect the corresponding Name and Age.
Unlike pandas, Go requires explicit struct definitions for both the input and output data.
Worked Examples
Example 1
Input:
| student_id | name | age |
|---|---|---|
| 101 | Ulysses | 13 |
| 53 | William | 10 |
| 128 | Henry | 6 |
| 3 | Henry | 11 |
Step 1: Create Boolean Mask
We evaluate:
students["student_id"] == 101
Result:
| Row | student_id | Matches? |
|---|---|---|
| 0 | 101 | True |
| 1 | 53 | False |
| 2 | 128 | False |
| 3 | 3 | False |
Boolean mask:
[True, False, False, False]
Step 2: Filter Rows
Applying the mask keeps only rows where the value is True.
Intermediate result:
| student_id | name | age |
|---|---|---|
| 101 | Ulysses | 13 |
Step 3: Select Columns
We keep only name and age.
Final output:
| name | age |
|---|---|
| Ulysses | 13 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Every row is checked once against the condition |
| Space | O(k) | Output stores only matching rows |
The algorithm scans the student_id column once to build the boolean mask, so the runtime is linear in the number of rows.
The additional space depends on how many rows match the condition. If k rows match, the resulting DataFrame stores those k rows.
Test Cases
import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

# Example case
students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11]
})
result = selectData(students)
expected = pd.DataFrame({
    "name": ["Ulysses"],
    "age": [13]
})
assert result.reset_index(drop=True).equals(expected)  # Standard example

# Multiple matching students
students = pd.DataFrame({
    "student_id": [101, 101],
    "name": ["Alice", "Bob"],
    "age": [15, 16]
})
result = selectData(students)
expected = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [15, 16]
})
assert result.reset_index(drop=True).equals(expected)  # Multiple matches

# No matching student
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "age": [10, 11, 12]
})
result = selectData(students)
# equals() also compares dtypes, and a frame built from column names alone
# does not share the int dtype of age, so check shape and columns instead
assert result.empty and list(result.columns) == ["name", "age"]  # Empty result

# Single row table with matching student
students = pd.DataFrame({
    "student_id": [101],
    "name": ["Solo"],
    "age": [20]
})
result = selectData(students)
expected = pd.DataFrame({
    "name": ["Solo"],
    "age": [20]
})
assert result.reset_index(drop=True).equals(expected)  # Single matching row

# Single row table without matching student
students = pd.DataFrame({
    "student_id": [999],
    "name": ["Nobody"],
    "age": [30]
})
result = selectData(students)
assert result.empty and list(result.columns) == ["name", "age"]  # Single non-matching row
| Test | Why |
|---|---|
| Standard example | Verifies basic filtering functionality |
| Multiple matching students | Ensures all matches are returned |
| No matching student | Confirms empty DataFrame handling |
| Single matching row | Tests smallest valid matching input |
| Single non-matching row | Tests smallest non-matching input |
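As a possible alternative for the non-empty cases, pandas.testing.assert_frame_equal raises an AssertionError describing the first difference it finds, which can make a failing test easier to diagnose. The sketch below reuses the standard example and is only one way to write the check.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]


students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
expected = pd.DataFrame({"name": ["Ulysses"], "age": [13]})

# reset_index(drop=True) discards the original row labels before comparing
assert_frame_equal(selectData(students).reset_index(drop=True), expected)
```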
Edge Cases
No Student With ID 101
A common bug is assuming that a matching row always exists. If no row has student_id = 101, some implementations may raise errors or return incorrect structures.
This implementation handles the case naturally through pandas filtering. The boolean mask simply contains all False values, and the result becomes an empty DataFrame with the correct columns.
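A quick check of the empty case might look like the following sketch. The sample data is made up for illustration and contains no student with ID 101.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "age": [10, 11, 12],
})

# Every entry of the mask is False, so no rows survive the filter
result = students.loc[students["student_id"] == 101, ["name", "age"]]

print(result.empty)          # True
print(list(result.columns))  # ['name', 'age']
```

No exception is raised, and the result is still a DataFrame with the two requested columns.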
Multiple Students With ID 101
Although the example suggests IDs may be unique, the problem statement does not explicitly guarantee uniqueness.
A naive implementation might stop after finding the first match. However, the pandas filtering approach automatically returns every matching row, making the solution more robust.
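The sketch below illustrates the multiple-match case with made-up data. It also shows that the filtered rows keep their original index labels, so a comparison against a freshly built DataFrame usually needs reset_index(drop=True) first.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [7, 101, 8, 101],
    "name": ["Carol", "Alice", "Dave", "Bob"],
    "age": [14, 15, 13, 16],
})

result = students.loc[students["student_id"] == 101, ["name", "age"]]

# Both matching rows are returned, labeled with their original positions
print(result["name"].tolist())  # ['Alice', 'Bob']
print(result.index.tolist())    # [1, 3]

# Dropping the old labels renumbers the rows from zero
renumbered = result.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```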
Preserving DataFrame Structure
Another subtle issue is accidentally returning a Series instead of a DataFrame.
For example:
students[students["student_id"] == 101]["name"]
would return only a Series.
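The implementation avoids this by passing a list of column labels as the column selector in .loc[]:
students.loc[mask, ["name", "age"]]
Selecting with a list of labels, even a list containing a single label, returns a DataFrame rather than a Series.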
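The difference in return type can be confirmed directly. This is a small sketch using the example data; the variable names are chosen here for illustration.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
mask = students["student_id"] == 101

# A single label selects one column as a Series
as_series = students[mask]["name"]

# A list of labels always yields a DataFrame, even with one label
as_frame = students.loc[mask, ["name", "age"]]
single_column_frame = students.loc[mask, ["name"]]

print(type(as_series).__name__)            # Series
print(type(as_frame).__name__)             # DataFrame
print(type(single_column_frame).__name__)  # DataFrame
```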