LeetCode 2880 - Select Data

Difficulty: 🟢 Easy

Solution

Problem Understanding

This problem provides a pandas DataFrame named students with three columns:

| Column | Description |
| --- | --- |
| student_id | Unique identifier for a student |
| name | Student name |
| age | Student age |

The task is to return only the name and age columns for the student whose student_id is equal to 101.

In other words, we need to filter the table so that only rows matching student_id == 101 remain, then select only the relevant columns from those rows.

The input is a pandas DataFrame, not a traditional array or list. That means the intended solution is based on pandas operations such as filtering and column selection.

The output must also be a DataFrame, not a scalar value or dictionary. Even if there is only one matching row, the result should preserve the tabular structure.

The problem guarantees that the table follows the specified schema. Since this is an Easy pandas problem, the expected solution is a short one built from pandas' built-in DataFrame operations.

One important detail is that the result must contain only the name and age columns. Returning all columns would produce the wrong output format.

Another important edge case is the possibility of multiple rows with student_id = 101. Although the examples imply uniqueness, the filtering operation naturally handles multiple matches correctly by returning all matching rows.

If no student with ID 101 exists, pandas filtering would simply return an empty DataFrame with the selected columns.

Approaches

Brute Force Approach

A brute force solution would manually iterate through every row in the DataFrame, check whether the student_id equals 101, and then collect the corresponding name and age values into a new structure.

Conceptually, the algorithm would work like this:

  1. Traverse every row one by one.
  2. Check whether student_id == 101.
  3. If it matches, store the name and age.
  4. Construct a new DataFrame from the collected results.

This approach is correct because every row is examined, so no valid match can be missed.

However, this solution is unnecessarily verbose and slow in practice. Both approaches are O(n), but iterating row by row in Python bypasses pandas' vectorized DataFrame operations, which run in compiled code and are typically much faster and cleaner.
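For comparison only, here is a minimal sketch of the brute-force steps above. The function name select_data_brute_force is hypothetical, itertuples is just one of several ways to walk the rows, and the sample data is the problem's example.

```python
import pandas as pd

def select_data_brute_force(students: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row filter (hypothetical helper, for illustration only)."""
    rows = []
    # itertuples yields one namedtuple per row; index=False omits the row label.
    for row in students.itertuples(index=False):
        if row.student_id == 101:
            rows.append({"name": row.name, "age": row.age})
    # Passing columns explicitly keeps the name/age headers even if rows is empty.
    return pd.DataFrame(rows, columns=["name", "age"])

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

brute = select_data_brute_force(students)
print(brute)
```

The loop does in Python what the boolean mask does inside pandas, which is why it is mainly useful as a reference for checking the vectorized version.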

Optimal Approach

The optimal solution uses pandas boolean indexing.

The key insight is that pandas allows us to filter rows directly using a boolean condition:

students["student_id"] == 101

This produces a boolean mask where matching rows are marked as True.

We can then use this mask to filter the DataFrame and select only the desired columns:

students.loc[mask, ["name", "age"]]

This approach is concise, efficient, and idiomatic pandas code.

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Brute Force | O(n) | O(k) | Iterates manually through every row |
| Optimal | O(n) | O(k) | Uses vectorized pandas filtering |

Here, n is the number of rows in the DataFrame, and k is the number of matching rows.

Algorithm Walkthrough

  1. Start with the students DataFrame containing all student records.
  2. Create a boolean condition that checks whether each row has student_id == 101.

students["student_id"] == 101

This produces a boolean Series such as:

[True, False, False, False]

  3. Use .loc[] to filter the DataFrame using the boolean mask.

students.loc[mask]

This keeps only rows where the condition is True.

  4. Select only the name and age columns.

students.loc[mask, ["name", "age"]]

  5. Return the resulting DataFrame.
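
The steps above can be sketched with each intermediate value bound to a name. The variable names mask, matching_rows, and result are illustrative choices, and the data is the problem's example.

```python
import pandas as pd

# Step 1: the input table (values from the problem's example).
students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

# Step 2: element-wise comparison gives one boolean per row.
mask = students["student_id"] == 101
print(mask.tolist())  # [True, False, False, False]

# Step 3: keep only the rows where the mask is True (all columns still present).
matching_rows = students.loc[mask]

# Step 4: filter rows and pick columns in a single .loc call.
result = students.loc[mask, ["name", "age"]]

# Step 5: result is what the function would return.
print(result)
```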

Why it works

The algorithm works because boolean indexing in pandas preserves exactly those rows where the condition evaluates to True. Since the condition specifically checks for student_id == 101, every returned row must belong to the target student. Selecting the name and age columns afterward guarantees the output format matches the problem requirements.

Python Solution

import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

The implementation is intentionally concise because pandas provides powerful built-in filtering operations.

The expression:

students["student_id"] == 101

creates a boolean mask that identifies all matching rows.

The .loc[] accessor is then used for two operations simultaneously:

  1. Filtering rows using the boolean mask.
  2. Selecting only the name and age columns.

The returned value is still a DataFrame, which matches the expected LeetCode output format.
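
As a small check of the point above, the single .loc call can be compared with a two-step version that filters rows first and selects columns second. The helper name select_two_step is hypothetical, and the sample frame is the problem's example.

```python
import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    # One call: row mask and column list passed to .loc together.
    return students.loc[students["student_id"] == 101, ["name", "age"]]

def select_two_step(students: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical two-step variant, for comparison only."""
    filtered = students[students["student_id"] == 101]  # rows first
    return filtered[["name", "age"]]                     # then columns

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})

one_step = selectData(students)
two_step = select_two_step(students)
print(one_step.equals(two_step))
```

Both versions should produce the same frame. The single .loc call simply skips the intermediate frame that still carries all three columns.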

Go Solution

LeetCode problem 2880 is a pandas-specific problem, so Go is not officially supported on the platform for this question. However, the equivalent logic in Go can be demonstrated using structs and slices.

package main

import "fmt"

type Student struct {
	StudentID int
	Name      string
	Age       int
}

type Result struct {
	Name string
	Age  int
}

func selectData(students []Student) []Result {
	var result []Result

	for _, student := range students {
		if student.StudentID == 101 {
			result = append(result, Result{
				Name: student.Name,
				Age:  student.Age,
			})
		}
	}

	return result
}

func main() {
	students := []Student{
		{101, "Ulysses", 13},
		{53, "William", 10},
		{128, "Henry", 6},
		{3, "Henry", 11},
	}

	fmt.Println(selectData(students))
}

The Go version manually iterates through the slice because Go does not provide built-in DataFrame operations like pandas.

Instead of boolean indexing, we use a standard loop and conditional filtering. The logic remains identical: check whether StudentID == 101, then collect the corresponding Name and Age.

Unlike pandas, Go requires explicit struct definitions for both the input and output data.

Worked Examples

Example 1

Input:

| student_id | name | age |
| --- | --- | --- |
| 101 | Ulysses | 13 |
| 53 | William | 10 |
| 128 | Henry | 6 |
| 3 | Henry | 11 |

Step 1: Create Boolean Mask

We evaluate:

students["student_id"] == 101

Result:

| Row | student_id | Matches? |
| --- | --- | --- |
| 0 | 101 | True |
| 1 | 53 | False |
| 2 | 128 | False |
| 3 | 3 | False |

Boolean mask:

[True, False, False, False]

Step 2: Filter Rows

Applying the mask keeps only rows where the value is True.

Intermediate result:

| student_id | name | age |
| --- | --- | --- |
| 101 | Ulysses | 13 |

Step 3: Select Columns

We keep only name and age.

Final output:

| name | age |
| --- | --- |
| Ulysses | 13 |

Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n) | Every row is checked once against the condition |
| Space | O(k) | Output stores only matching rows |

The algorithm scans the student_id column once to build the boolean mask, so the runtime is linear in the number of rows.

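The space taken by the output depends on how many rows match the condition. If k rows match, the resulting DataFrame stores those k rows. The boolean mask itself is a temporary Series with one entry per input row, so peak auxiliary memory while filtering is O(n).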
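
To make the sizes concrete, the sketch below builds a synthetic table and compares the length of the mask with the length of the output. The value n = 10_000 and the generated names and ages are arbitrary assumptions chosen for illustration.

```python
import pandas as pd

n = 10_000  # arbitrary table size for the illustration
students = pd.DataFrame({
    "student_id": list(range(n)),            # IDs 0..n-1, so exactly one row is 101
    "name": [f"s{i}" for i in range(n)],     # made-up names
    "age": [10 + i % 10 for i in range(n)],  # made-up ages
})

mask = students["student_id"] == 101
result = students.loc[mask, ["name", "age"]]

print(len(mask))        # one boolean per input row (n)
print(int(mask.sum()))  # number of matching rows (k)
print(len(result))      # rows stored in the output (k)
```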

Test Cases

import pandas as pd

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

# Example case
students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11]
})

result = selectData(students)

expected = pd.DataFrame({
    "name": ["Ulysses"],
    "age": [13]
})

assert result.reset_index(drop=True).equals(expected)  # Standard example

# Multiple matching students
students = pd.DataFrame({
    "student_id": [101, 101],
    "name": ["Alice", "Bob"],
    "age": [15, 16]
})

result = selectData(students)

expected = pd.DataFrame({
    "name": ["Alice", "Bob"],
    "age": [15, 16]
})

assert result.reset_index(drop=True).equals(expected)  # Multiple matches

# No matching student
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "age": [10, 11, 12]
})

result = selectData(students)

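# An empty selection keeps the source dtypes (age stays int64), while
# pd.DataFrame(columns=[...]) has object columns, so .equals() would fail on
# the dtype mismatch. Check emptiness and column labels instead.
assert result.empty and list(result.columns) == ["name", "age"]  # Empty result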

# Single row table with matching student
students = pd.DataFrame({
    "student_id": [101],
    "name": ["Solo"],
    "age": [20]
})

result = selectData(students)

expected = pd.DataFrame({
    "name": ["Solo"],
    "age": [20]
})

assert result.reset_index(drop=True).equals(expected)  # Single matching row

# Single row table without matching student
students = pd.DataFrame({
    "student_id": [999],
    "name": ["Nobody"],
    "age": [30]
})

result = selectData(students)

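# Compare emptiness and column labels rather than using .equals(): the empty
# result keeps age as int64, which never equals an all-object empty frame.
assert result.empty and list(result.columns) == ["name", "age"]  # Single non-matching row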

| Test | Why |
| --- | --- |
| Standard example | Verifies basic filtering functionality |
| Multiple matching students | Ensures all matches are returned |
| No matching student | Confirms empty DataFrame handling |
| Single matching row | Tests smallest valid matching input |
| Single non-matching row | Tests smallest non-matching input |

Edge Cases

No Student With ID 101

A common bug is assuming that a matching row always exists. If no row has student_id = 101, some implementations may raise errors or return incorrect structures.

This implementation handles the case naturally through pandas filtering. The boolean mask simply contains all False values, and the result becomes an empty DataFrame with the correct columns.
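
The contrast can be sketched as follows. The function first_match_naive is a hypothetical buggy variant written for this illustration; it assumes at least one row matches.

```python
import pandas as pd

def first_match_naive(students: pd.DataFrame):
    """Hypothetical buggy variant: assumes a matching row always exists."""
    # .iloc[0] on an empty selection raises IndexError
    # (and would return a Series rather than a DataFrame even when it works).
    return students[students["student_id"] == 101].iloc[0][["name", "age"]]

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["A", "B", "C"],
    "age": [10, 11, 12],
})

try:
    first_match_naive(students)
    naive_error = None
except IndexError as exc:
    naive_error = exc

result = selectData(students)
print(type(naive_error).__name__)
print(result.empty, list(result.columns))
```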

Multiple Students With ID 101

Although the example suggests IDs may be unique, the problem statement does not explicitly guarantee uniqueness.

A naive implementation might stop after finding the first match. However, the pandas filtering approach automatically returns every matching row, making the solution more robust.
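
A short sketch of that difference, assuming a table with two rows for ID 101. The function first_match_only is a hypothetical naive variant, not part of the solution.

```python
import pandas as pd

def first_match_only(students: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical naive variant: returns as soon as one row matches."""
    for row in students.itertuples(index=False):
        if row.student_id == 101:
            return pd.DataFrame({"name": [row.name], "age": [row.age]})
    return pd.DataFrame(columns=["name", "age"])

def selectData(students: pd.DataFrame) -> pd.DataFrame:
    return students.loc[students["student_id"] == 101, ["name", "age"]]

students = pd.DataFrame({
    "student_id": [101, 101],
    "name": ["Alice", "Bob"],
    "age": [15, 16],
})

naive = first_match_only(students)
result = selectData(students)
print(len(naive), len(result))
```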

Preserving DataFrame Structure

Another subtle issue is accidentally returning a Series instead of a DataFrame.

For example:

students[students["student_id"] == 101]["name"]

would return only a Series.

The implementation avoids this by passing a list of column labels to .loc:

["name", "age"]

Selecting with a list always yields a DataFrame, even when the list holds a single label, whereas a bare label such as "name" yields a Series.
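
The return types can be checked directly. This is a minimal sketch on the problem's example data; the variable names are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [101, 53, 128, 3],
    "name": ["Ulysses", "William", "Henry", "Henry"],
    "age": [13, 10, 6, 11],
})
mask = students["student_id"] == 101

as_series = students[mask]["name"]               # bare label: Series
single_col_frame = students.loc[mask, ["name"]]  # one-element list: DataFrame
result = students.loc[mask, ["name", "age"]]     # list of labels: DataFrame

print(type(as_series).__name__)
print(type(single_col_frame).__name__)
print(type(result).__name__)
```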