LeetCode 2883 - Drop Missing Data

This problem provides a pandas DataFrame named students with three columns: student_id (int), name (object), and age (int). Some rows contain missing values in the name column.

Difficulty: 🟢 Easy

Solution

Problem Understanding

This problem provides a pandas DataFrame named students with three columns:

| Column | Type |
| --- | --- |
| student_id | int |
| name | object |
| age | int |

Some rows contain missing values in the name column. In pandas, missing values may appear as None, NaN, or pd.NA; an empty string is an ordinary value and is not treated as missing.
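To make "missing" concrete, the short sketch below checks a handful of illustrative values with isna(). The values are made up for demonstration and are not taken from the problem statement.

```python
import numpy as np
import pandas as pd

# Illustrative values only; not part of the problem's input.
names = pd.Series(["Piper", None, np.nan, pd.NA, ""], dtype="object")

# isna() flags None, NaN, and pd.NA. The empty string is a real value.
missing_flags = names.isna().tolist()
print(missing_flags)
```

Only the three null-like entries are flagged, which is why a row with an empty-string name would survive dropna().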

The task is to return a DataFrame that contains only the rows where the name column has a valid value. Any row with a missing value in name must be removed.

In other words, we need to filter the DataFrame and keep only the records whose name field is not null.

The input represents a collection of student records. Each row corresponds to one student. The expected output is the same table structure, except that rows with missing names have been removed.

Since this is a pandas DataFrame problem, the focus is not on designing a complex algorithm. Instead, the goal is to use the appropriate pandas operation to efficiently remove rows containing missing data in a specific column.

An important detail is that only the name column matters. Missing values in other columns are irrelevant to this problem because the statement specifically asks us to remove rows whose name value is missing.
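The difference between checking one column and checking every column can be seen in a small sketch. The data here is hypothetical: it includes a missing age purely to show the contrast, even though the problem describes age as an int column.

```python
import pandas as pd

# Hypothetical data: the second row lacks an age, the third lacks a name.
students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", "Bob", None],
    "age": [10, None, 12],
})

by_name = students.dropna(subset=["name"])  # looks at the name column only
any_column = students.dropna()              # looks at every column

kept_by_name = by_name["student_id"].tolist()
kept_any_column = any_column["student_id"].tolist()
print(kept_by_name, kept_any_column)
```

With subset=["name"], the row with the missing age is kept; without it, that row would be dropped as well.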

Edge cases include a DataFrame where every row has a valid name, a DataFrame where every row has a missing name, and a DataFrame containing only a single row. A correct solution should handle all of these cases naturally.

Approaches

Brute Force Approach

A straightforward approach is to iterate through every row of the DataFrame manually. For each row, check whether the value in the name column is missing. If it is not missing, store that row in a new collection. After processing all rows, construct a new DataFrame from the collected rows.

This approach is correct because every row is examined exactly once, and only rows with valid names are copied into the result.

However, it is unnecessarily verbose and does not take advantage of pandas' built-in functionality. Although both approaches are linear in the number of rows, manual iteration runs a Python-level loop with per-row overhead, so it is generally slower and less idiomatic than vectorized pandas operations.
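For comparison, one possible sketch of the manual approach is shown below. The function name drop_missing_data_loop is hypothetical, and this is only one of several ways to write the loop.

```python
import pandas as pd

def drop_missing_data_loop(students: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical brute-force version: inspect each row in a Python loop."""
    kept_rows = []
    for _, row in students.iterrows():
        # pd.notna is False for None, NaN, and pd.NA.
        if pd.notna(row["name"]):
            kept_rows.append(row.to_dict())
    # Passing columns keeps the table structure even when nothing is kept.
    return pd.DataFrame(kept_rows, columns=students.columns)
```

Note that this sketch rebuilds the DataFrame from scratch, so the result gets a fresh 0-based index rather than the original row labels.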

Optimal Approach

The key observation is that pandas already provides built-in methods for handling missing values.

The dropna() method can remove rows containing null values. By specifying the subset=["name"] parameter, we instruct pandas to examine only the name column when deciding whether a row should be removed.

This solution is concise, efficient, and leverages pandas' optimized internal implementation.

Approach Comparison

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Brute Force | O(n) | O(n) | Iterate through rows manually and build a new DataFrame |
| Optimal | O(n) | O(n) | Use pandas dropna(subset=["name"]) to filter rows efficiently |

Algorithm Walkthrough

  1. Receive the input DataFrame students.
  2. Examine the name column for missing values.
  3. Use pandas dropna() with subset=["name"] so that only the name column is considered when identifying rows to remove.
  4. Remove every row whose name value is null.
  5. Return the resulting filtered DataFrame.
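The steps above can also be spelled out with an explicit boolean mask, which makes the intermediate "which rows have a name" decision visible. This is an equivalent formulation shown for illustration, not a different algorithm; the function name drop_missing_data_mask is hypothetical.

```python
import pandas as pd

def drop_missing_data_mask(students: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical mask-based spelling of the same filter."""
    has_name = students["name"].notna()  # True where a name is present
    return students[has_name]            # keep only the flagged rows
```

On the inputs discussed here, this should return the same rows as dropna(subset=["name"]).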

Why it works

The pandas dropna(subset=["name"]) operation removes exactly those rows where the name column contains a missing value. Since the problem requires keeping all rows with valid names and removing all rows with missing names, the resulting DataFrame is precisely the desired output.

Python Solution

import pandas as pd

def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    return students.dropna(subset=["name"])

The implementation consists of a single pandas operation. The dropna() method removes rows containing null values. By providing subset=["name"], only the name column is checked.

Rows with valid names remain unchanged, while rows whose name value is missing are excluded from the returned DataFrame.
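Two properties of dropna() are worth seeing in action: by default it returns a new DataFrame rather than modifying its input, and the surviving rows keep their original index labels. The sketch below uses made-up data to show both.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", None, "Bob"],
    "age": [10, 11, 12],
})

result = students.dropna(subset=["name"])

original_length = len(students)       # the input still has all of its rows
result_index = result.index.tolist()  # original labels survive the filter
renumbered = result.reset_index(drop=True).index.tolist()
print(original_length, result_index, renumbered)
```

If consecutive labels are wanted in some other context, reset_index(drop=True) renumbers the rows from zero.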

Go Solution

LeetCode DataFrame problems are designed specifically for pandas and Python. There is no official Go version of this problem because Go does not provide the pandas DataFrame API used by the platform.

For reference, the equivalent logic in Go would be filtering records whose name field is not missing.

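// Student mirrors one row of the students table.
type Student struct {
	StudentID int
	Name      *string // nil represents a missing name
	Age       int
}

// dropMissingData returns the students whose Name is present.
func dropMissingData(students []Student) []Student {
	// Reserve room for the worst case, where no row is dropped.
	result := make([]Student, 0, len(students))

	for _, student := range students {
		if student.Name != nil {
			result = append(result, student)
		}
	}

	return result
}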

The Go version uses a pointer for the Name field so that nil can represent a missing value. During iteration, only records with non-nil names are copied into the result slice.

Worked Examples

Example 1

Input:

| student_id | name | age |
| --- | --- | --- |
| 32 | Piper | 5 |
| 217 | None | 19 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |

The algorithm applies:

students.dropna(subset=["name"])

Row-by-row evaluation:

| student_id | name | Missing? | Keep? |
| --- | --- | --- | --- |
| 32 | Piper | No | Yes |
| 217 | None | Yes | No |
| 779 | Georgia | No | Yes |
| 849 | Willow | No | Yes |

Result:

| student_id | name | age |
| --- | --- | --- |
| 32 | Piper | 5 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |

The row with student_id = 217 is removed because its name value is missing.
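The row-by-row evaluation in Example 1 can be reproduced programmatically. The sketch below adds two helper columns, missing and keep, whose names are chosen here for illustration only.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14],
})

# Helper columns mirror the "Missing?" and "Keep?" columns of the trace.
trace = students[["student_id", "name"]].assign(
    missing=students["name"].isna(),
    keep=students["name"].notna(),
)
print(trace)
```

The keep column is exactly the selection that dropna(subset=["name"]) applies.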

Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n) | Every row must be checked once for a missing name value |
| Space | O(n) | The returned DataFrame may contain up to all rows of the original DataFrame |

The operation scans the specified column once to determine which rows should remain. Internally, pandas creates a filtered DataFrame containing the retained rows, which requires space proportional to the output size.

Test Cases

import pandas as pd

# Example from the problem statement
df = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14]
})

result = dropMissingData(df)
assert len(result) == 3  # removes one row with missing name

# No missing names
df = pd.DataFrame({
    "student_id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [10, 11]
})

result = dropMissingData(df)
assert len(result) == 2  # all rows remain

# All names missing
df = pd.DataFrame({
    "student_id": [1, 2],
    "name": [None, None],
    "age": [10, 11]
})

result = dropMissingData(df)
assert len(result) == 0  # all rows removed

# Single valid row
df = pd.DataFrame({
    "student_id": [1],
    "name": ["Alice"],
    "age": [10]
})

result = dropMissingData(df)
assert len(result) == 1  # single row retained

# Single missing row
df = pd.DataFrame({
    "student_id": [1],
    "name": [None],
    "age": [10]
})

result = dropMissingData(df)
assert len(result) == 0  # single row removed

# Mixed dataset
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Alice", None, "Bob", None],
    "age": [10, 11, 12, 13]
})

result = dropMissingData(df)
assert result["student_id"].tolist() == [1, 3]  # keeps only valid names

Test Summary

| Test | Why |
| --- | --- |
| Problem example | Verifies the expected behavior from the statement |
| No missing names | Ensures valid rows are preserved |
| All names missing | Ensures complete removal is handled correctly |
| Single valid row | Tests minimum non-empty input |
| Single missing row | Tests minimum removable input |
| Mixed dataset | Verifies selective filtering works correctly |

Edge Cases

All Rows Have Valid Names

A common edge case is a DataFrame where every row already contains a valid name. A buggy implementation might accidentally modify or remove rows unnecessarily. Because no missing values are found, dropna(subset=["name"]) returns a DataFrame containing every original row.

All Rows Have Missing Names

Another important case occurs when every row contains a missing name. The correct result is an empty DataFrame with the same column structure. Pandas handles this naturally by removing every row that matches the filtering condition.
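A quick check of this case, using made-up data, confirms that the result is empty while the column structure is preserved.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2],
    "name": [None, None],
    "age": [10, 11],
})

result = students.dropna(subset=["name"])

is_empty = result.empty              # no rows remain
columns = result.columns.tolist()    # but all three columns are still there
print(is_empty, columns)
```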

Single Row Input

With only one row, the result depends entirely on whether that row's name value is missing. If the name exists, the row remains. If the name is missing, the output becomes an empty DataFrame. The implementation correctly handles both scenarios without requiring any special logic.

Multiple Consecutive Missing Rows

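Some implementations that manually delete rows while iterating can accidentally skip entries when multiple missing rows appear consecutively, because each deletion shifts the remaining rows. The pandas dropna() method first determines which rows are missing a name across the whole column and then selects the remaining rows in a single step, so nothing is deleted mid-iteration and this class of bugs cannot occur.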
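The skipped-entry bug is easiest to see on a plain Python list. Both functions below are hypothetical illustrations: the first deletes while iterating and misses the second of two adjacent missing values, while the second builds a new list and avoids the problem.

```python
def buggy_remove(names):
    """Deletes in place while iterating; skips the item after each deletion."""
    names = list(names)
    for i, name in enumerate(names):
        if name is None:
            del names[i]  # later items shift left, so the next one is skipped
    return names

def safe_remove(names):
    """Builds a new list instead of mutating the one being iterated."""
    return [name for name in names if name is not None]

sample = ["Alice", None, None, "Bob"]
print(buggy_remove(sample))
print(safe_remove(sample))
```

The buggy version leaves one None behind, whereas the filtering version removes both, which mirrors how a mask-based selection behaves.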