LeetCode 2883 - Drop Missing Data
Difficulty: 🟢 Easy
Solution
Problem Understanding
This problem provides a pandas DataFrame named students with three columns:
| Column | Type |
|---|---|
| student_id | int |
| name | object |
| age | int |
Some rows contain missing values in the name column. In pandas, a missing value can appear as None, NaN, or another null-like marker such as pd.NA.
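To make "null-like" concrete, here is a minimal sketch of how pandas classifies a few values. The sample values are illustrative and not taken from the problem's hidden tests. Note that an empty string is not considered missing.

```python
import pandas as pd

# Values pandas treats as missing
none_is_missing = pd.isna(None)
nan_is_missing = pd.isna(float("nan"))
na_is_missing = pd.isna(pd.NA)

# An empty string is a real (non-missing) value
empty_string_is_missing = pd.isna("")

# The same check applied element-wise to a column of names
names = pd.Series(["Piper", None, float("nan")])
flags = names.isna().tolist()
print(flags)
```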
The task is to return a DataFrame that contains only the rows where the name column has a valid value. Any row with a missing value in name must be removed.
In other words, we need to filter the DataFrame and keep only the records whose name field is not null.
The input represents a collection of student records. Each row corresponds to one student. The expected output is the same table structure, except that rows with missing names have been removed.
Since this is a pandas DataFrame problem, the focus is not on designing a complex algorithm. Instead, the goal is to use the appropriate pandas operation to efficiently remove rows containing missing data in a specific column.
An important detail is that only the name column matters. Missing values in other columns are irrelevant to this problem because the statement specifically asks us to remove rows whose name value is missing.
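The difference between checking only name and checking every column can be sketched as follows. The data is hypothetical: the third row has a valid name but a missing age, so it should be kept for this problem even though a plain dropna() would discard it.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", None, "Bob"],
    "age": [10, 11, None],  # hypothetical: Bob's age is missing
})

# Only the name column decides whether a row is dropped
by_name = students.dropna(subset=["name"])

# Without subset, a missing value in any column drops the row
by_any_column = students.dropna()

kept_by_name = by_name["student_id"].tolist()
kept_by_any = by_any_column["student_id"].tolist()
print(kept_by_name, kept_by_any)
```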
Edge cases include a DataFrame where every row has a valid name, a DataFrame where every row has a missing name, and a DataFrame containing only a single row. A correct solution should handle all of these cases naturally.
Approaches
Brute Force Approach
A straightforward approach is to iterate through every row of the DataFrame manually. For each row, check whether the value in the name column is missing. If it is not missing, store that row in a new collection. After processing all rows, construct a new DataFrame from the collected rows.
This approach is correct because every row is examined exactly once, and only rows with valid names are copied into the result.
However, it is unnecessarily verbose and does not take advantage of pandas' built-in functionality. Manual iteration over DataFrame rows is generally slower and less idiomatic than vectorized pandas operations.
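For comparison, the manual approach might look like the sketch below. The function name drop_missing_data_brute_force is hypothetical, and the choice of itertuples() for iteration is an assumption; other row-iteration styles would work too. Unlike dropna(), this version rebuilds the DataFrame from scratch, so the original index labels are not preserved and dtypes are re-inferred.

```python
import pandas as pd


def drop_missing_data_brute_force(students: pd.DataFrame) -> pd.DataFrame:
    """Keep rows with a non-missing name by checking each row manually."""
    kept_rows = []
    for row in students.itertuples(index=False):
        # row.name is the value of the "name" column for this row
        if not pd.isna(row.name):
            kept_rows.append(row)
    # Rebuild a DataFrame with the same column order as the input
    return pd.DataFrame(kept_rows, columns=students.columns)


students = pd.DataFrame({
    "student_id": [1, 2, 3],
    "name": ["Alice", None, "Bob"],
    "age": [10, 11, 12],
})
result = drop_missing_data_brute_force(students)
print(result)
```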
Optimal Approach
The key observation is that pandas already provides built-in methods for handling missing values.
The dropna() method can remove rows containing null values. By specifying the subset=["name"] parameter, we instruct pandas to examine only the name column when deciding whether a row should be removed.
This solution is concise, efficient, and leverages pandas' optimized internal implementation.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(n) | Iterate through rows manually and build a new DataFrame |
| Optimal | O(n) | O(n) | Use pandas dropna(subset=["name"]) to filter rows efficiently |
Algorithm Walkthrough
- Receive the input DataFrame students.
- Examine the name column for missing values.
- Use pandas dropna() with subset=["name"] so that only the name column is considered when identifying rows to remove.
- Remove every row whose name value is null.
- Return the resulting filtered DataFrame.
Why it works
The pandas dropna(subset=["name"]) operation removes exactly those rows where the name column contains a missing value. Since the problem requires keeping all rows with valid names and removing all rows with missing names, the resulting DataFrame is precisely the desired output.
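One way to check this reasoning is to compare dropna(subset=["name"]) with an explicit boolean filter on the same column. The sketch below uses the sample data from the problem statement; the variable names are illustrative.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14],
})

# Built-in: drop rows whose name is missing
via_dropna = students.dropna(subset=["name"])

# Explicit: keep rows whose name is not missing
via_mask = students[students["name"].notna()]

same_result = via_dropna.equals(via_mask)
print(same_result)
```

Either form expresses the same filter; dropna() simply states the intent more directly.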
Python Solution
import pandas as pd

def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    return students.dropna(subset=["name"])
The implementation consists of a single pandas operation. The dropna() method removes rows containing null values. By providing subset=["name"], only the name column is checked.
Rows with valid names remain unchanged, while rows whose name value is missing are excluded from the returned DataFrame.
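Two details of the default dropna() behavior are worth seeing in a small sketch: it returns a new DataFrame rather than modifying the input, and the surviving rows keep their original index labels. The reset_index(drop=True) call at the end is optional and shown only as an assumption about what a caller might want; the problem statement does not mention the index.

```python
import pandas as pd


def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    return students.dropna(subset=["name"])


students = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14],
})

result = dropMissingData(students)

# The input still has all four rows
original_length = len(students)

# Surviving rows keep their original labels; label 1 is gone
kept_labels = result.index.tolist()

# Optional: renumber the rows from zero
renumbered_labels = result.reset_index(drop=True).index.tolist()
print(original_length, kept_labels, renumbered_labels)
```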
Go Solution
LeetCode DataFrame problems are designed specifically for pandas and Python. There is no official Go version of this problem because Go does not provide the pandas DataFrame API used by the platform.
For reference, the equivalent logic in Go would be filtering records whose name field is not missing.
type Student struct {
	StudentID int
	Name      *string
	Age       int
}

func dropMissingData(students []Student) []Student {
	result := make([]Student, 0)
	for _, student := range students {
		if student.Name != nil {
			result = append(result, student)
		}
	}
	return result
}
The Go version uses a pointer for the Name field so that nil can represent a missing value. During iteration, only records with non-nil names are copied into the result slice.
Worked Examples
Example 1
Input:
| student_id | name | age |
|---|---|---|
| 32 | Piper | 5 |
| 217 | None | 19 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
The algorithm applies:
students.dropna(subset=["name"])
Row-by-row evaluation:
| student_id | name | Missing? | Keep? |
|---|---|---|---|
| 32 | Piper | No | Yes |
| 217 | None | Yes | No |
| 779 | Georgia | No | Yes |
| 849 | Willow | No | Yes |
Result:
| student_id | name | age |
|---|---|---|
| 32 | Piper | 5 |
| 779 | Georgia | 20 |
| 849 | Willow | 14 |
The row with student_id = 217 is removed because its name value is missing.
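The row-by-row table above can be reproduced with a short sketch. The column names missing and keep are hypothetical labels chosen to mirror the table headers.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14],
})

# True where the name is missing
missing = students["name"].isna()

# Side-by-side view of the decision for each row
trace = pd.DataFrame({
    "student_id": students["student_id"],
    "name": students["name"],
    "missing": missing,
    "keep": ~missing,
})
print(trace)
```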
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Every row must be checked once for a missing name value |
| Space | O(n) | The returned DataFrame may contain up to all rows of the original DataFrame |
The operation scans the specified column once to determine which rows should remain. Internally, pandas creates a filtered DataFrame containing the retained rows, which requires space proportional to the output size.
Test Cases
import pandas as pd

# Example from the problem statement
df = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14]
})
result = dropMissingData(df)
assert len(result) == 3  # removes one row with missing name

# No missing names
df = pd.DataFrame({
    "student_id": [1, 2],
    "name": ["Alice", "Bob"],
    "age": [10, 11]
})
result = dropMissingData(df)
assert len(result) == 2  # all rows remain

# All names missing
df = pd.DataFrame({
    "student_id": [1, 2],
    "name": [None, None],
    "age": [10, 11]
})
result = dropMissingData(df)
assert len(result) == 0  # all rows removed

# Single valid row
df = pd.DataFrame({
    "student_id": [1],
    "name": ["Alice"],
    "age": [10]
})
result = dropMissingData(df)
assert len(result) == 1  # single row retained

# Single missing row
df = pd.DataFrame({
    "student_id": [1],
    "name": [None],
    "age": [10]
})
result = dropMissingData(df)
assert len(result) == 0  # single row removed

# Mixed dataset
df = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Alice", None, "Bob", None],
    "age": [10, 11, 12, 13]
})
result = dropMissingData(df)
assert result["student_id"].tolist() == [1, 3]  # keeps only valid names
Test Summary
| Test | Why |
|---|---|
| Problem example | Verifies the expected behavior from the statement |
| No missing names | Ensures valid rows are preserved |
| All names missing | Ensures complete removal is handled correctly |
| Single valid row | Tests minimum non-empty input |
| Single missing row | Tests minimum removable input |
| Mixed dataset | Verifies selective filtering works correctly |
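Most of the tests above check only the row count. A stricter check could compare the whole table, as in the sketch below, which uses pandas.testing.assert_frame_equal. Resetting the index before comparing is an assumption made here so that the comparison ignores the original row labels.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def dropMissingData(students: pd.DataFrame) -> pd.DataFrame:
    return students.dropna(subset=["name"])


students = pd.DataFrame({
    "student_id": [32, 217, 779, 849],
    "name": ["Piper", None, "Georgia", "Willow"],
    "age": [5, 19, 20, 14],
})
expected = pd.DataFrame({
    "student_id": [32, 779, 849],
    "name": ["Piper", "Georgia", "Willow"],
    "age": [5, 20, 14],
})

# Renumber rows so only values, columns, and dtypes are compared
actual = dropMissingData(students).reset_index(drop=True)

# Raises AssertionError with a readable diff if the frames differ
assert_frame_equal(actual, expected)
```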
Edge Cases
All Rows Have Valid Names
A common edge case is a DataFrame where every row already contains a valid name. A buggy implementation might accidentally modify or remove rows unnecessarily. The dropna(subset=["name"]) operation leaves all rows unchanged because no missing values are found.
All Rows Have Missing Names
Another important case occurs when every row contains a missing name. The correct result is an empty DataFrame with the same column structure. Pandas handles this naturally by removing every row that matches the filtering condition.
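A quick sketch of this case, using illustrative data, shows that the result has no rows but still carries all three columns.

```python
import pandas as pd

students = pd.DataFrame({
    "student_id": [1, 2],
    "name": [None, None],
    "age": [10, 11],
})

result = students.dropna(subset=["name"])

is_empty = result.empty
columns = list(result.columns)
shape = result.shape
print(is_empty, columns, shape)
```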
Single Row Input
With only one row, the result depends entirely on whether that row's name value is missing. If the name exists, the row remains. If the name is missing, the output becomes an empty DataFrame. The implementation correctly handles both scenarios without requiring any special logic.
Multiple Consecutive Missing Rows
Some implementations that manually delete rows while iterating can accidentally skip entries when multiple missing rows appear consecutively. The pandas dropna() method avoids this class of bugs because it decides which rows to keep for the whole column in one vectorized step and returns a new DataFrame, rather than deleting rows during iteration.
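The skipping bug is easiest to see with a plain Python list standing in for the name column. Both helper functions below are hypothetical and exist only for illustration; the first is buggy on purpose.

```python
def remove_missing_in_place(names):
    """Buggy on purpose: mutates the list while iterating over it."""
    for name in names:
        if name is None:
            # Removing shifts later items left, so the next item is skipped
            names.remove(name)
    return names


def remove_missing_safely(names):
    """Builds a new list instead of deleting during iteration."""
    return [name for name in names if name is not None]


buggy = remove_missing_in_place(["Alice", None, None, "Bob"])
safe = remove_missing_safely(["Alice", None, None, "Bob"])
print(buggy)  # one None survives
print(safe)
```

Building a new collection, as the safe version does, mirrors what dropna() does when it returns a filtered copy.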