LeetCode 2887 - Fill Missing Data
Difficulty: 🟢 Easy
Solution
Problem Understanding
The problem provides a Pandas DataFrame named products with three columns:
| Column | Type |
|---|---|
| name | object |
| quantity | int |
| price | int |
The task is to replace all missing values in the quantity column with 0.
In other words, some products may not have a recorded quantity, represented as None (or equivalently NaN in a Pandas DataFrame). Whenever this happens, we must substitute that missing value with 0, while leaving all existing valid quantities unchanged.
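To make the None/NaN equivalence concrete, here is a small sketch. The sample rows are made up for illustration, and it assumes default pandas type inference, where a None placed among numbers is stored as NaN and the column becomes float64.

```python
import pandas as pd

# Hypothetical sample data; the None stands in for an unrecorded quantity.
products = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [135, 9319],
})

# isna() flags both None and NaN as missing.
missing_mask = products["quantity"].isna()
print(missing_mask.tolist())       # [True, False]
print(products["quantity"].dtype)  # float64
```

The float64 dtype is why a filled column may display values such as 0.0 rather than 0 unless it is cast afterwards.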
The input is a Pandas DataFrame where each row represents a product and contains its name, quantity, and price. The output should be the same DataFrame structure, except that every missing value in the quantity column has been filled with 0.
The problem is classified as Easy because it mainly tests familiarity with Pandas operations for handling missing data rather than requiring an advanced algorithmic technique.
An important observation is that only the quantity column should be modified. Other columns, such as name and price, must remain untouched. Additionally, rows with valid quantities should not be changed.
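One way to see the "only quantity changes" requirement in code is to pass a dict to DataFrame.fillna, which maps column names to fill values. This is a sketch with made-up data: price is deliberately given a missing value here (something the original problem does not describe) purely to show that unlisted columns are left alone.

```python
import pandas as pd

# Hypothetical data: the missing price exists only for this demonstration.
products = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [135.0, None],
})

# Only columns named in the dict are filled; price keeps its missing value.
filled = products.fillna({"quantity": 0})

print(filled["quantity"].tolist())      # [0.0, 779.0]
print(filled["price"].isna().tolist())  # [False, True]
```

Because fillna returns a new DataFrame here, the original products frame is not modified by this call.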
Several edge cases are worth considering. The DataFrame might contain no missing values at all, in which case the output should remain unchanged. The quantity column might consist entirely of missing values, requiring every row to become 0. The DataFrame could also contain only a single row, which should still be processed correctly.
Approaches
Brute Force Approach
A straightforward approach is to iterate through every row in the DataFrame and manually inspect the quantity column. For each row, we check whether the quantity is missing. If it is, we explicitly assign 0 to that row.
This approach works because every row is examined exactly once, ensuring that all missing values are detected and replaced. However, it is unnecessarily verbose and less efficient in practice because Pandas already provides optimized vectorized operations for this exact task.
For example, we could loop over indices and write conditional logic:
for index in products.index:
    if pd.isna(products.loc[index, "quantity"]):
        products.loc[index, "quantity"] = 0
Although correct, this solution is not idiomatic Pandas code.
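As a sanity check, the loop can be wrapped in a function and compared against the vectorized result. This is only a sketch: the helper name fill_with_loop is hypothetical, and the loop assumes the index labels are unique so that each .loc lookup returns a single value.

```python
import pandas as pd

def fill_with_loop(products: pd.DataFrame) -> pd.DataFrame:
    # Row-by-row version of the brute-force idea (hypothetical helper name).
    for index in products.index:
        if pd.isna(products.loc[index, "quantity"]):
            products.loc[index, "quantity"] = 0
    return products

sample = pd.DataFrame({
    "name": ["Wristwatch", "WirelessEarbuds", "GolfClubs", "Printer"],
    "quantity": [None, None, 779, 849],
    "price": [135, 821, 9319, 3051],
})

# Run the loop on a copy so the sample itself stays unfilled.
looped = fill_with_loop(sample.copy())
vectorized = sample["quantity"].fillna(0)

# Both routes should produce the same column.
print(looped["quantity"].equals(vectorized))
```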
Optimal Approach
The key observation is that Pandas includes a built-in method, fillna(), that replaces the missing values in a column in a single call.
Instead of iterating row by row, we can directly target the quantity column and replace every missing value with 0 in one vectorized operation:
products["quantity"] = products["quantity"].fillna(0)
This approach is cleaner, easier to read, and internally optimized by Pandas.
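For comparison, a boolean mask gives another vectorized route to the same result. This is an alternative sketch, not the approach the solution uses, and the sample data is made up.

```python
import pandas as pd

products = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [135, 9319],
})

# Select only the rows whose quantity is missing and overwrite them in place.
missing = products["quantity"].isna()
products.loc[missing, "quantity"] = 0

print(products["quantity"].tolist())  # [0.0, 779.0]
```

The mask version is handy when the replacement depends on other columns; for a constant fill value, fillna(0) says the same thing more directly.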
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(1) | Iterates through every row manually and updates missing values |
| Optimal | O(n) | O(n) | Uses Pandas fillna() for vectorized replacement; fillna() returns a new column, which is then assigned back |
Algorithm Walkthrough
- Access the quantity column from the products DataFrame because this is the only column that requires modification.
- Use the fillna(0) method on the column. This method scans through the column and replaces every missing value (None or NaN) with 0.
- Assign the modified column back to products["quantity"] so the DataFrame reflects the updates.
- Return the updated products DataFrame.
Why it works
The algorithm works because fillna(0) guarantees that every missing value in the selected column is replaced with 0, while preserving all non-missing values exactly as they are. Since we only apply the operation to the quantity column, the rest of the DataFrame remains unchanged.
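A quick check of both properties on a standalone Series (illustrative values): fillna(0) returns a new Series with the gaps filled, and the Series it was called on is left as it was, which is why the result has to be assigned back.

```python
import pandas as pd

quantity = pd.Series([None, 779, 849], name="quantity")

filled = quantity.fillna(0)

# The original Series still has its missing value; only the returned one is filled.
print(quantity.isna().sum())  # 1
print(filled.tolist())        # [0.0, 779.0, 849.0]
```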
Python Solution
import pandas as pd
def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products["quantity"] = products["quantity"].fillna(0)
    return products
The implementation directly follows the optimal algorithm. First, we select the quantity column using products["quantity"]. Then, we call fillna(0) to replace missing values with 0. The updated column is reassigned to the DataFrame, ensuring the modification persists. Finally, the modified DataFrame is returned.
This solution is concise and leverages Pandas vectorized operations, which are preferred over manual iteration for performance and readability.
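One detail worth knowing: a numeric column that contains NaN is normally stored as float64, so the filled column stays float. If an integer column is wanted, a cast can be chained after the fill. The variant below is a sketch under that assumption; the function name fill_missing_as_int is hypothetical, and the source does not say whether the judge checks the dtype.

```python
import pandas as pd

def fill_missing_as_int(products: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical variant: fill, then cast back to a 64-bit integer column.
    products["quantity"] = products["quantity"].fillna(0).astype("int64")
    return products

df = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [135, 9319],
})
result = fill_missing_as_int(df)
print(result["quantity"].dtype)     # int64
print(result["quantity"].tolist())  # [0, 779]
```

The cast is only safe after the fill, since NaN cannot be represented in a plain int64 column.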
Go Solution
LeetCode Pandas problems are Python-specific and do not have a Go submission environment. However, if we conceptually translate the same logic into Go, the implementation would involve iterating through records and replacing missing quantities with 0.
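package main

// Product mirrors one row of the products table.
// Quantity is a pointer so that nil can represent a missing value.
type Product struct {
	Name     string
	Quantity *int
	Price    int
}

// fillMissingValues replaces every nil Quantity with a pointer to 0.
func fillMissingValues(products []Product) []Product {
	for i := range products {
		if products[i].Quantity == nil {
			zero := 0
			products[i].Quantity = &zero
		}
	}
	return products
}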
The main Go-specific difference is that Go does not have built-in DataFrame support like Pandas. Missing values are typically represented using pointers, where nil indicates absence. Instead of fillna(), we manually iterate through the slice and replace nil quantities with a pointer to 0.
Worked Examples
Example 1
Input DataFrame:
| name | quantity | price |
|---|---|---|
| Wristwatch | None | 135 |
| WirelessEarbuds | None | 821 |
| GolfClubs | 779 | 9319 |
| Printer | 849 | 3051 |
Step by Step Execution
We process the quantity column using:
products["quantity"] = products["quantity"].fillna(0)
State of the quantity column:
| Row | Original Value | After fillna(0) |
|---|---|---|
| Wristwatch | None | 0 |
| WirelessEarbuds | None | 0 |
| GolfClubs | 779 | 779 |
| Printer | 849 | 849 |
Final output:
| name | quantity | price |
|---|---|---|
| Wristwatch | 0 | 135 |
| WirelessEarbuds | 0 | 821 |
| GolfClubs | 779 | 9319 |
| Printer | 849 | 3051 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Pandas scans the quantity column once |
| Space | O(n) | fillna() returns a new column of length n before it is assigned back |
The time complexity is O(n) because each row in the quantity column must be inspected once to determine whether it contains a missing value. The auxiliary space is O(n) for the solution as written, because fillna(0) builds a new Series of the same length rather than editing the existing one in place. The row-by-row loop, by contrast, overwrites values in place and needs only O(1) extra space.
Test Cases
import pandas as pd

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products["quantity"] = products["quantity"].fillna(0)
    return products

# Test 1: Provided example
df = pd.DataFrame({
    "name": ["Wristwatch", "WirelessEarbuds", "GolfClubs", "Printer"],
    "quantity": [None, None, 779, 849],
    "price": [135, 821, 9319, 3051]
})
result = fillMissingValues(df)
assert result["quantity"].tolist() == [0, 0, 779, 849]  # Example case

# Test 2: No missing values
df = pd.DataFrame({
    "name": ["A", "B"],
    "quantity": [10, 20],
    "price": [100, 200]
})
result = fillMissingValues(df)
assert result["quantity"].tolist() == [10, 20]  # No modification needed

# Test 3: All values missing
df = pd.DataFrame({
    "name": ["A", "B", "C"],
    "quantity": [None, None, None],
    "price": [10, 20, 30]
})
result = fillMissingValues(df)
assert result["quantity"].tolist() == [0, 0, 0]  # All replaced

# Test 4: Single row with missing value
df = pd.DataFrame({
    "name": ["Laptop"],
    "quantity": [None],
    "price": [999]
})
result = fillMissingValues(df)
assert result["quantity"].tolist() == [0]  # Single element case

# Test 5: Single row with valid quantity
df = pd.DataFrame({
    "name": ["Laptop"],
    "quantity": [15],
    "price": [999]
})
result = fillMissingValues(df)
assert result["quantity"].tolist() == [15]  # Existing value preserved
| Test | Why |
|---|---|
| Provided example | Verifies correctness against the official example |
| No missing values | Ensures valid quantities remain unchanged |
| All values missing | Confirms every missing value becomes 0 |
| Single row with missing value | Tests minimum-sized input with replacement |
| Single row with valid quantity | Ensures non-missing values are preserved |
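The tests above only inspect the quantity column. A complementary check, sketched here with made-up rows, confirms that name and price come back unchanged by comparing them against a snapshot using pandas.testing.assert_frame_equal.

```python
import pandas as pd

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products["quantity"] = products["quantity"].fillna(0)
    return products

df = pd.DataFrame({
    "name": ["Wristwatch", "GolfClubs"],
    "quantity": [None, 779],
    "price": [135, 9319],
})

# Snapshot the columns that must not change before calling the function,
# because the function modifies its argument.
untouched_before = df[["name", "price"]].copy()

result = fillMissingValues(df)

# Raises AssertionError if name or price differ in values or dtype.
pd.testing.assert_frame_equal(result[["name", "price"]], untouched_before)
```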
Edge Cases
One important edge case is when the quantity column contains no missing values. A careless implementation might accidentally alter valid entries or perform unnecessary transformations. Using fillna(0) safely preserves all non-missing values, so the DataFrame remains unchanged.
Another important case occurs when every quantity is missing. A manual implementation could fail if it assumes at least one valid number exists. The current implementation handles this naturally because fillna(0) independently replaces every missing value.
A final edge case is a DataFrame with only one row. Small inputs often expose indexing mistakes or assumptions about iteration. Since the solution operates directly on the column rather than relying on index ranges, single-row inputs work correctly without any special handling.
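The same reasoning extends one step further, to a frame with zero rows. That case is not listed in the problem description above, so treat this as an extra check rather than a documented requirement. The dtypes are set explicitly so that no type inference is involved.

```python
import pandas as pd

def fillMissingValues(products: pd.DataFrame) -> pd.DataFrame:
    products["quantity"] = products["quantity"].fillna(0)
    return products

# Zero-row frame with explicit column dtypes.
empty = pd.DataFrame({
    "name": pd.Series([], dtype="object"),
    "quantity": pd.Series([], dtype="float64"),
    "price": pd.Series([], dtype="int64"),
})

result = fillMissingValues(empty)
print(len(result))           # 0
print(list(result.columns))  # ['name', 'quantity', 'price']
```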