LeetCode 2888 - Reshape Data: Concatenate
Difficulty: 🟢 Easy
Topics: —
Solution
Problem Understanding
The problem is asking us to vertically concatenate two DataFrames into a single unified DataFrame. In simpler terms, given two tables df1 and df2 with identical columns and types, we need to stack the rows of df2 below the rows of df1.
The input consists of two pandas-like DataFrames, each with columns student_id, name, and age. Each row represents a student's information. The expected output is a new DataFrame that preserves the original column order and types, containing all rows from both DataFrames in order.
Key points and constraints: df1 and df2 are guaranteed to have the same schema, so column mismatches are not a concern. The number of rows may differ, however, and either DataFrame could be empty. Edge cases include one or both DataFrames being empty, and rows with duplicate student_id values, which the problem does not forbid since it only asks for concatenation.
Approaches
The brute-force approach iterates over the rows of each DataFrame by hand, collects them into a new list, and builds a new DataFrame from that list. It works, but it is verbose and runs a Python-level loop per row. That effort is unnecessary when high-level libraries like pandas, or SQL-like DataFrame operations, already provide the operation.
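For comparison, here is a minimal sketch of the brute-force idea. It assumes both inputs share the same columns in the same order, and the helper name `concat_brute_force` is hypothetical. It walks the rows of each DataFrame in Python and rebuilds a new DataFrame from the collected tuples.

```python
import pandas as pd


def concat_brute_force(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical brute-force concatenation: copy every row by hand."""
    rows = []
    for df in (df1, df2):
        # itertuples(index=False, name=None) yields one plain tuple per row,
        # with values in column order
        for row in df.itertuples(index=False, name=None):
            rows.append(row)
    # Rebuild a DataFrame using the shared column order; the new index is 0..n+m-1
    return pd.DataFrame(rows, columns=df1.columns)


df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5], "name": ["Leo"], "age": [7]})
result = concat_brute_force(df1, df2)
print(result)
```

Both versions are linear, but this one pays Python-level overhead for every row and re-infers dtypes from the collected values. That is why the built-in concatenation is preferred.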
The optimal solution leverages built-in DataFrame concatenation methods. In Python, pandas.concat([df1, df2]) handles this efficiently. In Go, if using an SQL-like or in-memory table structure, one would append the slices representing rows. The key insight is that DataFrames are just tabular structures in memory, and vertical concatenation preserves the column structure while stacking rows.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n + m) | O(n + m) | Manually iterate through rows and construct a new DataFrame; same asymptotic cost, but with Python-level overhead per row |
| Optimal | O(n + m) | O(n + m) | Use built-in concatenation (pandas.concat) to stack rows in compiled, vectorized code |

n and m denote the number of rows in df1 and df2 respectively.
Algorithm Walkthrough
- Input Validation: Check that both DataFrames have identical columns. If using a typed system, this is often guaranteed, so this step can be skipped.
- Vertical Concatenation: Use a built-in concatenation method to combine the two DataFrames. In Python, pandas.concat([df1, df2], ignore_index=True) stacks df2 below df1. The ignore_index=True argument discards the original indices and assigns a fresh sequential index.
- Return the Result: Output the new DataFrame containing all rows.
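The three steps above can be sketched as a single function. This is a minimal sketch that assumes "identical columns" means the same column names in the same order. The function name `concatenate_checked` is hypothetical, and because the problem guarantees a matching schema, the check is optional there.

```python
import pandas as pd


def concatenate_checked(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: validate the schema, then stack df2 below df1."""
    # Step 1: input validation, requiring the same column names in the same order
    if list(df1.columns) != list(df2.columns):
        raise ValueError(
            f"column mismatch: {list(df1.columns)} vs {list(df2.columns)}"
        )
    # Step 2: vertical concatenation with a fresh 0..n+m-1 index
    result = pd.concat([df1, df2], ignore_index=True)
    # Step 3: return the combined DataFrame
    return result


cols = ["student_id", "name", "age"]
a = pd.DataFrame([[1, "Mason", 8]], columns=cols)
b = pd.DataFrame([[5, "Leo", 7]], columns=cols)
print(concatenate_checked(a, b))
```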
Why it works: pandas.concat aligns the inputs by column name and stacks their rows. Because the columns match exactly, no transformation or alignment work is needed. The column order is preserved and, since each column has the same dtype in both inputs, so are the data types.
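A quick way to check this claim is shown below. It is a small sketch that assumes both inputs let pandas infer the same dtype for each column, which is the case when they are built the same way. The concatenated result then reports the same column order and dtypes as the inputs.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5, 6], "name": ["Leo", "Alex"], "age": [7, 7]})

result = pd.concat([df1, df2], ignore_index=True)

# Column order comes from the inputs, which already agree
print(list(result.columns))
# Each column keeps its dtype because both inputs use the same dtype per column
print(result.dtypes)
```

The guarantee weakens when a column's dtype differs between inputs (for example int64 in one and float64 in the other). In that case pandas upcasts the column to a common dtype.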
Python Solution
```python
import pandas as pd


class Solution:
    def concatenateDataFrames(self, df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
        """
        Concatenate two DataFrames vertically.
        """
        # Use pandas built-in concat function to stack the DataFrames vertically
        result_df = pd.concat([df1, df2], ignore_index=True)
        return result_df
```
This implementation directly uses pandas.concat to stack the two DataFrames. The ignore_index=True parameter ensures that the resulting DataFrame has a clean sequential index rather than preserving the original DataFrames' indices.
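To see what ignore_index=True changes, compare the resulting index with and without it. The sketch below uses two small frames that both have the default 0-based index.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5, 6], "name": ["Leo", "Alex"], "age": [7, 7]})

# Default: original index labels are kept, so labels 0 and 1 each appear twice
kept = pd.concat([df1, df2])
print(list(kept.index))

# ignore_index=True: a fresh 0..n+m-1 index is assigned
fresh = pd.concat([df1, df2], ignore_index=True)
print(list(fresh.index))

# With duplicate labels, label-based lookup returns two rows instead of one
print(kept.loc[0])
```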
Go Solution
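```go
package main

import "fmt"

// Student represents one row of the table (student_id, name, age).
type Student struct {
	StudentID int
	Name      string
	Age       int
}

// ConcatenateDataFrames returns a new slice holding all rows of df1
// followed by all rows of df2; neither input slice is modified.
func ConcatenateDataFrames(df1 []Student, df2 []Student) []Student {
	// Preallocate the resulting slice with total capacity for efficiency
	result := make([]Student, 0, len(df1)+len(df2))
	result = append(result, df1...)
	result = append(result, df2...)
	return result
}

func main() {
	// Small demonstration using rows from the worked example
	df1 := []Student{{1, "Mason", 8}, {2, "Ava", 6}}
	df2 := []Student{{5, "Leo", 7}}
	fmt.Println(ConcatenateDataFrames(df1, df2))
}
```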
In Go, we represent each row as a Student struct and each DataFrame as a slice of Student. Concatenation is done by appending slices. Preallocating the slice improves efficiency by avoiding multiple memory reallocations.
Worked Examples
For the input:
df1:
| student_id | name | age |
|---|---|---|
| 1 | Mason | 8 |
| 2 | Ava | 6 |
| 3 | Taylor | 15 |
| 4 | Georgia | 17 |

df2:
| student_id | name | age |
|---|---|---|
| 5 | Leo | 7 |
| 6 | Alex | 7 |

Step by step:
- The concatenation function receives df1 and df2.
- The rows of df2 are stacked below df1.
- The resulting DataFrame has 6 rows, maintaining the original column order.

| Index | student_id | name | age |
|---|---|---|---|
| 0 | 1 | Mason | 8 |
| 1 | 2 | Ava | 6 |
| 2 | 3 | Taylor | 15 |
| 3 | 4 | Georgia | 17 |
| 4 | 5 | Leo | 7 |
| 5 | 6 | Alex | 7 |
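The worked example can be reproduced directly. The following sketch rebuilds the two input tables, concatenates them, and checks both properties the steps describe: the first len(df1) rows are exactly df1, and the remaining rows are exactly df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Mason", "Ava", "Taylor", "Georgia"],
    "age": [8, 6, 15, 17],
})
df2 = pd.DataFrame({
    "student_id": [5, 6],
    "name": ["Leo", "Alex"],
    "age": [7, 7],
})

result = pd.concat([df1, df2], ignore_index=True)

# The first len(df1) rows are df1 unchanged (same values, index 0..3)
head = result.iloc[: len(df1)]
# The remaining rows are df2; reset their index (4..5) back to 0..1 to compare
tail = result.iloc[len(df1):].reset_index(drop=True)
print(head.equals(df1), tail.equals(df2))
```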
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + m) | Concatenation requires visiting each row in both DataFrames once |
| Space | O(n + m) | A new DataFrame/slice is created to hold all rows |
The time complexity is linear with respect to the total number of rows because each row must be copied. The space complexity is also linear because a new structure must hold the combined data.
Test Cases
```python
import pandas as pd

# Assumes the Solution class from the Python solution above is in scope
sol = Solution()
COLUMNS = ["student_id", "name", "age"]

# Provided example
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Mason", "Ava", "Taylor", "Georgia"],
    "age": [8, 6, 15, 17],
})
df2 = pd.DataFrame({
    "student_id": [5, 6],
    "name": ["Leo", "Alex"],
    "age": [7, 7],
})
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 6  # All rows combined
assert result.iloc[4]["name"] == "Leo"  # Check specific row

# Empty df1
df1 = pd.DataFrame(columns=COLUMNS)
df2 = pd.DataFrame({
    "student_id": [1],
    "name": ["John"],
    "age": [10],
})
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 1

# Empty df2
df1 = pd.DataFrame({
    "student_id": [1],
    "name": ["John"],
    "age": [10],
})
df2 = pd.DataFrame(columns=COLUMNS)
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 1

# Both empty
df1 = pd.DataFrame(columns=COLUMNS)
df2 = pd.DataFrame(columns=COLUMNS)
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 0
assert list(result.columns) == COLUMNS  # Column names survive even with no rows
```
| Test | Why |
|---|---|
| Provided example | Validates normal concatenation |
| Empty df1 | Ensures function works when first DataFrame is empty |
| Empty df2 | Ensures function works when second DataFrame is empty |
| Both empty | Ensures function handles completely empty input |
Edge Cases
The first edge case occurs when one DataFrame is empty. If df1 is empty, the result should contain exactly the rows of df2, re-indexed from 0, and vice versa. Built-in concatenation handles this without any special-casing.
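An empty DataFrame built only from column names, as in the tests above, has object-dtype columns. Depending on the pandas version, such an input may affect the dtypes of the concatenated result or emit a FutureWarning. If you want the non-empty input's dtypes kept regardless of version, one option is to drop empty inputs before concatenating. The sketch below illustrates this approach; the helper name `concat_non_empty` is hypothetical and is not part of the required solution.

```python
import pandas as pd


def concat_non_empty(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: skip empty inputs so they cannot influence dtypes."""
    frames = [df for df in (df1, df2) if not df.empty]
    if not frames:
        # Both inputs are empty: return an empty frame that keeps df1's columns
        return df1.iloc[0:0].reset_index(drop=True)
    # Concatenate only the non-empty inputs, with a fresh 0-based index
    return pd.concat(frames, ignore_index=True)


empty = pd.DataFrame(columns=["student_id", "name", "age"])
df2 = pd.DataFrame({"student_id": [1], "name": ["John"], "age": [10]})
result = concat_non_empty(empty, df2)
print(result.dtypes)
```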
The second edge case is both DataFrames empty. This could trigger errors in naive implementations that assume at least one row exists. The provided solution correctly returns an empty DataFrame with the proper columns.
The third edge case is duplicate student_id values. The problem statement does not require uniqueness, so the algorithm simply stacks the rows and preserves every duplicate. If uniqueness were required, it would have to be enforced as a separate step after concatenation.
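The difference between stacking and de-duplicating shows up in a short sketch. The drop_duplicates step is not part of this problem. It is included only to show what a separate uniqueness step could look like, under the assumption that keeping the first row seen for each student_id is the desired rule.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [2, 3], "name": ["Ava", "Leo"], "age": [6, 7]})

# Plain concatenation keeps both rows with student_id 2
stacked = pd.concat([df1, df2], ignore_index=True)
print(len(stacked))

# Only if uniqueness were required: keep the first row for each student_id
unique = stacked.drop_duplicates(subset="student_id", keep="first", ignore_index=True)
print(len(unique))
```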