LeetCode 2888 - Reshape Data: Concatenate

Difficulty: 🟢 Easy

Solution

Problem Understanding

The problem asks us to vertically concatenate two DataFrames into one. In simpler terms, given two tables df1 and df2 with identical columns and types, we need to stack the rows of df2 below the rows of df1.

The input consists of two pandas DataFrames, each with the columns student_id, name, and age. Each row holds one student's information. The expected output is a new DataFrame that preserves the original column order and types and contains all rows of df1 followed by all rows of df2.

Key points and constraints: df1 and df2 are guaranteed to have the same schema, so we do not need to worry about column mismatches. The number of rows may vary, and either DataFrame could be empty. Edge cases include one or both DataFrames being empty and rows with duplicate student_ids. Duplicates are acceptable because the problem only asks for concatenation and does not require uniqueness.

Approaches

The brute-force approach iterates through each DataFrame manually, collects the rows from both into a new list, and builds a DataFrame from that list. This works, but it is verbose and unnecessary when a library such as pandas already provides a concatenation function.
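As a rough illustration of the brute-force idea, the sketch below collects rows by hand and rebuilds a DataFrame. The function name concatenate_brute_force and the sample data are made up for this example and are not part of the LeetCode template.

```python
import pandas as pd

def concatenate_brute_force(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical helper: collect every row as a plain dict, df1 first, then df2
    rows = []
    for df in (df1, df2):
        for record in df.to_dict(orient="records"):
            rows.append(record)
    # Pass the columns explicitly so the column order survives
    # even when there are no rows at all
    return pd.DataFrame(rows, columns=df1.columns)

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5], "name": ["Leo"], "age": [7]})
combined = concatenate_brute_force(df1, df2)
print(combined["student_id"].tolist())  # [1, 2, 5]
```

Rebuilding the table from Python dicts means pandas has to infer the column types again, which is one more reason to prefer the built-in function.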

The optimal solution uses the built-in concatenation function. In Python, pandas.concat([df1, df2]) stacks the two DataFrames in a single call. Go has no DataFrame type in its standard library, so a Go version would model the table as a slice of row structs and append one slice to the other. The key insight is that DataFrames are just tabular structures in memory, and vertical concatenation preserves the column structure while stacking rows.

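+-------------+-----------------+------------------+----------------------------------------------------------------+
| Approach    | Time Complexity | Space Complexity | Notes                                                          |
+-------------+-----------------+------------------+----------------------------------------------------------------+
| Brute Force | O(n + m)        | O(n + m)         | Manually iterate through rows and construct a new DataFrame    |
| Optimal     | O(n + m)        | O(n + m)         | Use built-in concatenation functions to stack rows efficiently |
+-------------+-----------------+------------------+----------------------------------------------------------------+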

n and m denote the number of rows in df1 and df2 respectively.

Algorithm Walkthrough

  1. Input Validation: Check that both DataFrames have identical columns. The problem guarantees this, and in a statically typed representation the type system enforces it, so this step can be skipped.
  2. Vertical Concatenation: Use a built-in concatenation method to combine the two DataFrames. In Python, pandas.concat([df1, df2], ignore_index=True) stacks df2 below df1, and ignore_index=True replaces the original row labels with a fresh 0-based index.
  3. Return the Result: Output the new DataFrame containing all rows.

Why it works: The built-in concatenation function preserves the column order and data types, simply stacking rows from multiple sources. Because the columns match exactly, no data transformation or alignment is needed.
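The walkthrough above can be sketched as follows, including the optional validation step. The name concatenate_checked and the error message are illustrative assumptions, and the column check is purely defensive because the problem already guarantees matching schemas.

```python
import pandas as pd

def concatenate_checked(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    # Step 1 (optional): defensive schema check, hypothetical and not required by the problem
    if list(df1.columns) != list(df2.columns):
        raise ValueError("df1 and df2 must have the same columns in the same order")
    # Steps 2 and 3: stack df2 below df1 and return the result with a fresh index
    return pd.concat([df1, df2], ignore_index=True)

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5], "name": ["Leo"], "age": [7]})
result = concatenate_checked(df1, df2)

# Column order and dtypes carry over from the inputs
print(list(result.columns))
print(result.dtypes.equals(df1.dtypes))
```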

Python Solution

import pandas as pd

class Solution:
    def concatenateDataFrames(self, df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
        """
        Concatenate two DataFrames vertically.
        """
        # Use pandas built-in concat function to stack the DataFrames vertically
        result_df = pd.concat([df1, df2], ignore_index=True)
        return result_df

This implementation directly uses pandas.concat to stack the two DataFrames. The ignore_index=True parameter ensures that the resulting DataFrame has a clean sequential index rather than preserving the original DataFrames' indices.
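To make the effect of ignore_index concrete, here is a small comparison on made-up sample data. It only uses pandas.concat with and without the parameter.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5], "name": ["Leo"], "age": [7]})

# Without ignore_index, each row keeps the label it had in its source DataFrame
kept = pd.concat([df1, df2])
# With ignore_index=True, the rows are relabeled 0..n-1
reset = pd.concat([df1, df2], ignore_index=True)

print(kept.index.tolist())   # [0, 1, 0]
print(reset.index.tolist())  # [0, 1, 2]
```

Repeated labels such as the two 0s in the first result make label-based lookups ambiguous, which is why the solution resets the index.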

Go Solution

package main

type Student struct {
    StudentID int
    Name      string
    Age       int
}

func ConcatenateDataFrames(df1 []Student, df2 []Student) []Student {
    // Preallocate the resulting slice with total capacity for efficiency
    result := make([]Student, 0, len(df1)+len(df2))
    result = append(result, df1...)
    result = append(result, df2...)
    return result
}

In Go, we represent each row as a Student struct and each DataFrame as a slice of Student. Concatenation is done by appending slices. Preallocating the slice improves efficiency by avoiding multiple memory reallocations.

Worked Examples

For the input:

df1:
+------------+---------+-----+
| student_id | name    | age |
+------------+---------+-----+
| 1          | Mason   | 8   |
| 2          | Ava     | 6   |
| 3          | Taylor  | 15  |
| 4          | Georgia | 17  |
+------------+---------+-----+

df2:
+------------+------+-----+
| student_id | name | age |
+------------+------+-----+
| 5          | Leo  | 7   |
| 6          | Alex | 7   |
+------------+------+-----+

Step by step:

  1. The concatenation function receives df1 and df2.
  2. The rows of df2 are stacked below df1.
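  3. The resulting DataFrame has 6 rows, maintaining the original column order.

Result:

+-------+------------+---------+-----+
| Index | student_id | name    | age |
+-------+------------+---------+-----+
| 0     | 1          | Mason   | 8   |
| 1     | 2          | Ava     | 6   |
| 2     | 3          | Taylor  | 15  |
| 3     | 4          | Georgia | 17  |
| 4     | 5          | Leo     | 7   |
| 5     | 6          | Alex    | 7   |
+-------+------------+---------+-----+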

Complexity Analysis

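+---------+------------+------------------------------------------------------------------+
| Measure | Complexity | Explanation                                                      |
+---------+------------+------------------------------------------------------------------+
| Time    | O(n + m)   | Concatenation requires visiting each row in both DataFrames once |
| Space   | O(n + m)   | A new DataFrame/slice is created to hold all rows                |
+---------+------------+------------------------------------------------------------------+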

The time complexity is linear with respect to the total number of rows because each row must be copied. The space complexity is also linear because a new structure must hold the combined data.
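As a small check of the space claim, the sketch below (on made-up sample data) shows that the output has n + m rows and that writing to it leaves the inputs untouched. This is consistent with the result being a separate structure rather than a view onto df1.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [5], "name": ["Leo"], "age": [7]})

result = pd.concat([df1, df2], ignore_index=True)

# The output holds n + m rows
print(len(result) == len(df1) + len(df2))  # True

# Modifying the result does not change df1
result.loc[0, "age"] = 99
print(df1.loc[0, "age"])  # 8
```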

Test Cases

import pandas as pd

# Provided example
df1 = pd.DataFrame({
    "student_id": [1, 2, 3, 4],
    "name": ["Mason", "Ava", "Taylor", "Georgia"],
    "age": [8, 6, 15, 17]
})
df2 = pd.DataFrame({
    "student_id": [5, 6],
    "name": ["Leo", "Alex"],
    "age": [7, 7]
})
sol = Solution()
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 6  # All rows combined
assert result.iloc[4]["name"] == "Leo"  # Check specific row

# Empty df1
df1 = pd.DataFrame(columns=["student_id", "name", "age"])
df2 = pd.DataFrame({
    "student_id": [1],
    "name": ["John"],
    "age": [10]
})
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 1

# Empty df2
df1 = pd.DataFrame({
    "student_id": [1],
    "name": ["John"],
    "age": [10]
})
df2 = pd.DataFrame(columns=["student_id", "name", "age"])
result = sol.concatenateDataFrames(df1, df2)
assert result.shape[0] == 1

# Both empty
df1 = pd.DataFrame(columns=["student_id", "name", "age"])
df2 = pd.DataFrame(columns=["student_id", "name", "age"])
result = sol.concatenateDataFrames(df1, df2)
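assert result.shape[0] == 0

+------------------+-------------------------------------------------------+
| Test             | Why                                                   |
+------------------+-------------------------------------------------------+
| Provided example | Validates normal concatenation                        |
| Empty df1        | Ensures function works when first DataFrame is empty  |
| Empty df2        | Ensures function works when second DataFrame is empty |
| Both empty       | Ensures function handles completely empty input       |
+------------------+-------------------------------------------------------+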

Edge Cases

The first edge case occurs when one DataFrame is empty. If df1 is empty, the result should contain exactly the rows of df2, and vice versa. The built-in concatenation handles this without any special-casing.

The second edge case is both DataFrames empty. This could trigger errors in naive implementations that assume at least one row exists. The provided solution correctly returns an empty DataFrame with the proper columns.

The third edge case is duplicate student_ids. The problem statement does not require uniqueness, so the algorithm simply stacks the rows. Any uniqueness constraints must be handled separately if required, but the solution handles duplicates correctly by preserving all rows.
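The duplicate case can be illustrated with a short sketch on made-up data. The drop_duplicates call at the end is an optional extra for callers who do need unique student_ids. It is not part of the problem's expected output.

```python
import pandas as pd

df1 = pd.DataFrame({"student_id": [1, 2], "name": ["Mason", "Ava"], "age": [8, 6]})
df2 = pd.DataFrame({"student_id": [2], "name": ["Ava"], "age": [6]})

# Plain concatenation keeps every row, including the repeated student_id 2
stacked = pd.concat([df1, df2], ignore_index=True)
print(stacked["student_id"].tolist())  # [1, 2, 2]

# Optional and outside the problem's scope: keep the first row per student_id
unique = stacked.drop_duplicates(subset="student_id", ignore_index=True)
print(unique["student_id"].tolist())  # [1, 2]
```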