Difficulty: 🟢 Easy

Solution

LeetCode 2877 - Create a DataFrame from List

Problem Understanding

This problem asks us to create a Pandas DataFrame from a given two-dimensional list named student_data.

Each element of student_data is itself a list containing exactly two values:

A student ID
The student's age

For example:

[
  [1, 15],
  [2, 11],
  [3, 11],
  [4, 20]
]

represents four students. The first student has ID 1 and age 15, the second student has ID 2 and age 11, and so on.

The goal is to convert this raw list structure into a Pandas DataFrame with exactly two columns:

student_id
age

The rows must appear in the same order as they do in the original input list.

The output is therefore a tabular representation of the same data, where each inner list becomes a row and the column names are explicitly assigned.

Since this is a Pandas problem rather than a traditional algorithmic problem, there are no significant computational constraints. The task is primarily about understanding how to construct a DataFrame and assign column names correctly.

An important guarantee is that every row contains exactly two values corresponding to the required columns. Because of this guarantee, we do not need to validate row lengths or perform any error handling.

Potential edge cases include an empty input list, a single student record, or multiple students sharing the same age. All of these cases are naturally handled by DataFrame construction.

Approaches

Brute Force Approach

A manual approach would be to iterate through every row of student_data, extract the student ID and age, store them in separate collections, and then build a DataFrame from those collections.

For example, we could create two lists:

student_ids = [1, 2, 3, 4]
ages = [15, 11, 11, 20]

and then construct the DataFrame from a dictionary mapping column names to these lists.

This approach works because every row is processed exactly once and every value is copied into the appropriate column.

However, it performs unnecessary work. The input data is already organized in a structure that Pandas can directly convert into a DataFrame. Extracting values into separate lists only adds extra code and memory usage.
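The manual approach described above can be sketched as follows. This is a minimal illustration, not a recommended solution; the function name create_dataframe_manual is hypothetical and chosen only for this example.

```python
import pandas as pd


def create_dataframe_manual(student_data):
    # Hypothetical helper name, used only for this illustration.
    student_ids = []
    ages = []
    # Split every [id, age] row into two parallel lists.
    for student_id, age in student_data:
        student_ids.append(student_id)
        ages.append(age)
    # Dictionary keys become column names, in insertion order.
    return pd.DataFrame({"student_id": student_ids, "age": ages})


df = create_dataframe_manual([[1, 15], [2, 11], [3, 11], [4, 20]])
```

The loop and the two intermediate lists are exactly the extra work that the direct constructor call avoids.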

Optimal Approach

The key observation is that Pandas provides a built-in constructor that can directly convert a two-dimensional list into a DataFrame.

Passing the original data together with the desired column names is enough:

pd.DataFrame(student_data, columns=["student_id", "age"])

Pandas treats each inner list as one row and assigns the column names in the order given.

This is simpler and avoids building intermediate lists, although both approaches remain linear in the number of rows.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n)	O(n)	Copies values into separate lists before creating the DataFrame
Optimal	O(n)	O(n)	Directly constructs the DataFrame from the original 2D list

Here, n represents the number of student records.

Algorithm Walkthrough

Optimal Algorithm

Receive the input 2D list student_data.
Call the Pandas DataFrame constructor using student_data as the row data.
Specify the column names as ["student_id", "age"].
Return the resulting DataFrame.

Why it works

Every inner list in student_data contains exactly two values. When Pandas receives the two-dimensional list and the two corresponding column names, it maps the first value of each row to student_id and the second value to age. Since rows are processed in order, the resulting DataFrame preserves the original ordering of the input data.
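The positional mapping and the order guarantee can be checked with a small sketch. The input below is deliberately not sorted by ID, and the variable names are illustrative only.

```python
import pandas as pd

# Rows are intentionally out of ID order to show that nothing is re-sorted.
student_data = [[3, 11], [1, 15], [4, 20]]
df = pd.DataFrame(student_data, columns=["student_id", "age"])

# The first value of each row lands in student_id, the second in age.
first_column = df["student_id"].tolist()
second_column = df["age"].tolist()

# With no index supplied, rows are labeled 0, 1, 2 in input order.
row_labels = df.index.tolist()
```

Here first_column holds the IDs in their original input order, which is the behavior the problem requires.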

Python Solution

import pandas as pd
from typing import List

def createDataframe(student_data: List[List[int]]) -> pd.DataFrame:
    return pd.DataFrame(student_data, columns=["student_id", "age"])

The implementation directly follows the algorithm described above.

The input list is passed to the pd.DataFrame constructor. The columns parameter assigns the required column names in the correct order. Pandas then creates one DataFrame row for each inner list in student_data.

Because the data is already structured appropriately, no iteration or preprocessing is necessary.

Go Solution

LeetCode 2877 is a Pandas-specific problem. The platform provides a Python environment and expects a Pandas DataFrame as the return value. As a result, there is no official Go version of this problem, because the LeetCode environment offers no Go equivalent of a Pandas DataFrame.

For completeness, the analogous logic in Go would simply store the data in a structure with named fields:

package main

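// Student is one record, the Go counterpart of a DataFrame row.
type Student struct {
	StudentID int
	Age       int
}

// createDataframe converts each [id, age] row into a Student.
// It assumes every row has at least two elements, as the problem guarantees.
func createDataframe(studentData [][]int) []Student {
	// Reserve capacity up front so append does not need to reallocate.
	result := make([]Student, 0, len(studentData))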

	for _, row := range studentData {
		result = append(result, Student{
			StudentID: row[0],
			Age:       row[1],
		})
	}

	return result
}

The Go version demonstrates the same transformation concept, converting each input row into a structured record. However, this is not a LeetCode-submittable solution for Problem 2877 because the actual problem specifically requires a Pandas DataFrame.

Worked Examples

Example 1

Input:

[
  [1, 15],
  [2, 11],
  [3, 11],
  [4, 20]
]

Pandas receives:

pd.DataFrame(
    student_data,
    columns=["student_id", "age"]
)

The rows are mapped as follows:

Input Row	student_id	age
[1, 15]	1	15
[2, 11]	2	11
[3, 11]	3	11
[4, 20]	4	20

Resulting DataFrame:

student_id	age
1	15
2	11
3	11
4	20

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Pandas processes each row once when creating the DataFrame
Space	O(n)	The resulting DataFrame stores all input rows

The constructor must read every row from the input list in order to create the DataFrame structure. Therefore, the runtime grows linearly with the number of student records. The DataFrame itself stores all records, requiring linear space.

Test Cases

import pandas as pd

# Example from the problem statement
result = createDataframe([[1, 15], [2, 11], [3, 11], [4, 20]])
expected = pd.DataFrame(
    [[1, 15], [2, 11], [3, 11], [4, 20]],
    columns=["student_id", "age"]
)
assert result.equals(expected)  # Standard example

# Empty input
result = createDataframe([])
expected = pd.DataFrame([], columns=["student_id", "age"])
assert result.equals(expected)  # No students

# Single row
result = createDataframe([[5, 18]])
expected = pd.DataFrame([[5, 18]], columns=["student_id", "age"])
assert result.equals(expected)  # One student

# Duplicate ages
result = createDataframe([[1, 10], [2, 10], [3, 10]])
expected = pd.DataFrame(
    [[1, 10], [2, 10], [3, 10]],
    columns=["student_id", "age"]
)
assert result.equals(expected)  # Repeated age values

# Large IDs
result = createDataframe([[1000000, 21], [2000000, 22]])
expected = pd.DataFrame(
    [[1000000, 21], [2000000, 22]],
    columns=["student_id", "age"]
)
assert result.equals(expected)  # Large numeric values

Test Case Summary

Test	Why
Standard example	Verifies normal behavior
Empty input	Ensures an empty DataFrame is created correctly
Single row	Validates the smallest non-empty input
Duplicate ages	Confirms repeated values are preserved
Large IDs	Ensures larger integers are handled correctly
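As an alternative to DataFrame.equals, the checks can be written with pandas.testing.assert_frame_equal, which raises an AssertionError describing the difference instead of returning a bare False. The sketch below redefines createDataframe so it runs on its own, and builds the expected frames column by column so they do not simply repeat the call under test; the cases list is illustrative.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def createDataframe(student_data):
    return pd.DataFrame(student_data, columns=["student_id", "age"])


# Each case pairs the row-oriented input with the expected columns.
cases = [
    ([[1, 15], [2, 11], [3, 11], [4, 20]],
     {"student_id": [1, 2, 3, 4], "age": [15, 11, 11, 20]}),
    ([[5, 18]], {"student_id": [5], "age": [18]}),
]

for rows, expected_columns in cases:
    assert_frame_equal(createDataframe(rows), pd.DataFrame(expected_columns))

# A deliberately wrong expectation shows that mismatches are reported.
try:
    assert_frame_equal(
        createDataframe([[1, 15]]),
        pd.DataFrame({"student_id": [1], "age": [16]}),
    )
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```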

Edge Cases

Empty Input List

The input may contain no student records at all:

[]

A common mistake is assuming at least one row exists. Pandas handles this naturally. The implementation still creates a DataFrame with the required column names and zero rows.
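A short sketch of the empty case, checking only the shape and the column names; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame([], columns=["student_id", "age"])

# Zero rows, but both columns are still present.
n_rows, n_cols = df.shape
column_names = list(df.columns)
```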

Single Student Record

The input may contain exactly one student:

[[5, 18]]

Some implementations incorrectly treat a single row as a one-dimensional structure. Passing the two-dimensional list directly to the DataFrame constructor keeps the row as a single record.
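The difference between a nested and a flat list can be illustrated with the sketch below. It assumes that pandas rejects the flat list with a ValueError because two values in one column do not match two column names; the variable names are illustrative.

```python
import pandas as pd

columns = ["student_id", "age"]

# One inner list: one row with two columns.
nested = pd.DataFrame([[5, 18]], columns=columns)

# A flat list is read as two rows of one value each, which conflicts
# with the two column names (assumed to raise ValueError).
try:
    pd.DataFrame([5, 18], columns=columns)
    flat_list_rejected = False
except ValueError:
    flat_list_rejected = True
```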

Duplicate Values

Multiple students may have the same age:

[
  [1, 11],
  [2, 11],
  [3, 11]
]

A flawed implementation might accidentally aggregate or deduplicate data. The DataFrame constructor performs no such transformations and preserves every row exactly as provided.

Large Numeric Values

Student IDs may be much larger than those shown in the examples:

[
  [1000000, 21],
  [2000000, 22]
]

Since the solution simply stores the values in a DataFrame without additional computation, large integer values are preserved correctly and require no special handling.