LeetCode 2878 - Get the Size of a DataFrame

The problem asks us to determine the size of a DataFrame named players. Specifically, it requires computing two values: the number of rows and the number of columns.

Difficulty: 🟢 Easy

Solution

Problem Understanding

The problem asks us to determine the size of a DataFrame named players. Specifically, it requires computing two values: the number of rows and the number of columns. In a DataFrame, rows represent individual records (in this case, players) and columns represent attributes of those records, such as player_id, name, age, position, and team. The input is a well-formed DataFrame, so we do not need to handle malformed or missing structures. The expected output is a two-element list [number of rows, number of columns].

Note that the DataFrame may contain any number of rows or columns, including zero, so the edge cases include an empty DataFrame (no rows) and a DataFrame with no columns. Because the problem guarantees that the input is a DataFrame object, there is no need to validate types or structure.

Approaches

A naive approach would be to manually iterate over the DataFrame to count its dimensions. For instance, one could loop through the rows to count them and then inspect one row to count the columns. This yields the correct result whenever at least one row exists, but it is unnecessary because DataFrame libraries provide built-in ways to retrieve the dimensions directly. Iterating over every row also takes time proportional to the number of rows, although at the input sizes of an "Easy" problem, performance is not critical.
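
As a rough illustration of the naive approach, the sketch below counts rows by iterating and reads the column count from the first row it encounters. The function name brute_force_size is hypothetical, and itertuples is only one of several ways to iterate over a pandas DataFrame.

```python
import pandas as pd


def brute_force_size(players: pd.DataFrame) -> list[int]:
    """Count rows by iteration; take the column count from the first row."""
    rows = 0
    columns = 0
    for row in players.itertuples(index=False):
        if rows == 0:
            # Each row arrives as a tuple with one entry per column
            columns = len(row)
        rows += 1
    return [rows, columns]


demo = pd.DataFrame({
    "player_id": [846, 749],
    "name": ["Mason", "Riley"],
    "age": [21, 30],
})
print(brute_force_size(demo))  # [2, 3]
```

Because the column count comes from the first row, this sketch reports 0 columns for a DataFrame that has columns but no rows, which is one more reason to prefer the built-in attribute.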

The optimal solution leverages the built-in attributes of the DataFrame, such as .shape in Python pandas, which directly provides a tuple (number_of_rows, number_of_columns). This approach is simple, highly efficient, and avoids unnecessary iteration.

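| Approach    | Time Complexity | Space Complexity | Notes                                                        |
|-------------|-----------------|------------------|--------------------------------------------------------------|
| Brute Force | O(n)            | O(1)             | Iterate all rows to count them, inspect one row for columns  |
| Optimal     | O(1)            | O(1)             | Use DataFrame's built-in .shape attribute                    |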

Algorithm Walkthrough

  1. Access the shape attribute of the players DataFrame. This attribute returns a tuple (rows, columns) where rows is the number of rows and columns is the number of columns.
  2. Convert the tuple into a list to match the expected return format [number of rows, number of columns].
  3. Return the resulting list as the output.

Why it works: The .shape attribute is guaranteed by the DataFrame implementation to accurately reflect the dimensions of the DataFrame, including edge cases like empty DataFrames or DataFrames with zero columns. There is no need for iteration or additional computation.
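
The three steps above can be sketched as follows, using a small DataFrame built from a few values in the problem's example table. The variable names are illustrative only.

```python
import pandas as pd

players = pd.DataFrame({
    "player_id": [846, 749, 155],
    "name": ["Mason", "Riley", "Bob"],
})

# Step 1: read the dimensions as a (rows, columns) tuple
shape = players.shape

# Step 2: convert the tuple into a list
size = list(shape)

# Step 3: this list is what the solution returns
print(size)  # [3, 2]

# The same numbers are available from the row index and the column index
assert shape == (len(players.index), len(players.columns))
```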

Python Solution

import pandas as pd

class Solution:
    def getDataFrameSize(self, players: pd.DataFrame) -> list[int]:
        # Access the shape attribute and convert to a list
        return list(players.shape)

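The Python implementation directly accesses players.shape, which returns a tuple (rows, columns). Converting it to a list satisfies the output format requirement [rows, columns]. The solution is concise because pandas already tracks the dimensions of a DataFrame, so nothing has to be counted. Note that LeetCode's pandas problems are usually submitted as a top-level function rather than a method on a Solution class, so the wrapper may need adjusting to match the template the judge provides.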

Go Solution

package main

type DataFrame struct {
    Data [][]interface{}
}

func getDataFrameSize(players DataFrame) []int {
    rows := len(players.Data)
    columns := 0
    if rows > 0 {
        columns = len(players.Data[0])
    }
    return []int{rows, columns}
}

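In Go, since there is no built-in DataFrame type, we assume a DataFrame is represented as a 2D slice [][]interface{} holding one inner slice per row. We compute the number of rows using len(players.Data). For columns, we first check that there is at least one row to avoid an index-out-of-range panic, then compute len(players.Data[0]). This handles empty DataFrames without panicking, but a row-major representation cannot record columns when there are no rows: an empty DataFrame always reports 0 columns here, whereas pandas would report [0, number of columns].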

Worked Examples

Using the provided example:

+-----------+----------+-----+-------------+--------------------+
| player_id | name     | age | position    | team               |
+-----------+----------+-----+-------------+--------------------+
| 846       | Mason    | 21  | Forward     | RealMadrid         |
| 749       | Riley    | 30  | Winger      | Barcelona          |
| 155       | Bob      | 28  | Striker     | ManchesterUnited   |
| 583       | Isabella | 32  | Goalkeeper  | Liverpool          |
| 388       | Zachary  | 24  | Midfielder  | BayernMunich       |
| 883       | Ava      | 23  | Defender    | Chelsea            |
| 355       | Violet   | 18  | Striker     | Juventus           |
| 247       | Thomas   | 27  | Striker     | ParisSaint-Germain |
| 761       | Jack     | 33  | Midfielder  | ManchesterCity     |
| 642       | Charlie  | 36  | Center-back | Arsenal            |
+-----------+----------+-----+-------------+--------------------+

Step 1: players.shape returns (10, 5).

Step 2: Convert tuple to list [10, 5].

Step 3: Return [10, 5].
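
The example can be reproduced with the following sketch, which rebuilds the table above by hand. The records and columns variables are only a convenient way to type the data in.

```python
import pandas as pd

columns = ["player_id", "name", "age", "position", "team"]
records = [
    (846, "Mason", 21, "Forward", "RealMadrid"),
    (749, "Riley", 30, "Winger", "Barcelona"),
    (155, "Bob", 28, "Striker", "ManchesterUnited"),
    (583, "Isabella", 32, "Goalkeeper", "Liverpool"),
    (388, "Zachary", 24, "Midfielder", "BayernMunich"),
    (883, "Ava", 23, "Defender", "Chelsea"),
    (355, "Violet", 18, "Striker", "Juventus"),
    (247, "Thomas", 27, "Striker", "ParisSaint-Germain"),
    (761, "Jack", 33, "Midfielder", "ManchesterCity"),
    (642, "Charlie", 36, "Center-back", "Arsenal"),
]

# One tuple per player, one tuple entry per column
players = pd.DataFrame(records, columns=columns)

result = list(players.shape)
print(result)  # [10, 5]
```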

Complexity Analysis

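| Measure | Complexity | Explanation                                                              |
|---------|------------|--------------------------------------------------------------------------|
| Time    | O(1)       | Accessing the shape attribute is constant time; no iteration is needed.  |
| Space   | O(1)       | Only a small list of two integers is created for output.                 |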

This constant time and space complexity is optimal for this problem, as it directly queries the metadata of the DataFrame.

Test Cases

import pandas as pd

# Basic example
df1 = pd.DataFrame({
    "player_id": [846, 749],
    "name": ["Mason", "Riley"],
    "age": [21, 30],
    "position": ["Forward", "Winger"],
    "team": ["RealMadrid", "Barcelona"]
})
assert Solution().getDataFrameSize(df1) == [2, 5]  # standard case

# Empty DataFrame
df2 = pd.DataFrame(columns=["player_id", "name"])
assert Solution().getDataFrameSize(df2) == [0, 2]  # no rows

# Single row, single column
df3 = pd.DataFrame({"player_id": [123]})
assert Solution().getDataFrameSize(df3) == [1, 1]  # minimal size

# Large DataFrame
df4 = pd.DataFrame([[i]*50 for i in range(1000)])
assert Solution().getDataFrameSize(df4) == [1000, 50]  # stress test

# DataFrame with no columns
df5 = pd.DataFrame([])
assert Solution().getDataFrameSize(df5) == [0, 0]  # empty shape

| Test              | Why                                                      |
|-------------------|----------------------------------------------------------|
| 2x5 DataFrame     | Two rows taken from the problem example                  |
| Empty DataFrame   | Validates handling of zero rows                          |
| Single row/column | Validates minimal non-empty DataFrame                    |
| Large DataFrame   | Confirms the result is correct for a larger 1000x50 frame |
| No columns        | Edge case of zero columns handled correctly              |

Edge Cases

One important edge case is an empty DataFrame that has columns but no rows. A naive implementation that counts columns by inspecting the first row breaks here, because there is no first row to inspect. Our solution correctly returns [0, num_columns] because .shape reports the column count independently of the row count.

Another edge case is a DataFrame with no columns. A naive implementation might try players[0] or similar indexing, which raises an error because pandas looks for a column labeled 0 and none exists. The .shape attribute returns (num_rows, 0) safely, so the implementation is robust.

A third edge case is a DataFrame with only one row or one column. Although trivial, it tests that the algorithm handles minimal sizes without off-by-one errors. Using .shape ensures these counts are correct regardless of the specific number of rows or columns.
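
The edge cases above can be checked with a short sketch. The frames are built by hand for illustration, and the variable names are not part of the problem.

```python
import pandas as pd

# Columns but no rows
no_rows = pd.DataFrame(columns=["player_id", "name"])
print(no_rows.shape)  # (0, 2)

# Rows but no columns (only a row index)
no_columns = pd.DataFrame(index=range(3))
print(no_columns.shape)  # (3, 0)

# players[0]-style access fails: 0 is looked up as a column label
try:
    no_columns[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True
print(lookup_failed)  # True

# Minimal non-empty frame
single = pd.DataFrame({"player_id": [123]})
print(single.shape)  # (1, 1)
```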