LeetCode 2878 - Get the Size of a DataFrame
Difficulty: 🟢 Easy
Solution
Problem Understanding
The problem asks us to determine the size of a DataFrame named players. Specifically, it requires computing two values: the number of rows and the number of columns. In a DataFrame, rows represent individual records (here, players) and columns represent attributes of those records, such as player_id, name, age, position, and team. The input is a well-formed DataFrame, so we do not need to handle malformed or missing structures. The expected output is a two-element array [number of rows, number of columns].
The DataFrame may contain any number of rows or columns, including zero, so the edge cases are a DataFrame with columns but no rows and a DataFrame with no columns at all. Because the input is guaranteed to be a DataFrame object, no type or structure validation is needed.
Approaches
A naive approach would be to manually iterate over all rows and columns to count them. For instance, one could loop through the DataFrame's rows to count them and then inspect one row to count columns. This yields the correct result whenever at least one row exists, but it is unnecessary given that DataFrame libraries provide built-in attributes that report dimensions directly. Iterating over all rows also becomes inefficient for large datasets, although for an "Easy" problem performance is not critical.
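To make the naive approach concrete, here is a minimal sketch. The helper name naive_size is hypothetical, and the sketch counts column labels rather than inspecting the first row, so it is a slight variation on the description above, chosen so that it still works when there are no rows.

```python
import pandas as pd


def naive_size(players: pd.DataFrame) -> list[int]:
    """Count rows and columns by explicit iteration (illustrative only)."""
    rows = 0
    for _ in players.index:
        rows += 1  # one increment per record: O(n) in the row count
    cols = 0
    for _ in players.columns:
        cols += 1  # counts column labels, so zero-row frames are handled
    return [rows, cols]


demo = pd.DataFrame({"player_id": [846, 749, 155], "name": ["Mason", "Riley", "Bob"]})
print(naive_size(demo))  # [3, 2]
```

The loops do nothing that the built-in dimension attributes do not already provide, which is why the next approach is preferred.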
The optimal solution leverages the built-in attributes of the DataFrame, such as .shape in Python pandas, which directly provides a tuple (number_of_rows, number_of_columns). This approach is simple, highly efficient, and avoids unnecessary iteration.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(1) | Iterate all rows to count them, inspect one row for columns |
| Optimal | O(1) | O(1) | Use DataFrame's built-in .shape attribute |
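As a quick illustration of the optimal row in the table, the snippet below reads .shape on a small, made-up frame and checks it against the equivalent len() calls. The sample data is an assumption for demonstration only.

```python
import pandas as pd

players = pd.DataFrame(
    {"player_id": [846, 749], "age": [21, 30], "team": ["RealMadrid", "Barcelona"]}
)

rows, cols = players.shape            # shape is a (rows, columns) tuple
assert rows == len(players)           # len() of a DataFrame is its row count
assert cols == len(players.columns)   # number of column labels
print(players.shape)  # (2, 3)
```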
Algorithm Walkthrough
- Access the shape attribute of the players DataFrame. This attribute returns a tuple (rows, columns) where rows is the number of rows and columns is the number of columns.
- Convert the tuple into a list to match the expected return format [number of rows, number of columns].
- Return the resulting list as the output.
Why it works: The .shape attribute is guaranteed by the DataFrame implementation to accurately reflect the dimensions of the DataFrame, including edge cases like empty DataFrames or DataFrames with zero columns. There is no need for iteration or additional computation.
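The edge cases mentioned above can be checked directly. This is a small sketch with made-up frames; the variable names are illustrative.

```python
import pandas as pd

no_rows = pd.DataFrame(columns=["player_id", "name"])  # columns, no records
no_cols = pd.DataFrame(index=range(3))                 # 3 index entries, no columns
nothing = pd.DataFrame()                               # neither rows nor columns

print(no_rows.shape)  # (0, 2)
print(no_cols.shape)  # (3, 0)
print(nothing.shape)  # (0, 0)
```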
Python Solution
import pandas as pd

class Solution:
    def getDataFrameSize(self, players: pd.DataFrame) -> list[int]:
        # Access the shape attribute and convert to a list
        return list(players.shape)
The Python implementation directly accesses players.shape, which returns a tuple (rows, columns). Converting it to a list satisfies the output format requirement [rows, columns]. This solution is concise and leverages the efficient internal representation of DataFrame dimensions.
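If the judge expects a module-level function rather than a class method, the same logic can be written without the wrapper. The function name getDataframeSize below is an assumption about the starter code; check the signature shown in the editor before submitting.

```python
from typing import List

import pandas as pd


def getDataframeSize(players: pd.DataFrame) -> List[int]:
    # Same logic as the method above, without the class wrapper.
    # The function name is an assumption, not confirmed by this article.
    return list(players.shape)


sample = pd.DataFrame({"player_id": [846, 749], "name": ["Mason", "Riley"]})
print(getDataframeSize(sample))  # [2, 2]
```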
Go Solution
package main

type DataFrame struct {
	Data [][]interface{}
}

func getDataFrameSize(players DataFrame) []int {
	rows := len(players.Data)
	columns := 0
	if rows > 0 {
		columns = len(players.Data[0])
	}
	return []int{rows, columns}
}
In Go, since there is no built-in DataFrame type, we assume DataFrame is represented as a 2D slice [][]interface{} of rows. The row count is len(players.Data). For columns, we first check that at least one row exists to avoid an index-out-of-range panic, then take len(players.Data[0]). Note the limitation of this representation: a frame with zero rows always reports zero columns, because the column count can only be inferred from a row, and all rows are assumed to have the same length.
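A rows-only representation like the one above cannot tell "no rows, two columns" apart from "no rows, no columns". One way around this is to store the column names separately. The sketch below uses Python to match the other examples; Frame, frame_size, and row_major_size are hypothetical names, not part of any library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Frame:
    """Hypothetical stand-in for a DataFrame; not a real library type."""
    columns: list[str] = field(default_factory=list)
    rows: list[list[Any]] = field(default_factory=list)


def frame_size(frame: Frame) -> list[int]:
    # Column count comes from the stored schema, not from the first row.
    return [len(frame.rows), len(frame.columns)]


def row_major_size(data: list[list[Any]]) -> list[int]:
    # Mirrors the rows-only logic: columns are inferred from the first row.
    return [len(data), len(data[0]) if data else 0]


empty = Frame(columns=["player_id", "name"])
print(frame_size(empty))   # [0, 2]
print(row_major_size([]))  # [0, 0]
```

The trade-off is one extra field to keep in sync with the row data.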
Worked Examples
Using the provided example:
+-----------+----------+-----+-------------+--------------------+
| player_id | name     | age | position    | team               |
+-----------+----------+-----+-------------+--------------------+
| 846       | Mason    | 21  | Forward     | RealMadrid         |
| 749       | Riley    | 30  | Winger      | Barcelona          |
| 155       | Bob      | 28  | Striker     | ManchesterUnited   |
| 583       | Isabella | 32  | Goalkeeper  | Liverpool          |
| 388       | Zachary  | 24  | Midfielder  | BayernMunich       |
| 883       | Ava      | 23  | Defender    | Chelsea            |
| 355       | Violet   | 18  | Striker     | Juventus           |
| 247       | Thomas   | 27  | Striker     | ParisSaint-Germain |
| 761       | Jack     | 33  | Midfielder  | ManchesterCity     |
| 642       | Charlie  | 36  | Center-back | Arsenal            |
+-----------+----------+-----+-------------+--------------------+
Step 1: players.shape returns (10, 5).
Step 2: Convert tuple to list [10, 5].
Step 3: Return [10, 5].
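The three steps can be reproduced by building the example table as a DataFrame. This is a sketch; the construction from a list of rows is just one convenient way to enter the data.

```python
import pandas as pd

players = pd.DataFrame(
    [
        [846, "Mason", 21, "Forward", "RealMadrid"],
        [749, "Riley", 30, "Winger", "Barcelona"],
        [155, "Bob", 28, "Striker", "ManchesterUnited"],
        [583, "Isabella", 32, "Goalkeeper", "Liverpool"],
        [388, "Zachary", 24, "Midfielder", "BayernMunich"],
        [883, "Ava", 23, "Defender", "Chelsea"],
        [355, "Violet", 18, "Striker", "Juventus"],
        [247, "Thomas", 27, "Striker", "ParisSaint-Germain"],
        [761, "Jack", 33, "Midfielder", "ManchesterCity"],
        [642, "Charlie", 36, "Center-back", "Arsenal"],
    ],
    columns=["player_id", "name", "age", "position", "team"],
)

size = list(players.shape)  # Steps 1 and 2: read the tuple, convert to a list
print(players.shape)  # (10, 5)
print(size)           # [10, 5]
```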
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(1) | Accessing the shape attribute is constant time; no iteration is needed. |
| Space | O(1) | Only a small list of two integers is created for output. |
This constant time and space complexity is optimal for this problem, as it directly queries the metadata of the DataFrame.
Test Cases
import pandas as pd

# Assumes the Solution class from the Python Solution section is in scope.

# Basic example
df1 = pd.DataFrame({
    "player_id": [846, 749],
    "name": ["Mason", "Riley"],
    "age": [21, 30],
    "position": ["Forward", "Winger"],
    "team": ["RealMadrid", "Barcelona"]
})
assert Solution().getDataFrameSize(df1) == [2, 5] # standard case
# Empty DataFrame
df2 = pd.DataFrame(columns=["player_id", "name"])
assert Solution().getDataFrameSize(df2) == [0, 2] # no rows
# Single row, single column
df3 = pd.DataFrame({"player_id": [123]})
assert Solution().getDataFrameSize(df3) == [1, 1] # minimal size
# Large DataFrame
df4 = pd.DataFrame([[i]*50 for i in range(1000)])
assert Solution().getDataFrameSize(df4) == [1000, 50] # stress test
# DataFrame with no columns
df5 = pd.DataFrame([])
assert Solution().getDataFrameSize(df5) == [0, 0] # empty shape
| Test | Why |
|---|---|
| 2x5 DataFrame | Standard case, using the first two rows of the problem example |
| Empty DataFrame | Validates handling of zero rows |
| Single row/column | Validates minimal non-empty DataFrame |
| Large DataFrame | Confirms a 1000x50 frame reports the correct shape |
| No columns | Edge case of zero columns handled correctly |
Edge Cases
One important edge case is a DataFrame with columns but no rows. A naive implementation that counts columns by inspecting the first row fails here, because there is no row to inspect. Our solution correctly returns [0, num_columns] because .shape reports the column count independently of the rows.
Another edge case is a DataFrame with no columns. Some naive implementations might index into the frame, for example with players[0], which looks up a column label and raises an error when that column does not exist. The .shape attribute returns (num_rows, 0) safely, so the implementation is robust.
A third edge case is a DataFrame with only one row or one column. Although trivial, it tests that the algorithm handles minimal sizes without off-by-one errors. Using .shape ensures these counts are correct regardless of the specific number of rows or columns.
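The failure modes described in this section can be reproduced in a few lines. This is a sketch with made-up frames; the errors list exists only to record which exception each naive access raised.

```python
import pandas as pd

errors = []

no_rows = pd.DataFrame(columns=["player_id", "name"])
try:
    no_rows.iloc[0]  # there is no first row to inspect
except IndexError:
    errors.append("IndexError")

no_cols = pd.DataFrame(index=range(3))
try:
    no_cols[0]  # [] looks up a column label, and no column 0 exists
except KeyError:
    errors.append("KeyError")

print(errors)                                     # ['IndexError', 'KeyError']
print(list(no_rows.shape), list(no_cols.shape))  # [0, 2] [3, 0]
```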