LeetCode 2881 - Create a New Column

The problem is asking us to modify a given DataFrame named employees by adding a new column called bonus. Each value in the bonus column should be exactly double the corresponding value in the salary column.


Difficulty: 🟢 Easy

Solution

Problem Understanding

The task is to add a bonus column to the employees DataFrame, where each value is exactly double the salary in the same row. The input DataFrame contains at least two columns: name, a string holding the employee's name, and salary, an integer holding their salary. The expected output is the same DataFrame with an additional bonus column in which every entry equals salary * 2.

The problem states few explicit constraints. The input is assumed to be valid: the salary column exists and holds integers. Edge cases worth considering are an empty DataFrame, a salary of zero, and values large enough to overflow a fixed-width integer type.

Approaches

The brute-force approach iterates over the rows, computes each bonus individually, and collects the results into a new column. It works, but it is verbose and slow in Python because every multiplication goes through the interpreter, whereas pandas provides vectorized operations that apply a calculation to an entire column at once.
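
For comparison, the row-by-row approach could look like the following. This is a minimal sketch; the function name `create_bonus_column_loop` is a hypothetical helper introduced here for illustration and is not part of the original solution.

```python
import pandas as pd


def create_bonus_column_loop(employees: pd.DataFrame) -> pd.DataFrame:
    """Row-by-row version, shown only for comparison (hypothetical helper)."""
    bonuses = []
    # Walk the salary column one value at a time.
    for salary in employees["salary"]:
        bonuses.append(salary * 2)
    # Attach the collected values as the new column.
    employees["bonus"] = bonuses
    return employees


df = pd.DataFrame({"name": ["Piper", "Grace"], "salary": [4548, 28150]})
print(create_bonus_column_loop(df))
```

The loop produces the same column as the vectorized version, but it needs an explicit accumulator and one Python-level multiplication per row.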

The key insight for an optimal solution is to leverage pandas vectorization, which allows arithmetic operations to be performed across the entire column at once. This eliminates the need for explicit loops, making the solution concise and efficient.

| Approach | Time Complexity | Space Complexity | Notes |
| --- | --- | --- | --- |
| Brute Force | O(n) | O(n) | Loop through each row and calculate the bonus individually |
| Optimal | O(n) | O(n) | Use pandas vectorized operations to multiply the salary column by 2 |

Algorithm Walkthrough

  1. Access the salary column of the employees DataFrame. This is a pandas Series representing all salaries.
  2. Multiply the salary column by 2. Pandas performs this operation element-wise across the entire column.
  3. Assign the resulting Series to a new column named bonus in the same DataFrame.
  4. Return or display the updated DataFrame with the new column included.

Why it works: arithmetic between a pandas Series and a scalar is applied element-wise, so the scalar 2 is broadcast to every entry. Multiplying the entire column by 2 therefore makes every employee's bonus exactly double their salary, and assigning the result to a new column leaves the existing columns unchanged while adding the required output.
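
To see the element-wise behaviour concretely, here is a small sketch. The custom index values are arbitrary and only illustrate that the result keeps the row labels of its input, which is what lets pandas line the new column up with the right rows on assignment.

```python
import pandas as pd

# Index labels 10, 20, 30 are arbitrary and chosen only for illustration.
salary = pd.Series([4548, 28150, 1103], index=[10, 20, 30], name="salary")

# The scalar 2 is broadcast to every element of the Series.
doubled = salary * 2

# Same values as an explicit per-element computation.
expected = [s * 2 for s in salary]

print(doubled.tolist())                     # [9096, 56300, 2206]
print(doubled.index.equals(salary.index))   # True
```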

Python Solution

import pandas as pd

class Solution:
    def create_bonus_column(self, employees: pd.DataFrame) -> pd.DataFrame:
        # Vectorized: multiply the whole salary column by 2 and store it as bonus.
        employees['bonus'] = employees['salary'] * 2
        return employees

This implementation uses pandas column operations directly. Accessing employees['salary'] returns the salary column, multiplying by 2 performs vectorized arithmetic on every entry, and assigning to employees['bonus'] adds the new column in place, so the DataFrame passed in by the caller is modified. The result is concise and avoids explicit Python loops, which is the idiomatic and fastest way to do this in pandas.
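
If modifying the caller's DataFrame is undesirable, a non-mutating variant is possible with `DataFrame.assign`, which returns a new DataFrame. The sketch below is an alternative to the solution above, not the original author's method; `create_bonus_column_copy` is a hypothetical name.

```python
import pandas as pd


def create_bonus_column_copy(employees: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical non-mutating variant: returns a new DataFrame."""
    # DataFrame.assign returns a new object with the extra column added.
    return employees.assign(bonus=employees["salary"] * 2)


original = pd.DataFrame({"name": ["Piper"], "salary": [4548]})
updated = create_bonus_column_copy(original)

print(list(original.columns))  # ['name', 'salary']
print(list(updated.columns))   # ['name', 'salary', 'bonus']
```

The trade-off is an extra copy of the frame in exchange for leaving the input untouched.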

Go Solution

package main

import (
    "fmt"
)

type Employee struct {
    Name   string
    Salary int
    Bonus  int
}

func CreateBonusColumn(employees []Employee) []Employee {
    for i := range employees {
        employees[i].Bonus = employees[i].Salary * 2
    }
    return employees
}

func main() {
    employees := []Employee{
        {"Piper", 4548, 0},
        {"Grace", 28150, 0},
        {"Georgia", 1103, 0},
        {"Willow", 6593, 0},
        {"Finn", 74576, 0},
        {"Thomas", 24433, 0},
    }

    updated := CreateBonusColumn(employees)
    fmt.Println(updated)
}

Go has no built-in DataFrame type, so we model each row as an Employee struct with Name, Salary, and Bonus fields. The function iterates over the slice and sets the Bonus field of each element in place. Go's int is 32 or 64 bits wide depending on the platform, and overflow wraps around silently. pandas behaves similarly here: integer columns default to fixed-width int64 (backed by NumPy) rather than arbitrary-precision integers, so very large values can wrap there too.

Worked Examples

Example Input

| name | salary |
| --- | --- |
| Piper | 4548 |
| Grace | 28150 |
| Georgia | 1103 |

Step-by-step Execution

  1. Access the salary column: [4548, 28150, 1103]
  2. Multiply each by 2: [9096, 56300, 2206]
  3. Assign to a new column bonus

Resulting DataFrame

| name | salary | bonus |
| --- | --- | --- |
| Piper | 4548 | 9096 |
| Grace | 28150 | 56300 |
| Georgia | 1103 | 2206 |

Complexity Analysis

| Measure | Complexity | Explanation |
| --- | --- | --- |
| Time | O(n) | Each row is processed once; vectorized operations in pandas also run in O(n) |
| Space | O(n) | A new column is created, storing n additional integers |

This is optimal for this problem as any solution must at least touch each row to compute the bonus.
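
Both approaches are O(n), so the practical difference lies in the constant factor. The sketch below is one way to observe it; the row count of 100,000 and the repeat count of 10 are arbitrary choices for illustration, and the measured times depend on the machine, so no expected numbers are given.

```python
import timeit

import pandas as pd

# Synthetic data; the size is an arbitrary choice for illustration.
df = pd.DataFrame({"salary": range(100_000)})


def with_loop() -> list:
    # One Python-level multiplication per row.
    return [s * 2 for s in df["salary"]]


def vectorized() -> pd.Series:
    # A single call; the per-element loop runs in compiled code.
    return df["salary"] * 2


# Both strategies produce the same values.
assert with_loop() == vectorized().tolist()

loop_time = timeit.timeit(with_loop, number=10)
vec_time = timeit.timeit(vectorized, number=10)
print(f"loop: {loop_time:.4f}s  vectorized: {vec_time:.4f}s")
```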

Test Cases

import pandas as pd

# Provided example
df1 = pd.DataFrame({
    "name": ["Piper", "Grace", "Georgia"],
    "salary": [4548, 28150, 1103]
})
solution = Solution()
result1 = solution.create_bonus_column(df1)
assert result1['bonus'].tolist() == [9096, 56300, 2206]  # basic example

# Edge case: empty DataFrame
df2 = pd.DataFrame(columns=["name", "salary"])
result2 = solution.create_bonus_column(df2)
assert result2.empty and 'bonus' in result2.columns  # handles empty input and still adds the column

# Edge case: salary zero
df3 = pd.DataFrame({"name": ["Alice"], "salary": [0]})
result3 = solution.create_bonus_column(df3)
assert result3['bonus'].iloc[0] == 0  # zero salary handled correctly

# Large numbers
df4 = pd.DataFrame({"name": ["Bob"], "salary": [10**9]})
result4 = solution.create_bonus_column(df4)
assert result4['bonus'].iloc[0] == 2 * 10**9  # large integers

# Multiple rows with different salaries
df5 = pd.DataFrame({
    "name": ["Eve", "Mallory", "Trent"],
    "salary": [123, 456, 789]
})
result5 = solution.create_bonus_column(df5)
assert result5['bonus'].tolist() == [246, 912, 1578]

| Test | Why |
| --- | --- |
| Basic example | Validates normal behavior with multiple rows |
| Empty DataFrame | Ensures the function handles no data without errors |
| Salary zero | Ensures a zero salary yields a zero bonus |
| Large numbers | Checks that a result of 2 × 10^9 is computed exactly (it fits comfortably in pandas' default int64) |
| Multiple diverse salaries | Confirms correctness across different numeric inputs |

Edge Cases

The first edge case is an empty DataFrame. A naive implementation that indexes into rows without checking the length first would raise an error. The vectorized operation in pandas handles this gracefully: it returns an empty DataFrame in which the new bonus column exists but has no rows.

The second edge case is a salary of zero. Doubling zero is trivial mathematically, but code that treats 0 as a missing value, for example through a truthiness check such as `if salary:`, could skip or mishandle it. Our implementation performs no such check and correctly produces zero.

The third edge case is extremely large salary values. Plain Python integers can grow arbitrarily large, but a pandas integer column is stored as fixed-width int64 by default, so doubling is exact only while the result fits in that range, that is, for salaries up to 2^62 - 1 (about 4.6 × 10^18). Beyond that, the result wraps around silently. Go behaves the same way: its int type is 32 or 64 bits wide depending on the platform and wraps on overflow, so production code handling values near these limits needs an explicit range check or a wider type.
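
Because a pandas integer column is normally fixed-width int64, a range check before doubling can catch overflow. A minimal sketch, assuming the salary column has the int64 dtype; `doubling_is_safe` is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

INT64_MAX = np.iinfo(np.int64).max
INT64_MIN = np.iinfo(np.int64).min


def doubling_is_safe(salary: pd.Series) -> bool:
    """Hypothetical check: True if salary * 2 fits in int64 for every row."""
    if salary.empty:
        return True
    # 2 * x fits in int64 exactly when INT64_MIN // 2 <= x <= INT64_MAX // 2.
    return bool(salary.max() <= INT64_MAX // 2 and salary.min() >= INT64_MIN // 2)


small = pd.Series([10**9], dtype="int64")
big = pd.Series([2**62], dtype="int64")

print(doubling_is_safe(small))  # True
print(doubling_is_safe(big))    # False

# Without the check, the multiplication wraps around to a negative number.
wrapped = big * 2
print(wrapped.iloc[0] < 0)      # True
```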