Difficulty: 🟡 Medium
Topics: Database

Solution

LeetCode 1126 - Active Businesses

Problem Understanding

The problem gives us a database table named Events. Each row represents how many times a certain type of event occurred for a particular business.

The table contains three columns:

Column	Meaning
`business_id`	The identifier of the business
`event_type`	The category of event
`occurrences`	Number of times that event occurred

The pair (business_id, event_type) is unique, which means a business can only have one row for a given event type.

The key concept is the definition of the average activity for an event type. For every distinct event_type, we compute the average value of occurrences across all businesses that contain that event.

After computing those averages, we must determine which businesses are considered active. A business is active if it has more than one event type where its own occurrences value is strictly greater than the average for that event type.

The output should contain only the business_id values of active businesses.

For the example:

business_id	event_type	occurrences
1	reviews	7
3	reviews	3

The average for reviews is:

$$(7 + 3) / 2 = 5$$

Business 1 has 7, which is greater than 5, so this counts as one qualifying event type.

We repeat this process for every event type and count how many event categories each business exceeds the average in. If the count is greater than one, that business belongs in the result.

The problem constraints are not explicitly listed, but because this is a SQL/database problem, the intended solution should avoid unnecessary repeated scans of the table. Efficient aggregation and grouping operations are important.

Several edge cases are important:

A business may exceed the average in exactly one event type, which is not enough.
A business may have many event types but none above average.
Event averages may be fractional values, so comparisons must correctly handle decimals.
A business equal to the average does not qualify because the comparison is strictly greater.
Some event types may appear for only one business. In that case, the average equals that business's value, so it cannot exceed the average.

Approaches

Brute Force Approach

A brute force solution would repeatedly compute averages for every event type while checking every business.

For each row in the table:

Scan the entire table to compute the average for that row's event_type.
Compare the row's occurrences against the computed average.
Track how many event types each business exceeds the average in.
Return businesses whose count exceeds one.

This approach is correct because every row is compared against the true average for its event type. However, it is inefficient because the same averages are recomputed many times.

If there are n rows, and each row requires another full scan of the table, the time complexity becomes O(n²).

Optimal Approach

The key observation is that the average for each event_type only needs to be computed once.

Instead of recalculating averages repeatedly:

First compute the average occurrences for every event type.
Join those averages back with the original table.
Count how many times each business exceeds the corresponding event average.
Keep only businesses whose count is greater than one.

This transforms the repeated work into a small number of aggregation operations.

In SQL, this is naturally expressed using:

A grouped subquery or Common Table Expression (CTE) to compute averages.
A join between the averages and the original table.
A second grouping step to count qualifying event types per business.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(1) or O(n)	Recomputes averages repeatedly
Optimal	O(n)	O(k)	Computes each event average once, where `k` is the number of event types

Algorithm Walkthrough

Compute the average occurrences for every event_type.

We group rows by event_type and calculate:

$$AVG(occurrences)$$

This produces a small derived table where each event type is associated with its average activity. 2. Join the averages back to the original Events table.

For every original row, we now know:

the business's occurrences
the average occurrences for that event type

This allows direct comparison. 3. Filter rows where the business exceeds the average.

We only keep rows satisfying:

occurrences > avg_occurrences

This identifies event types where the business performs above average. 4. Group the filtered rows by business_id.

Each remaining row represents one qualifying event type. Grouping allows us to count how many qualifying event types each business has. 5. Keep only businesses with more than one qualifying event type.

We apply:

HAVING COUNT(*) > 1

These are the active businesses required by the problem.

Why it works

The algorithm works because every event type average is computed exactly once and then used consistently for all businesses participating in that event category.

After filtering rows where occurrences exceeds the corresponding average, each remaining row represents one valid qualifying event type for that business. Counting those rows per business directly gives the number of event types where the business is above average. The final condition COUNT(*) > 1 exactly matches the definition of an active business.

Python Solution

from typing import List
import pandas as pd

def active_businesses(events: pd.DataFrame) -> pd.DataFrame:
    event_averages = (
        events.groupby("event_type")["occurrences"]
        .mean()
        .reset_index(name="avg_occurrences")
    )

    merged = events.merge(event_averages, on="event_type")

    above_average = merged[
        merged["occurrences"] > merged["avg_occurrences"]
    ]

    result = (
        above_average.groupby("business_id")
        .size()
        .reset_index(name="count")
    )

    result = result[result["count"] > 1]

    return result[["business_id"]]

The implementation closely follows the algorithm.

The first section computes the average occurrences for each event type using groupby and mean. This creates a temporary table containing one row per event type.

Next, the code merges those averages back into the original dataframe. After the merge, each row contains both the business occurrence count and the average occurrence count for the corresponding event type.

The filtering step keeps only rows where the business occurrence count is strictly greater than the average.

After filtering, the code groups rows by business_id and counts how many qualifying event types each business has.

Finally, only businesses with more than one qualifying event type are returned.

Go Solution

package main

import "database/sql"

func activeBusinesses(db *sql.DB) (*sql.Rows, error) {
	query := `
	WITH event_avg AS (
		SELECT
			event_type,
			AVG(occurrences) AS avg_occurrences
		FROM Events
		GROUP BY event_type
	)
	SELECT
		e.business_id
	FROM Events e
	JOIN event_avg a
		ON e.event_type = a.event_type
	WHERE e.occurrences > a.avg_occurrences
	GROUP BY e.business_id
	HAVING COUNT(*) > 1
	`

	return db.Query(query)
}

The Go solution differs from the Python solution because LeetCode database problems are fundamentally SQL problems. In Go, the logic is embedded directly inside the SQL query.

The Common Table Expression named event_avg computes the average occurrences for each event type. The query then joins those averages with the original table, filters rows above average, groups by business, and applies the HAVING COUNT(*) > 1 condition.

Since SQL handles aggregation internally, there are no additional Go-specific data structures required.

Worked Examples

Example 1

Input table:

business_id	event_type	occurrences
1	reviews	7
3	reviews	3
1	ads	11
2	ads	7
3	ads	6
1	page views	3
2	page views	12

Step 1: Compute averages

event_type	average
reviews	5
ads	8
page views	7.5

Step 2: Compare each row against its average

business_id	event_type	occurrences	average	Above Average?
1	reviews	7	5	Yes
3	reviews	3	5	No
1	ads	11	8	Yes
2	ads	7	8	No
3	ads	6	8	No
1	page views	3	7.5	No
2	page views	12	7.5	Yes

Step 3: Count qualifying event types

business_id	qualifying_events
1	2
2	1

Step 4: Keep businesses with count greater than 1

business_id
1

Business 1 is the only active business.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each row participates in grouping and filtering operations a constant number of times
Space	O(k)	Stores averages for `k` distinct event types

The solution performs a small number of linear aggregation passes over the table. Modern SQL engines optimize grouping and joins efficiently, making this approach scalable even for large datasets.

The extra memory usage comes primarily from storing the average value for each distinct event type.

Test Cases

import pandas as pd

def run_solution(events: pd.DataFrame) -> pd.DataFrame:
    event_averages = (
        events.groupby("event_type")["occurrences"]
        .mean()
        .reset_index(name="avg_occurrences")
    )

    merged = events.merge(event_averages, on="event_type")

    above_average = merged[
        merged["occurrences"] > merged["avg_occurrences"]
    ]

    result = (
        above_average.groupby("business_id")
        .size()
        .reset_index(name="count")
    )

    result = result[result["count"] > 1]

    return result[["business_id"]]

# Example case from problem statement
events = pd.DataFrame({
    "business_id": [1, 3, 1, 2, 3, 1, 2],
    "event_type": [
        "reviews",
        "reviews",
        "ads",
        "ads",
        "ads",
        "page views",
        "page views"
    ],
    "occurrences": [7, 3, 11, 7, 6, 3, 12]
})

result = run_solution(events)
assert set(result["business_id"]) == {1}  # standard example

# No business exceeds average in more than one category
events = pd.DataFrame({
    "business_id": [1, 2],
    "event_type": ["ads", "ads"],
    "occurrences": [5, 5]
})

result = run_solution(events)
assert result.empty  # equal to average does not count

# One business exceeds average once only
events = pd.DataFrame({
    "business_id": [1, 2, 1, 2],
    "event_type": ["ads", "ads", "reviews", "reviews"],
    "occurrences": [10, 2, 3, 9]
})

result = run_solution(events)
assert result.empty  # each business qualifies only once

# Single business in each category
events = pd.DataFrame({
    "business_id": [1, 2],
    "event_type": ["ads", "reviews"],
    "occurrences": [100, 200]
})

result = run_solution(events)
assert result.empty  # average equals own value

# Multiple active businesses
events = pd.DataFrame({
    "business_id": [1, 2, 3, 1, 2, 3],
    "event_type": ["ads", "ads", "ads", "reviews", "reviews", "reviews"],
    "occurrences": [10, 1, 1, 20, 1, 1]
})

result = run_solution(events)
assert set(result["business_id"]) == {1}  # only business 1 qualifies twice

Test Summary

Test	Why
Problem example	Validates the standard scenario
Equal averages	Ensures strict comparison is handled correctly
One qualifying category	Verifies that exactly one category is insufficient
Single business per category	Ensures no self-average edge case bug
Multiple categories	Tests counting logic across event types

Edge Cases

Event occurrence equal to average

A common mistake is using >= instead of >. The problem explicitly requires occurrences to be strictly greater than the average.

For example, if an event type appears twice with values [5, 5], the average is 5, and neither business qualifies. The implementation correctly uses:

occurrences > avg_occurrences

which excludes equal values.

Event type with only one business

If an event type appears for only one business, the average equals that business's own occurrence count. Since no value can be strictly greater than itself, that event type can never contribute toward making a business active.

The implementation naturally handles this because the comparison fails automatically.

Business qualifying in exactly one category

The problem requires more than one qualifying event type. A naive implementation might incorrectly include businesses with one qualifying event.

The grouping and filtering step:

HAVING COUNT(*) > 1

ensures that only businesses exceeding the average in at least two event types are returned.

Fractional averages

Averages are not guaranteed to be integers. For example:

$$(3 + 12) / 2 = 7.5$$

An implementation using integer division could incorrectly truncate the value and produce wrong comparisons.

The SQL AVG() function and Pandas mean() function both preserve decimal precision, so the implementation handles fractional averages correctly.