LeetCode 1126 - Active Businesses
The problem gives us a database table named Events. Each row represents how many times a certain type of event occurred for a particular business.
Difficulty: 🟡 Medium
Topics: Database
Solution
LeetCode 1126 - Active Businesses
Problem Understanding
The problem gives us a database table named Events. Each row represents how many times a certain type of event occurred for a particular business.
The table contains three columns:
| Column | Meaning |
|---|---|
business_id |
The identifier of the business |
event_type |
The category of event |
occurrences |
Number of times that event occurred |
The pair (business_id, event_type) is unique, which means a business can only have one row for a given event type.
The key concept is the definition of the average activity for an event type. For every distinct event_type, we compute the average value of occurrences across all businesses that contain that event.
After computing those averages, we must determine which businesses are considered active. A business is active if it has more than one event type where its own occurrences value is strictly greater than the average for that event type.
The output should contain only the business_id values of active businesses.
For the example:
| business_id | event_type | occurrences |
|---|---|---|
| 1 | reviews | 7 |
| 3 | reviews | 3 |
The average for reviews is:
$$(7 + 3) / 2 = 5$$
Business 1 has 7, which is greater than 5, so this counts as one qualifying event type.
We repeat this process for every event type and count how many event categories each business exceeds the average in. If the count is greater than one, that business belongs in the result.
The problem constraints are not explicitly listed, but because this is a SQL/database problem, the intended solution should avoid unnecessary repeated scans of the table. Efficient aggregation and grouping operations are important.
Several edge cases are important:
- A business may exceed the average in exactly one event type, which is not enough.
- A business may have many event types but none above average.
- Event averages may be fractional values, so comparisons must correctly handle decimals.
- A business equal to the average does not qualify because the comparison is strictly greater.
- Some event types may appear for only one business. In that case, the average equals that business's value, so it cannot exceed the average.
Approaches
Brute Force Approach
A brute force solution would repeatedly compute averages for every event type while checking every business.
For each row in the table:
- Scan the entire table to compute the average for that row's
event_type. - Compare the row's
occurrencesagainst the computed average. - Track how many event types each business exceeds the average in.
- Return businesses whose count exceeds one.
This approach is correct because every row is compared against the true average for its event type. However, it is inefficient because the same averages are recomputed many times.
If there are n rows, and each row requires another full scan of the table, the time complexity becomes O(n²).
Optimal Approach
The key observation is that the average for each event_type only needs to be computed once.
Instead of recalculating averages repeatedly:
- First compute the average occurrences for every event type.
- Join those averages back with the original table.
- Count how many times each business exceeds the corresponding event average.
- Keep only businesses whose count is greater than one.
This transforms the repeated work into a small number of aggregation operations.
In SQL, this is naturally expressed using:
- A grouped subquery or Common Table Expression (CTE) to compute averages.
- A join between the averages and the original table.
- A second grouping step to count qualifying event types per business.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(1) or O(n) | Recomputes averages repeatedly |
| Optimal | O(n) | O(k) | Computes each event average once, where k is the number of event types |
Algorithm Walkthrough
- Compute the average occurrences for every
event_type.
We group rows by event_type and calculate:
$$AVG(occurrences)$$
This produces a small derived table where each event type is associated with its average activity.
2. Join the averages back to the original Events table.
For every original row, we now know:
- the business's occurrences
- the average occurrences for that event type
This allows direct comparison. 3. Filter rows where the business exceeds the average.
We only keep rows satisfying:
occurrences > avg_occurrences
This identifies event types where the business performs above average.
4. Group the filtered rows by business_id.
Each remaining row represents one qualifying event type. Grouping allows us to count how many qualifying event types each business has. 5. Keep only businesses with more than one qualifying event type.
We apply:
HAVING COUNT(*) > 1
These are the active businesses required by the problem.
Why it works
The algorithm works because every event type average is computed exactly once and then used consistently for all businesses participating in that event category.
After filtering rows where occurrences exceeds the corresponding average, each remaining row represents one valid qualifying event type for that business. Counting those rows per business directly gives the number of event types where the business is above average. The final condition COUNT(*) > 1 exactly matches the definition of an active business.
Python Solution
from typing import List
import pandas as pd
def active_businesses(events: pd.DataFrame) -> pd.DataFrame:
event_averages = (
events.groupby("event_type")["occurrences"]
.mean()
.reset_index(name="avg_occurrences")
)
merged = events.merge(event_averages, on="event_type")
above_average = merged[
merged["occurrences"] > merged["avg_occurrences"]
]
result = (
above_average.groupby("business_id")
.size()
.reset_index(name="count")
)
result = result[result["count"] > 1]
return result[["business_id"]]
The implementation closely follows the algorithm.
The first section computes the average occurrences for each event type using groupby and mean. This creates a temporary table containing one row per event type.
Next, the code merges those averages back into the original dataframe. After the merge, each row contains both the business occurrence count and the average occurrence count for the corresponding event type.
The filtering step keeps only rows where the business occurrence count is strictly greater than the average.
After filtering, the code groups rows by business_id and counts how many qualifying event types each business has.
Finally, only businesses with more than one qualifying event type are returned.
Go Solution
package main
import "database/sql"
func activeBusinesses(db *sql.DB) (*sql.Rows, error) {
query := `
WITH event_avg AS (
SELECT
event_type,
AVG(occurrences) AS avg_occurrences
FROM Events
GROUP BY event_type
)
SELECT
e.business_id
FROM Events e
JOIN event_avg a
ON e.event_type = a.event_type
WHERE e.occurrences > a.avg_occurrences
GROUP BY e.business_id
HAVING COUNT(*) > 1
`
return db.Query(query)
}
The Go solution differs from the Python solution because LeetCode database problems are fundamentally SQL problems. In Go, the logic is embedded directly inside the SQL query.
The Common Table Expression named event_avg computes the average occurrences for each event type. The query then joins those averages with the original table, filters rows above average, groups by business, and applies the HAVING COUNT(*) > 1 condition.
Since SQL handles aggregation internally, there are no additional Go-specific data structures required.
Worked Examples
Example 1
Input table:
| business_id | event_type | occurrences |
|---|---|---|
| 1 | reviews | 7 |
| 3 | reviews | 3 |
| 1 | ads | 11 |
| 2 | ads | 7 |
| 3 | ads | 6 |
| 1 | page views | 3 |
| 2 | page views | 12 |
Step 1: Compute averages
| event_type | average |
|---|---|
| reviews | 5 |
| ads | 8 |
| page views | 7.5 |
Step 2: Compare each row against its average
| business_id | event_type | occurrences | average | Above Average? |
|---|---|---|---|---|
| 1 | reviews | 7 | 5 | Yes |
| 3 | reviews | 3 | 5 | No |
| 1 | ads | 11 | 8 | Yes |
| 2 | ads | 7 | 8 | No |
| 3 | ads | 6 | 8 | No |
| 1 | page views | 3 | 7.5 | No |
| 2 | page views | 12 | 7.5 | Yes |
Step 3: Count qualifying event types
| business_id | qualifying_events |
|---|---|
| 1 | 2 |
| 2 | 1 |
Step 4: Keep businesses with count greater than 1
| business_id |
|---|
| 1 |
Business 1 is the only active business.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each row participates in grouping and filtering operations a constant number of times |
| Space | O(k) | Stores averages for k distinct event types |
The solution performs a small number of linear aggregation passes over the table. Modern SQL engines optimize grouping and joins efficiently, making this approach scalable even for large datasets.
The extra memory usage comes primarily from storing the average value for each distinct event type.
Test Cases
import pandas as pd
def run_solution(events: pd.DataFrame) -> pd.DataFrame:
event_averages = (
events.groupby("event_type")["occurrences"]
.mean()
.reset_index(name="avg_occurrences")
)
merged = events.merge(event_averages, on="event_type")
above_average = merged[
merged["occurrences"] > merged["avg_occurrences"]
]
result = (
above_average.groupby("business_id")
.size()
.reset_index(name="count")
)
result = result[result["count"] > 1]
return result[["business_id"]]
# Example case from problem statement
events = pd.DataFrame({
"business_id": [1, 3, 1, 2, 3, 1, 2],
"event_type": [
"reviews",
"reviews",
"ads",
"ads",
"ads",
"page views",
"page views"
],
"occurrences": [7, 3, 11, 7, 6, 3, 12]
})
result = run_solution(events)
assert set(result["business_id"]) == {1} # standard example
# No business exceeds average in more than one category
events = pd.DataFrame({
"business_id": [1, 2],
"event_type": ["ads", "ads"],
"occurrences": [5, 5]
})
result = run_solution(events)
assert result.empty # equal to average does not count
# One business exceeds average once only
events = pd.DataFrame({
"business_id": [1, 2, 1, 2],
"event_type": ["ads", "ads", "reviews", "reviews"],
"occurrences": [10, 2, 3, 9]
})
result = run_solution(events)
assert result.empty # each business qualifies only once
# Single business in each category
events = pd.DataFrame({
"business_id": [1, 2],
"event_type": ["ads", "reviews"],
"occurrences": [100, 200]
})
result = run_solution(events)
assert result.empty # average equals own value
# Multiple active businesses
events = pd.DataFrame({
"business_id": [1, 2, 3, 1, 2, 3],
"event_type": ["ads", "ads", "ads", "reviews", "reviews", "reviews"],
"occurrences": [10, 1, 1, 20, 1, 1]
})
result = run_solution(events)
assert set(result["business_id"]) == {1} # only business 1 qualifies twice
Test Summary
| Test | Why |
|---|---|
| Problem example | Validates the standard scenario |
| Equal averages | Ensures strict comparison is handled correctly |
| One qualifying category | Verifies that exactly one category is insufficient |
| Single business per category | Ensures no self-average edge case bug |
| Multiple categories | Tests counting logic across event types |
Edge Cases
Event occurrence equal to average
A common mistake is using >= instead of >. The problem explicitly requires occurrences to be strictly greater than the average.
For example, if an event type appears twice with values [5, 5], the average is 5, and neither business qualifies. The implementation correctly uses:
occurrences > avg_occurrences
which excludes equal values.
Event type with only one business
If an event type appears for only one business, the average equals that business's own occurrence count. Since no value can be strictly greater than itself, that event type can never contribute toward making a business active.
The implementation naturally handles this because the comparison fails automatically.
Business qualifying in exactly one category
The problem requires more than one qualifying event type. A naive implementation might incorrectly include businesses with one qualifying event.
The grouping and filtering step:
HAVING COUNT(*) > 1
ensures that only businesses exceeding the average in at least two event types are returned.
Fractional averages
Averages are not guaranteed to be integers. For example:
$$(3 + 12) / 2 = 7.5$$
An implementation using integer division could incorrectly truncate the value and produce wrong comparisons.
The SQL AVG() function and Pandas mean() function both preserve decimal precision, so the implementation handles fractional averages correctly.