LeetCode 1132 - Reported Posts II

The problem asks us to calculate the average daily percentage of posts that were removed after being reported as spam. We are given two tables: Actions and Removals.

LeetCode Problem 1132

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem asks us to calculate the average daily percentage of posts that were removed after being reported as spam. We are given two tables: Actions and Removals. The Actions table records user interactions with posts, including views, likes, shares, reports, and other reactions. The extra column contains optional information about the action, such as the reason for a report. The Removals table lists posts that have been removed and the date they were removed.

The key requirement is to focus only on reports with extra = 'spam'. For each day when at least one spam report occurred, we compute the percentage of reported posts that were eventually removed (we do not care when they were removed, just whether they appear in the Removals table). Finally, we compute the average of these daily percentages and round the result to two decimal places.

Edge cases include days where no spam reports occurred (these should be ignored), multiple reports of the same post on the same day (should count as a single post), and posts that are reported but never removed (they should count as 0% toward the day's removal percentage).

Approaches

Brute Force Approach

A naive approach would iterate over each day, extract all posts reported as spam, then check for each post if it exists in the Removals table. We would then calculate the percentage for that day and finally average all days. While this approach is correct, it involves repeated scanning of both tables for each day, which can be expensive for large datasets.

Optimal Approach

The optimal approach leverages SQL aggregation efficiently. First, we identify all unique spam-reported posts per day. Next, we use a LEFT JOIN to Removals to determine which of these posts were removed. We compute the daily percentage by counting the removed posts over total reported posts for that day. Finally, we take the average of these daily percentages. This approach avoids nested loops and repeated scanning by using aggregations and joins.

Approach Time Complexity Space Complexity Notes
Brute Force O(D * P) O(P) Iterate per day and per post, expensive for large tables
Optimal O(N) O(N) Use SQL aggregations and joins, scans each table once

Algorithm Walkthrough

  1. Filter the Actions table to include only rows where action = 'report' and extra = 'spam'. This gives all spam reports.
  2. Select distinct (action_date, post_id) pairs to remove duplicates where the same post was reported multiple times on the same day.
  3. Join this result with the Removals table on post_id using a LEFT JOIN. This allows us to identify which reported posts were removed.
  4. Group by action_date to compute, for each day, the total number of unique reported posts and the number that were removed.
  5. Calculate the daily removal percentage as (removed_posts / total_reported_posts) * 100.
  6. Take the average of all daily percentages.
  7. Round the result to 2 decimal places.

Why it works: By aggregating at the day level and considering only unique posts per day, we ensure the daily percentage is accurate. The LEFT JOIN ensures that posts that were not removed count toward the denominator but not the numerator. Averaging these daily percentages produces the required final result.

Python Solution

import sqlite3

def average_daily_percent(connection: sqlite3.Connection) -> float:
    query = """
    WITH spam_reports AS (
        SELECT DISTINCT action_date, post_id
        FROM Actions
        WHERE action = 'report' AND extra = 'spam'
    ),
    daily_counts AS (
        SELECT 
            s.action_date,
            COUNT(s.post_id) AS total_reported,
            COUNT(r.post_id) AS removed_count
        FROM spam_reports s
        LEFT JOIN Removals r
        ON s.post_id = r.post_id
        GROUP BY s.action_date
    )
    SELECT ROUND(AVG(removed_count * 100.0 / total_reported), 2) AS average_daily_percent
    FROM daily_counts;
    """
    cursor = connection.cursor()
    cursor.execute(query)
    result = cursor.fetchone()[0]
    return result

Explanation: First, spam_reports filters and deduplicates spam reports per day. daily_counts aggregates daily totals and removed counts. Finally, the outer query computes the average percentage of removed posts across all days.

Go Solution

import (
    "database/sql"
    _ "github.com/mattn/go-sqlite3"
    "fmt"
)

func AverageDailyPercent(db *sql.DB) (float64, error) {
    query := `
    WITH spam_reports AS (
        SELECT DISTINCT action_date, post_id
        FROM Actions
        WHERE action = 'report' AND extra = 'spam'
    ),
    daily_counts AS (
        SELECT 
            s.action_date,
            COUNT(s.post_id) AS total_reported,
            COUNT(r.post_id) AS removed_count
        FROM spam_reports s
        LEFT JOIN Removals r
        ON s.post_id = r.post_id
        GROUP BY s.action_date
    )
    SELECT ROUND(AVG(removed_count * 100.0 / total_reported), 2) AS average_daily_percent
    FROM daily_counts;
    `
    var result float64
    err := db.QueryRow(query).Scan(&result)
    if err != nil {
        return 0, err
    }
    return result, nil
}

Go-specific notes: Go requires explicit error handling when querying SQL. The ROUND function works the same as in Python/SQL. QueryRow().Scan() extracts the single floating-point result.

Worked Examples

Example 1:

Actions table filtered for spam reports:

2019-07-02: post 3
2019-07-04: post 2, post 4

Removals table:

post 2 -> removed
post 3 -> removed

Daily removal percentage:

2019-07-02: 1/1 = 100%
2019-07-04: 1/2 = 50%

Average daily percent = (100 + 50) / 2 = 75.00

Complexity Analysis

Measure Complexity Explanation
Time O(N) Each table scanned once, joins and aggregations are linear in table size
Space O(N) Storing intermediate daily counts for unique posts

The complexity is dominated by the size of the Actions table, as each row may need to be considered for filtering and aggregation.

Test Cases

# Basic example from problem
assert average_daily_percent(connection) == 75.00  # average of 100% and 50%

# No spam reports
assert average_daily_percent(connection_no_reports) == None  # or 0 depending on SQL

# All spam reports removed
assert average_daily_percent(connection_all_removed) == 100.00

# No posts removed
assert average_daily_percent(connection_none_removed) == 0.00

# Multiple reports per post per day
assert average_daily_percent(connection_duplicate_reports) == 50.00  # count unique posts
Test Why
Basic example Validates calculation against problem statement
No spam reports Validates handling of empty sets
All spam removed Checks upper bound of 100%
None removed Checks lower bound of 0%
Duplicate reports Ensures duplicates do not inflate counts

Edge Cases

Edge Case 1: Multiple reports per post on the same day

A post may be reported multiple times on the same day. Without deduplication, the total reported count would be inflated, causing the daily percentage to be lower than it should be. Using SELECT DISTINCT ensures each post counts only once per day.

Edge Case 2: Days with no spam reports

Days without spam reports must be ignored in computing the average. Otherwise, they would artificially reduce the average. Our daily_counts CTE automatically ignores days with no entries.

Edge Case 3: Posts reported as spam but never removed

Posts that are never removed should contribute to the denominator but not the numerator. Using a LEFT JOIN ensures that removed posts are counted while non-removed posts do not contribute to removed_count.

This solution correctly handles all specified edge cases, ensuring accurate computation of the daily and overall percentages.