LeetCode 1113 - Reported Posts

The problem is asking us to determine the number of posts that were reported yesterday grouped by the reason for the report.

LeetCode Problem 1113

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem is asking us to determine the number of posts that were reported yesterday grouped by the reason for the report. We are given an Actions table where each row represents a user performing an action on a post, with the action type recorded in the action column and optional extra information in the extra column. In this context, a report action may include a reason in extra such as "spam" or "racism". The problem specifies that "today" is 2019-07-05, so "yesterday" refers to 2019-07-04.

The input table may have multiple rows for the same post, same user, or same action, meaning duplicates exist. We only care about report actions and need to count how many reports were made for each report_reason on 2019-07-04. The output should be a table with two columns: report_reason and report_count, including only reasons with at least one report.

Important points and constraints include: the table may have duplicates, there can be multiple reports of the same post for the same reason, and the extra column may be NULL for actions other than report. Edge cases to consider include posts reported multiple times by the same user for the same reason, days with no reports, and multiple report reasons.

Approaches

The brute-force approach is to scan the entire Actions table, filter rows where action = 'report' and action_date = '2019-07-04', and manually aggregate counts by reason, for example using a Python dictionary or SQL group by. This approach works correctly, but if the table is very large, scanning every row and then processing in application code is less efficient than letting the database handle grouping.

The optimal approach leverages SQL aggregation: filter directly in SQL for report actions on the target date, group by extra (the reason), and count the number of rows per group. This avoids manual aggregation, reduces memory usage, and efficiently uses database indexing on action_date or action. The key insight is that SQL's GROUP BY with COUNT naturally handles duplicates and aggregation correctly.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) O(r) Iterate all rows in the table and count manually per reason
Optimal O(n) O(r) SQL GROUP BY handles filtering and counting efficiently, where n is the number of rows, r is the number of unique reasons

Algorithm Walkthrough

  1. Identify the target date, which is yesterday relative to the fixed date 2019-07-05. This is 2019-07-04.
  2. Filter the Actions table to select only rows where action = 'report' and action_date = '2019-07-04'.
  3. Group the filtered rows by extra, which represents the report reason.
  4. For each group, count the number of rows to get report_count.
  5. Return a table with report_reason and report_count, including only reasons with at least one report.

Why it works: Grouping by extra ensures all reports for the same reason are counted together. Filtering first ensures we only count relevant reports. Using COUNT guarantees we handle multiple reports per user or post correctly.

Python Solution

import pandas as pd

class Solution:
    def reportedPosts(self, actions: pd.DataFrame) -> pd.DataFrame:
        # Step 1: Define yesterday's date
        yesterday = '2019-07-04'
        
        # Step 2: Filter rows where action is 'report' and date is yesterday
        reports = actions[(actions['action'] == 'report') & (actions['action_date'] == yesterday)]
        
        # Step 3: Group by report reason (extra) and count
        result = reports.groupby('extra').size().reset_index(name='report_count')
        
        # Step 4: Rename column to match expected output
        result.rename(columns={'extra': 'report_reason'}, inplace=True)
        
        return result

The Python solution uses Pandas for table manipulation. First, it filters for the target date and report actions. Then, groupby('extra') aggregates counts by report reason. The resulting DataFrame is renamed to match the expected output columns. This directly implements the optimal SQL-like aggregation logic in Python.

Go Solution

package main

import (
	"database/sql"
	"fmt"
)

func reportedPosts(db *sql.DB) ([]map[string]interface{}, error) {
	yesterday := "2019-07-04"
	query := `
		SELECT extra AS report_reason, COUNT(*) AS report_count
		FROM Actions
		WHERE action = 'report' AND action_date = ?
		GROUP BY extra
	`
	rows, err := db.Query(query, yesterday)
	if err != nil {
		return nil, err
	}
	defer rows.Close()

	var results []map[string]interface{}
	for rows.Next() {
		var reason string
		var count int
		if err := rows.Scan(&reason, &count); err != nil {
			return nil, err
		}
		results = append(results, map[string]interface{}{
			"report_reason": reason,
			"report_count":  count,
		})
	}
	return results, nil
}

In Go, we execute an SQL query directly against the database. We use parameterized queries to safely filter by yesterday's date. The rows.Scan function maps SQL columns to Go variables, and we store results in a slice of maps for flexibility. Go handles SQL NULLs automatically, and COUNT(*) ensures we count duplicates correctly.

Worked Examples

Example input table:

+---------+---------+-------------+--------+--------+
| user_id | post_id | action_date | action | extra  |
+---------+---------+-------------+--------+--------+
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | report | racism |
+---------+---------+-------------+--------+--------+

Step by step:

  1. Filter rows for action='report' and action_date='2019-07-04':
+---------+---------+-------------+--------+--------+
| 2       | 4       | 2019-07-04  | report | spam   |
| 3       | 4       | 2019-07-04  | report | spam   |
| 5       | 2       | 2019-07-04  | report | racism |
| 5       | 5       | 2019-07-04  | report | racism |
+---------+---------+-------------+--------+--------+
  1. Group by extra:
  • spam: 2 reports
  • racism: 2 reports
  1. Return table:
+---------------+--------------+
| report_reason | report_count |
+---------------+--------------+
| spam          | 2            |
| racism        | 2            |
+---------------+--------------+

Complexity Analysis

Measure Complexity Explanation
Time O(n) We scan the table once to filter reports and then group by reason
Space O(r) We store counts for each unique report reason, r is number of unique reasons

Filtering and grouping operations are linear in the number of table rows, and the space required scales with the number of distinct report reasons.

Test Cases

import pandas as pd

actions = pd.DataFrame([
    [1, 1, '2019-07-01', 'view', None],
    [1, 1, '2019-07-01', 'like', None],
    [1, 1, '2019-07-01', 'share', None],
    [2, 4, '2019-07-04', 'view', None],
    [2, 4, '2019-07-04', 'report', 'spam'],
    [3, 4, '2019-07-04', 'view', None],
    [3, 4, '2019-07-04', 'report', 'spam'],
    [4, 3, '2019-07-02', 'view', None],
    [4, 3, '2019-07-02', 'report', 'spam'],
    [5, 2, '2019-07-04', 'view', None],
    [5, 2, '2019-07-04', 'report', 'racism'],
    [5, 5, '2019-07-04', 'view', None],
    [5, 5, '2019-07-04', 'report', 'racism'],
], columns=['user_id', 'post_id', 'action_date', 'action', 'extra'])

sol = Solution()
result = sol.reportedPosts(actions)
assert set([tuple(x) for x in result.to_records(index=False)]) == set([('spam',2),('racism',2)])  # standard example

# Edge case: no reports yesterday
empty_actions = pd.DataFrame([