LeetCode 3089 - Find Bursty Behavior

The problem asks us to identify users who demonstrate "bursty behavior" in their posting patterns during February 2024.

LeetCode Problem 3089

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem asks us to identify users who demonstrate "bursty behavior" in their posting patterns during February 2024. Specifically, a user exhibits bursty behavior if, within any consecutive 7-day period, the number of posts they make is at least twice their average weekly posting frequency in February 2024.

The input is a Posts table with columns post_id, user_id, and post_date. Each row represents a single post made by a user on a specific date. February 2024 is treated as exactly 28 days, divided into 4 weeks. The output is a table of user_id, max_7day_posts (the maximum number of posts within any 7-day window), and avg_weekly_posts (the average number of posts per week across February). Only users who satisfy the bursty behavior condition should be included, and the results are sorted by user_id.

Important constraints include the fixed length of February (28 days), which simplifies the calculation of average weekly posts. Edge cases to watch for include users with zero posts, users with only one post, or users who post consistently but never exceed twice their weekly average within any 7-day window.

Approaches

The brute-force approach would be to examine every possible 7-day window in February for each user, counting posts in each window and comparing it to twice the user's average weekly posts. While this guarantees correctness, it is inefficient because there are 22 possible 7-day windows per user (28 - 7 + 1) and potentially thousands of users, leading to excessive computations.

The optimal approach leverages windowed aggregation. First, compute the total number of posts per user in February and derive the average weekly posts. Next, generate all possible 7-day periods for February and count the posts per user in each period. Using these aggregates, we can efficiently compute the maximum 7-day post count per user and filter only those meeting the bursty behavior criterion. This approach avoids scanning each 7-day window separately for each post by using a combination of date sequences, joins, and group aggregations.

Approach Time Complexity Space Complexity Notes
Brute Force O(U * W * P) O(U * P) Examine every 7-day window for each user and count posts. U = number of users, W = number of 7-day windows, P = number of posts. Inefficient for large datasets.
Optimal O(U * P + U * W) O(U * W) Precompute weekly averages and sliding 7-day windows using SQL aggregates. Efficient for realistic dataset sizes.

Algorithm Walkthrough

  1. Filter posts to include only dates in February 2024. This ensures that we are only considering posts relevant to the analysis.
  2. Compute each user’s total post count in February. Divide by 4 to calculate avg_weekly_posts since February is treated as exactly 4 weeks.
  3. Generate a list of all possible 7-day windows in February 2024. Each window starts on February 1 through February 22.
  4. For each user, count the number of posts in each 7-day window. This requires joining the user posts with the generated windows and counting posts that fall within the window.
  5. Compute the maximum post count (max_7day_posts) for each user across all 7-day windows.
  6. Filter users where max_7day_posts is at least twice their avg_weekly_posts.
  7. Return the user_id, max_7day_posts, and avg_weekly_posts, ordering by user_id.

Why it works: By generating all 7-day windows and computing the maximum post count per user, we capture every possible period in which bursty behavior could occur. Comparing this maximum to twice the weekly average ensures the correct detection of bursty behavior.

Python Solution

import pandas as pd
from datetime import timedelta

def find_bursty_behavior(posts: pd.DataFrame) -> pd.DataFrame:
    # Filter for February 2024
    posts['post_date'] = pd.to_datetime(posts['post_date'])
    feb_posts = posts[(posts['post_date'] >= '2024-02-01') & (posts['post_date'] <= '2024-02-28')]

    # Compute total posts per user and avg weekly posts
    user_totals = feb_posts.groupby('user_id').size().reset_index(name='total_posts')
    user_totals['avg_weekly_posts'] = user_totals['total_posts'] / 4

    # Generate 7-day windows
    windows = pd.DataFrame({'start_date': pd.date_range('2024-02-01', '2024-02-22')})
    results = []

    for user_id, group in feb_posts.groupby('user_id'):
        max_7day = 0
        post_dates = group['post_date'].tolist()
        for start in windows['start_date']:
            end = start + timedelta(days=6)
            count = sum(1 for d in post_dates if start <= d <= end)
            max_7day = max(max_7day, count)
        avg_weekly = user_totals.loc[user_totals['user_id'] == user_id, 'avg_weekly_posts'].values[0]
        if max_7day >= 2 * avg_weekly:
            results.append({'user_id': user_id, 'max_7day_posts': max_7day, 'avg_weekly_posts': round(avg_weekly, 4)})

    return pd.DataFrame(results).sort_values('user_id').reset_index(drop=True)

The Python implementation first filters the posts to February 2024 and computes each user's average weekly posts. It then iterates through all possible 7-day windows, counts posts in each window, and records the maximum. Users who meet the bursty condition are included in the output.

Go Solution

package main

import (
    "database/sql"
    "fmt"
    _ "github.com/go-sql-driver/mysql"
    "time"
)

type UserBurst struct {
    UserID         int
    Max7DayPosts   int
    AvgWeeklyPosts float64
}

func FindBurstyBehavior(db *sql.DB) ([]UserBurst, error) {
    query := `
        WITH feb_posts AS (
            SELECT user_id, post_date
            FROM Posts
            WHERE post_date BETWEEN '2024-02-01' AND '2024-02-28'
        ),
        user_totals AS (
            SELECT user_id, COUNT(*) AS total_posts, COUNT(*) / 4.0 AS avg_weekly_posts
            FROM feb_posts
            GROUP BY user_id
        ),
        windows AS (
            SELECT DATE('2024-02-01') + INTERVAL n DAY AS start_date
            FROM (SELECT 0 AS n UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3
                  UNION ALL SELECT 4 UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7
                  UNION ALL SELECT 8 UNION ALL SELECT 9 UNION ALL SELECT 10 UNION ALL SELECT 11
                  UNION ALL SELECT 12 UNION ALL SELECT 13 UNION ALL SELECT 14 UNION ALL SELECT 15
                  UNION ALL SELECT 16 UNION ALL SELECT 17 UNION ALL SELECT 18 UNION ALL SELECT 19
                  UNION ALL SELECT 20 UNION ALL SELECT 21 UNION ALL SELECT 22) AS nums
        ),
        user_window_counts AS (
            SELECT u.user_id, w.start_date, COUNT(f.post_date) AS cnt
            FROM user_totals u
            CROSS JOIN windows w
            LEFT JOIN feb_posts f
            ON f.user_id = u.user_id AND f.post_date BETWEEN w.start_date AND w.start_date + INTERVAL 6 DAY
            GROUP BY u.user_id, w.start_date
        ),
        user_max AS (
            SELECT user_id, MAX(cnt) AS max_7day_posts
            FROM user_window_counts
            GROUP BY user_id
        )
        SELECT u.user_id, m.max_7day_posts, u.avg_weekly_posts
        FROM user_totals u
        JOIN user_max m ON u.user_id = m.user_id
        WHERE m.max_7day_posts >= 2 * u.avg_weekly_posts
        ORDER BY u.user_id;
    `

    rows, err := db.Query(query)
    if err != nil {
        return nil, err
    }
    defer rows.Close()

    var results []UserBurst
    for rows.Next() {
        var ub UserBurst
        if err := rows.Scan(&ub.UserID, &ub.Max7DayPosts, &ub.AvgWeeklyPosts); err != nil {
            return nil, err
        }
        results = append(results, ub)
    }

    return results, nil
}

The Go solution uses SQL for aggregation and window generation. The logic mirrors the Python approach but leverages SQL CTEs for efficiency. It handles slices for the results and uses strong typing for numeric results.

Worked Examples

Using the example from the problem:

Posts Table:

| post_id | user_id | post_date  |
|---------|---------|------------|
| 1       | 1       | 2024-02-27 |
| 2       | 5       | 2024-02-06 |
| 3       | 3       | 2024-02-25 |
| 4       | 3       | 2024-02-14 |
| 5       | 3       | 2024-02-06 |
| 6