LeetCode 3087 - Find Trending Hashtags

The problem asks us to find the top 3 trending hashtags from a table of tweets for a specific month, February 2024. Each tweet contains exactly one hashtag. The input is represented by a Tweets table with columns userid, tweetid, tweet, and tweetdate.

LeetCode Problem 3087

Difficulty: 🟡 Medium
Topics: Database

Solution

Problem Understanding

The problem asks us to find the top 3 trending hashtags from a table of tweets for a specific month, February 2024. Each tweet contains exactly one hashtag. The input is represented by a Tweets table with columns user_id, tweet_id, tweet, and tweet_date. The tweet_id is unique for each row, and tweet contains a string that may include a hashtag.

The output must be a table of the top 3 hashtags in February 2024, sorted first by the number of times each hashtag appears (hashtag_count) in descending order, and second by the hashtag text in descending lexicographical order to break ties. The result should include the hashtag itself and the total count.

Key constraints and notes are that each tweet contains exactly one hashtag, no three tweets are identical in terms of IDs, and the dataset may contain tweets outside February 2024, which should be ignored. Edge cases include ties in counts, fewer than three hashtags in the month, and tweets with hashtags containing unusual characters.

Approaches

Brute Force

A brute-force approach would iterate through all tweets, filter by February 2024, parse the hashtag from each tweet string, and maintain a frequency count for each hashtag. After counting, we would sort the hashtags by count and then lexicographically in descending order, finally selecting the top 3. While this is correct and straightforward, it may be inefficient if the dataset is very large, as it requires scanning the entire table, building a temporary data structure for counts, and performing a full sort.

Optimal

The optimal approach leverages SQL aggregation and filtering directly in the query, reducing the need for post-processing in code. By using a WHERE clause to filter the month, GROUP BY to count occurrences, and ORDER BY to sort based on count and hashtag, we can use SQL's built-in operations to efficiently compute the top 3 hashtags. Adding LIMIT 3 ensures we only return the required top 3, keeping memory and processing time minimal.

Approach Time Complexity Space Complexity Notes
Brute Force O(n log n) O(n) Count hashtags in Python or another language, then sort and pick top 3
Optimal O(n) O(k) SQL aggregates and sorts directly; n is number of tweets, k is unique hashtags in February

Algorithm Walkthrough

  1. Filter the Tweets table to only include tweets with tweet_date in February 2024. This is done using WHERE tweet_date BETWEEN '2024-02-01' AND '2024-02-29'.
  2. Extract hashtags from the tweet column. Since each tweet contains exactly one hashtag, the substring starting from # to the end of the word is the hashtag.
  3. Group the filtered tweets by the extracted hashtag using GROUP BY hashtag.
  4. Count the number of occurrences for each hashtag using COUNT(*) AS hashtag_count.
  5. Order the results by hashtag_count in descending order to rank hashtags by popularity.
  6. In case of a tie in counts, order hashtags by the hashtag text itself in descending lexicographical order.
  7. Limit the results to the top 3 hashtags using LIMIT 3.

Why it works: By filtering first, we ensure only relevant tweets are considered. GROUP BY and COUNT correctly aggregate the frequency of each hashtag. Sorting and limiting guarantees we retrieve the correct top 3 hashtags efficiently.

Python Solution

import pandas as pd

def find_trending_hashtags(tweets: pd.DataFrame) -> pd.DataFrame:
    # Filter tweets for February 2024
    feb_tweets = tweets[(tweets['tweet_date'] >= '2024-02-01') & 
                        (tweets['tweet_date'] <= '2024-02-29')]
    
    # Extract hashtags
    feb_tweets['hashtag'] = feb_tweets['tweet'].str.extract(r'(#\w+)')
    
    # Count occurrences of each hashtag
    hashtag_counts = feb_tweets.groupby('hashtag').size().reset_index(name='hashtag_count')
    
    # Sort by count descending, then by hashtag descending
    hashtag_counts = hashtag_counts.sort_values(by=['hashtag_count', 'hashtag'], ascending=[False, False])
    
    # Return top 3
    return hashtag_counts.head(3)

The implementation first filters the input DataFrame to only include tweets in February 2024. Then it extracts hashtags using a regular expression, counts each hashtag’s occurrences with groupby and size(), sorts according to the problem’s ordering requirements, and finally returns the top 3 hashtags.

Go Solution

package main

import (
    "database/sql"
    _ "github.com/go-sql-driver/mysql"
)

func FindTrendingHashtags(db *sql.DB) (*sql.Rows, error) {
    query := `
        SELECT hashtag, COUNT(*) AS hashtag_count
        FROM (
            SELECT SUBSTRING(tweet, LOCATE('#', tweet)) AS hashtag
            FROM Tweets
            WHERE tweet_date BETWEEN '2024-02-01' AND '2024-02-29'
        ) AS feb_tweets
        GROUP BY hashtag
        ORDER BY hashtag_count DESC, hashtag DESC
        LIMIT 3
    `
    return db.Query(query)
}

The Go implementation executes the same SQL query directly against a database. Unlike Python, we do not need in-memory DataFrames; instead, the query handles filtering, grouping, counting, and sorting. The LOCATE and SUBSTRING functions extract the hashtags, ensuring the correct aggregation.

Worked Examples

For the provided example:

Step 1: Filter February tweets → tweets 13-19.

Step 2: Extract hashtags → #HappyDay, #HappyDay, #WorkLife, #TechLife, #HappyDay, #TechLife, #Nature.

Step 3: Count occurrences → #HappyDay: 3, #TechLife: 2, #WorkLife: 1, #Nature: 1.

Step 4: Sort by count descending and hashtag descending → #HappyDay: 3, #TechLife: 2, #WorkLife: 1.

Step 5: Limit to top 3 → final output matches expected.

Complexity Analysis

Measure Complexity Explanation
Time O(n) Single pass through all tweets for filtering and extraction; aggregation and sorting handled efficiently in SQL
Space O(k) Only stores counts for unique hashtags in February; k is number of unique hashtags

The complexity is linear in the number of tweets, and memory usage depends on the number of distinct hashtags in the filtered month.

Test Cases

import pandas as pd

tweets = pd.DataFrame([
    [135, 13, "Enjoying a great start to the day! #HappyDay", "2024-02-01"],
    [136, 14, "Another #HappyDay with good vibes!", "2024-02-03"],
    [137, 15, "Productivity peaks! #WorkLife", "2024-02-04"],
    [138, 16, "Exploring new tech frontiers. #TechLife", "2024-02-04"],
    [139, 17, "Gratitude for today's moments. #HappyDay", "2024-02-05"],
    [140, 18, "Innovation drives us. #TechLife", "2024-02-07"],
    [141, 19, "Connecting with nature's serenity. #Nature", "2024-02-09"],
])

result = find_trending_hashtags(tweets)
assert result.iloc[0]['hashtag'] == '#HappyDay' and result.iloc[0]['hashtag_count'] == 3  # Top hashtag
assert result.iloc[1]['hashtag'] == '#TechLife' and result.iloc[1]['hashtag_count'] == 2 # Second
assert result.iloc[2]['hashtag'] == '#WorkLife' and result.iloc[2]['hashtag_count'] == 1 # Third

# Edge: fewer than 3 hashtags
tweets_small = pd.DataFrame([
    [150, 20, "#SoloTrend", "2024-02-02"]
])
result_small = find_trending_hashtags(tweets_small)
assert result_small.shape[0] == 1 and result_small.iloc[0]['hashtag'] == '#SoloTrend'

# Edge: tie in counts
tweets_tie = pd.DataFrame([
    [151, 21, "#Alpha", "2024-02-01"],
    [152, 22, "#Beta", "2024-02-01"]
])
result_tie = find_trending_hashtags(tweets_tie)
assert result_tie.iloc[0]['hashtag'] == '#Beta'  # Descending lex order
assert result_tie.iloc[1]['hashtag'] == '#Alpha'
Test Why
Standard example Validates main scenario with multiple hashtags and counts
Fewer than 3 hashtags Ensures function handles cases with insufficient data
Tie in counts Ensures secondary sorting by hashtag descending works correctly

Edge Cases

Edge Case 1: Tweets outside February

Tweets from months other than February 2024 should be ignored. Filtering with tweet_date BETWEEN '2024-02-01' AND '2024-02-29' guarantees