LeetCode 3087 - Find Trending Hashtags
The problem asks us to find the top 3 trending hashtags from a table of tweets for a specific month, February 2024. Each tweet contains exactly one hashtag. The input is represented by a Tweets table with columns userid, tweetid, tweet, and tweetdate.
Difficulty: 🟡 Medium
Topics: Database
Solution
Problem Understanding
The problem asks us to find the top 3 trending hashtags from a table of tweets for a specific month, February 2024. Each tweet contains exactly one hashtag. The input is represented by a Tweets table with columns user_id, tweet_id, tweet, and tweet_date. The tweet_id is unique for each row, and tweet contains a string that may include a hashtag.
The output must be a table of the top 3 hashtags in February 2024, sorted first by the number of times each hashtag appears (hashtag_count) in descending order, and second by the hashtag text in descending lexicographical order to break ties. The result should include the hashtag itself and the total count.
Key constraints and notes are that each tweet contains exactly one hashtag, no three tweets are identical in terms of IDs, and the dataset may contain tweets outside February 2024, which should be ignored. Edge cases include ties in counts, fewer than three hashtags in the month, and tweets with hashtags containing unusual characters.
Approaches
Brute Force
A brute-force approach would iterate through all tweets, filter by February 2024, parse the hashtag from each tweet string, and maintain a frequency count for each hashtag. After counting, we would sort the hashtags by count and then lexicographically in descending order, finally selecting the top 3. While this is correct and straightforward, it may be inefficient if the dataset is very large, as it requires scanning the entire table, building a temporary data structure for counts, and performing a full sort.
Optimal
The optimal approach leverages SQL aggregation and filtering directly in the query, reducing the need for post-processing in code. By using a WHERE clause to filter the month, GROUP BY to count occurrences, and ORDER BY to sort based on count and hashtag, we can use SQL's built-in operations to efficiently compute the top 3 hashtags. Adding LIMIT 3 ensures we only return the required top 3, keeping memory and processing time minimal.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n log n) | O(n) | Count hashtags in Python or another language, then sort and pick top 3 |
| Optimal | O(n) | O(k) | SQL aggregates and sorts directly; n is number of tweets, k is unique hashtags in February |
Algorithm Walkthrough
- Filter the
Tweetstable to only include tweets withtweet_datein February 2024. This is done usingWHERE tweet_date BETWEEN '2024-02-01' AND '2024-02-29'. - Extract hashtags from the
tweetcolumn. Since each tweet contains exactly one hashtag, the substring starting from#to the end of the word is the hashtag. - Group the filtered tweets by the extracted hashtag using
GROUP BY hashtag. - Count the number of occurrences for each hashtag using
COUNT(*) AS hashtag_count. - Order the results by
hashtag_countin descending order to rank hashtags by popularity. - In case of a tie in counts, order hashtags by the hashtag text itself in descending lexicographical order.
- Limit the results to the top 3 hashtags using
LIMIT 3.
Why it works: By filtering first, we ensure only relevant tweets are considered. GROUP BY and COUNT correctly aggregate the frequency of each hashtag. Sorting and limiting guarantees we retrieve the correct top 3 hashtags efficiently.
Python Solution
import pandas as pd
def find_trending_hashtags(tweets: pd.DataFrame) -> pd.DataFrame:
# Filter tweets for February 2024
feb_tweets = tweets[(tweets['tweet_date'] >= '2024-02-01') &
(tweets['tweet_date'] <= '2024-02-29')]
# Extract hashtags
feb_tweets['hashtag'] = feb_tweets['tweet'].str.extract(r'(#\w+)')
# Count occurrences of each hashtag
hashtag_counts = feb_tweets.groupby('hashtag').size().reset_index(name='hashtag_count')
# Sort by count descending, then by hashtag descending
hashtag_counts = hashtag_counts.sort_values(by=['hashtag_count', 'hashtag'], ascending=[False, False])
# Return top 3
return hashtag_counts.head(3)
The implementation first filters the input DataFrame to only include tweets in February 2024. Then it extracts hashtags using a regular expression, counts each hashtag’s occurrences with groupby and size(), sorts according to the problem’s ordering requirements, and finally returns the top 3 hashtags.
Go Solution
package main
import (
"database/sql"
_ "github.com/go-sql-driver/mysql"
)
func FindTrendingHashtags(db *sql.DB) (*sql.Rows, error) {
query := `
SELECT hashtag, COUNT(*) AS hashtag_count
FROM (
SELECT SUBSTRING(tweet, LOCATE('#', tweet)) AS hashtag
FROM Tweets
WHERE tweet_date BETWEEN '2024-02-01' AND '2024-02-29'
) AS feb_tweets
GROUP BY hashtag
ORDER BY hashtag_count DESC, hashtag DESC
LIMIT 3
`
return db.Query(query)
}
The Go implementation executes the same SQL query directly against a database. Unlike Python, we do not need in-memory DataFrames; instead, the query handles filtering, grouping, counting, and sorting. The LOCATE and SUBSTRING functions extract the hashtags, ensuring the correct aggregation.
Worked Examples
For the provided example:
Step 1: Filter February tweets → tweets 13-19.
Step 2: Extract hashtags → #HappyDay, #HappyDay, #WorkLife, #TechLife, #HappyDay, #TechLife, #Nature.
Step 3: Count occurrences → #HappyDay: 3, #TechLife: 2, #WorkLife: 1, #Nature: 1.
Step 4: Sort by count descending and hashtag descending → #HappyDay: 3, #TechLife: 2, #WorkLife: 1.
Step 5: Limit to top 3 → final output matches expected.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Single pass through all tweets for filtering and extraction; aggregation and sorting handled efficiently in SQL |
| Space | O(k) | Only stores counts for unique hashtags in February; k is number of unique hashtags |
The complexity is linear in the number of tweets, and memory usage depends on the number of distinct hashtags in the filtered month.
Test Cases
import pandas as pd
tweets = pd.DataFrame([
[135, 13, "Enjoying a great start to the day! #HappyDay", "2024-02-01"],
[136, 14, "Another #HappyDay with good vibes!", "2024-02-03"],
[137, 15, "Productivity peaks! #WorkLife", "2024-02-04"],
[138, 16, "Exploring new tech frontiers. #TechLife", "2024-02-04"],
[139, 17, "Gratitude for today's moments. #HappyDay", "2024-02-05"],
[140, 18, "Innovation drives us. #TechLife", "2024-02-07"],
[141, 19, "Connecting with nature's serenity. #Nature", "2024-02-09"],
])
result = find_trending_hashtags(tweets)
assert result.iloc[0]['hashtag'] == '#HappyDay' and result.iloc[0]['hashtag_count'] == 3 # Top hashtag
assert result.iloc[1]['hashtag'] == '#TechLife' and result.iloc[1]['hashtag_count'] == 2 # Second
assert result.iloc[2]['hashtag'] == '#WorkLife' and result.iloc[2]['hashtag_count'] == 1 # Third
# Edge: fewer than 3 hashtags
tweets_small = pd.DataFrame([
[150, 20, "#SoloTrend", "2024-02-02"]
])
result_small = find_trending_hashtags(tweets_small)
assert result_small.shape[0] == 1 and result_small.iloc[0]['hashtag'] == '#SoloTrend'
# Edge: tie in counts
tweets_tie = pd.DataFrame([
[151, 21, "#Alpha", "2024-02-01"],
[152, 22, "#Beta", "2024-02-01"]
])
result_tie = find_trending_hashtags(tweets_tie)
assert result_tie.iloc[0]['hashtag'] == '#Beta' # Descending lex order
assert result_tie.iloc[1]['hashtag'] == '#Alpha'
| Test | Why |
|---|---|
| Standard example | Validates main scenario with multiple hashtags and counts |
| Fewer than 3 hashtags | Ensures function handles cases with insufficient data |
| Tie in counts | Ensures secondary sorting by hashtag descending works correctly |
Edge Cases
Edge Case 1: Tweets outside February
Tweets from months other than February 2024 should be ignored. Filtering with tweet_date BETWEEN '2024-02-01' AND '2024-02-29' guarantees