LeetCode 3103 - Find Trending Hashtags II
The problem gives us a database table named Tweets, where each row represents a tweet posted during February 2024. Every tweet contains plain text in the tweet column, and that text may contain one or more hashtags.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
The problem gives us a database table named Tweets, where each row represents a tweet posted during February 2024. Every tweet contains plain text in the tweet column, and that text may contain one or more hashtags.
A hashtag is a word that starts with the # character, such as #HappyDay or #TechLife.
Our task is to identify the top 3 trending hashtags in February 2024. Trending is defined purely by frequency, meaning we must count how many times each hashtag appears across all tweets.
The output should contain:
| Column | Meaning |
|---|---|
hashtag |
The hashtag itself |
count |
Number of times the hashtag appeared |
The final result must be ordered by:
countin descending orderhashtagin descending lexicographical order when counts are tied
The table guarantees that all tweet dates are valid dates in February 2024, so we do not need to filter outside the month. However, we still need to correctly extract hashtags from free-form text.
The key challenge is that a single tweet may contain multiple hashtags. A naive approach that treats each tweet as containing only one hashtag would fail.
Important edge cases include:
- Tweets containing multiple hashtags
- Multiple tweets containing the same hashtag
- Hashtags with identical counts, requiring secondary sorting
- Tweets with no hashtags at all
- Repeated hashtags appearing across many different tweets
The problem is fundamentally a text parsing and aggregation problem.
Approaches
Brute Force Approach
The brute-force approach would manually scan every tweet character by character, attempting to identify every hashtag occurrence. Once a # character is found, the algorithm would continue reading characters until it reaches a space or the end of the string. Every extracted hashtag would then be stored and counted.
This approach works because every hashtag is explicitly embedded in the tweet text. By examining every character, we guarantee that no hashtag is missed.
However, manually implementing parsing logic becomes unnecessarily complicated. It is easy to introduce bugs involving punctuation, multiple spaces, or malformed parsing rules. Additionally, repeated string manipulation can make the solution verbose and harder to maintain.
Optimal Approach
The better solution uses SQL string extraction functions together with recursive expansion or regular expression matching.
The key observation is that hashtags already follow a well-defined pattern:
#<non-space characters>
Instead of manually scanning characters, we can leverage SQL pattern matching and recursive extraction to isolate hashtags cleanly.
The algorithm works in three stages:
- Extract every hashtag occurrence from every tweet
- Group identical hashtags together and count frequencies
- Sort by frequency descending and hashtag descending, then return the top 3
This approach is both simpler and more efficient because databases are optimized for grouping and aggregation operations.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(T × L) | O(H) | Manually scans each tweet character by character |
| Optimal | O(T × L) | O(H) | Uses SQL extraction and aggregation functions efficiently |
Where:
T= number of tweetsL= average tweet lengthH= number of distinct hashtags
Algorithm Walkthrough
Optimal SQL Algorithm
- Start with the
Tweetstable and process every tweet individually. - Extract hashtags from each tweet text. Since a tweet may contain multiple hashtags, we need a mechanism that repeatedly finds hashtags until none remain.
- For every extracted hashtag, create a separate row. This normalization step is important because aggregation works best when each hashtag occurrence occupies its own record.
- Group rows by hashtag. This allows us to count how many times each hashtag appeared across all tweets.
- Sort the grouped results:
- Higher counts come first
- If counts are equal, lexicographically larger hashtags come first
- Return only the first 3 rows.
Why it works
The algorithm works because every hashtag occurrence is extracted exactly once and transformed into its own row. Grouping then correctly aggregates all identical hashtags together. Since the final sorting follows the exact ordering rules from the problem statement, the top 3 returned rows are guaranteed to be correct.
Python Solution
LeetCode database problems are solved using SQL rather than procedural Python code. The following SQL query is the correct submission.
# Write your MySQL query statement below
WITH RECURSIVE hashtag_extract AS (
SELECT
tweet_id,
SUBSTRING_INDEX(
SUBSTRING(tweet, LOCATE('#', tweet)),
' ',
1
) AS hashtag,
SUBSTRING(
tweet,
LOCATE('#', tweet) + 1
) AS remaining_text
FROM Tweets
WHERE tweet LIKE '%#%'
UNION ALL
SELECT
tweet_id,
SUBSTRING_INDEX(
SUBSTRING(remaining_text, LOCATE('#', remaining_text)),
' ',
1
) AS hashtag,
SUBSTRING(
remaining_text,
LOCATE('#', remaining_text) + 1
) AS remaining_text
FROM hashtag_extract
WHERE remaining_text LIKE '%#%'
)
SELECT
hashtag,
COUNT(*) AS count
FROM hashtag_extract
GROUP BY hashtag
ORDER BY count DESC, hashtag DESC
LIMIT 3;
Implementation Explanation
The solution uses a recursive common table expression named hashtag_extract.
The base query processes every tweet containing at least one hashtag. It locates the first # symbol and extracts the hashtag until the next space.
The recursive section continues searching through the remaining portion of the tweet. Each recursive iteration extracts one additional hashtag.
Eventually, all hashtags from all tweets are converted into individual rows.
After extraction is complete:
GROUP BY hashtagaggregates identical hashtagsCOUNT(*)computes frequencyORDER BY count DESC, hashtag DESCapplies the required sortingLIMIT 3returns only the top three hashtags
Go Solution
LeetCode database problems do not require Go implementations because submissions are written directly in SQL. However, the following Go code demonstrates how the same logic could be implemented procedurally.
package main
import (
"fmt"
"sort"
"strings"
)
type Tweet struct {
UserID int
TweetID int
Tweet string
TweetDate string
}
type Result struct {
Hashtag string
Count int
}
func topTrendingHashtags(tweets []Tweet) []Result {
hashtagCount := make(map[string]int)
for _, tweet := range tweets {
words := strings.Split(tweet.Tweet, " ")
for _, word := range words {
if len(word) > 0 && word[0] == '#' {
hashtagCount[word]++
}
}
}
results := make([]Result, 0)
for hashtag, count := range hashtagCount {
results = append(results, Result{
Hashtag: hashtag,
Count: count,
})
}
sort.Slice(results, func(i, j int) bool {
if results[i].Count == results[j].Count {
return results[i].Hashtag > results[j].Hashtag
}
return results[i].Count > results[j].Count
})
if len(results) > 3 {
results = results[:3]
}
return results
}
func main() {
tweets := []Tweet{
{135, 13, "Enjoying a great start to the day. #HappyDay #MorningVibes", "2024-02-01"},
{136, 14, "Another #HappyDay with good vibes! #FeelGood", "2024-02-03"},
{137, 15, "Productivity peaks! #WorkLife #ProductiveDay", "2024-02-04"},
}
results := topTrendingHashtags(tweets)
for _, r := range results {
fmt.Println(r.Hashtag, r.Count)
}
}
Go Implementation Notes
The Go implementation uses:
- A hash map for frequency counting
strings.Splitto tokenize tweet textsort.Slicefor custom sorting
Unlike SQL, Go requires explicit parsing and aggregation logic. The SQL solution delegates these operations to the database engine.
Worked Examples
Example 1
Input:
| tweet_id | tweet |
|---|---|
| 13 | Enjoying a great start to the day. #HappyDay #MorningVibes |
| 14 | Another #HappyDay with good vibes! #FeelGood |
| 15 | Productivity peaks! #WorkLife #ProductiveDay |
| 16 | Exploring new tech frontiers. #TechLife #Innovation |
| 17 | Gratitude for today's moments. #HappyDay #Thankful |
| 18 | Innovation drives us. #TechLife #FutureTech |
| 19 | Connecting with nature's serenity. #Nature #Peaceful |
Step 1: Extract hashtags
| tweet_id | Extracted hashtags |
|---|---|
| 13 | #HappyDay, #MorningVibes |
| 14 | #HappyDay, #FeelGood |
| 15 | #WorkLife, #ProductiveDay |
| 16 | #TechLife, #Innovation |
| 17 | #HappyDay, #Thankful |
| 18 | #TechLife, #FutureTech |
| 19 | #Nature, #Peaceful |
Step 2: Count frequencies
| Hashtag | Count |
|---|---|
#HappyDay |
3 |
#TechLife |
2 |
#WorkLife |
1 |
#MorningVibes |
1 |
#FeelGood |
1 |
#ProductiveDay |
1 |
#Innovation |
1 |
#Thankful |
1 |
#FutureTech |
1 |
#Nature |
1 |
#Peaceful |
1 |
Step 3: Sort results
Sorted by:
- Count descending
- Hashtag descending
Final top 3:
| hashtag | count |
|---|---|
#HappyDay |
3 |
#TechLife |
2 |
#WorkLife |
1 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(T × L) | Each tweet and character is processed at most once |
| Space | O(H) | Stores counts for distinct hashtags |
The dominant cost comes from scanning tweet text to extract hashtags. Aggregation operations are linear relative to the number of extracted hashtags.
The recursive SQL expansion still processes each hashtag occurrence exactly once, making the solution efficient for the problem constraints.
Test Cases
from collections import Counter
def solve(tweets):
counts = Counter()
for tweet in tweets:
for word in tweet.split():
if word.startswith("#"):
counts[word] += 1
return sorted(
counts.items(),
key=lambda x: (-x[1], -ord(x[0][1]))
)[:3]
# Provided example
tweets1 = [
"Enjoying a great start to the day. #HappyDay #MorningVibes",
"Another #HappyDay with good vibes! #FeelGood",
"Productivity peaks! #WorkLife #ProductiveDay",
"Exploring new tech frontiers. #TechLife #Innovation",
"Gratitude for today's moments. #HappyDay #Thankful",
"Innovation drives us. #TechLife #FutureTech",
"Connecting with nature's serenity. #Nature #Peaceful"
]
assert solve(tweets1)[0][0] == "#HappyDay" # Main example
# Single hashtag repeated many times
tweets2 = [
"#A",
"#A",
"#A"
]
assert solve(tweets2) == [("#A", 3)] # Repeated hashtag
# Multiple hashtags in one tweet
tweets3 = [
"#A #B #C"
]
result3 = solve(tweets3)
assert len(result3) == 3 # Multiple hashtags extracted
# Tie-breaking by hashtag descending
tweets4 = [
"#Apple",
"#Banana"
]
result4 = solve(tweets4)
assert result4[0][0] == "#Banana" # Descending lexicographic order
# Tweets without hashtags
tweets5 = [
"hello world",
"good morning"
]
assert solve(tweets5) == [] # No hashtags
# More than three unique hashtags
tweets6 = [
"#A #B #C #D #E"
]
assert len(solve(tweets6)) == 3 # Only top 3 returned
Test Case Summary
| Test | Why |
|---|---|
| Provided example | Validates standard functionality |
| Single repeated hashtag | Ensures counting works correctly |
| Multiple hashtags in one tweet | Verifies extraction logic |
| Tie-breaking case | Validates secondary sorting rule |
| Tweets without hashtags | Ensures empty handling |
| More than three hashtags | Confirms top-3 truncation |
Edge Cases
Tweets Without Any Hashtags
Some tweets may contain plain text without a single hashtag. A naive implementation might incorrectly attempt extraction and produce invalid rows.
The SQL solution prevents this by filtering with:
WHERE tweet LIKE '%#%'
Only tweets containing hashtags enter the recursive extraction pipeline.
Multiple Hashtags in a Single Tweet
A single tweet may contain many hashtags. An incorrect solution might extract only the first hashtag and ignore the rest.
The recursive common table expression solves this by repeatedly processing the remaining portion of the tweet until no # symbol remains.
Equal Frequency Hashtags
Several hashtags may share the same count. The problem explicitly requires sorting by hashtag descending in this situation.
The implementation correctly handles this with:
ORDER BY count DESC, hashtag DESC
This guarantees deterministic ordering even when frequencies match.