LeetCode 3103 - Find Trending Hashtags II

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

The problem gives us a database table named Tweets, where each row represents a tweet posted during February 2024. Every tweet contains plain text in the tweet column, and that text may contain one or more hashtags.

A hashtag is a word that starts with the # character, such as #HappyDay or #TechLife.

Our task is to identify the top 3 trending hashtags in February 2024. Trending is defined purely by frequency, meaning we must count how many times each hashtag appears across all tweets.

The output should contain:

Column	Meaning
`hashtag`	The hashtag itself
`count`	Number of times the hashtag appeared

The final result must be ordered by:

count in descending order
hashtag in descending lexicographical order when counts are tied

The table guarantees that all tweet dates are valid dates in February 2024, so we do not need to filter outside the month. However, we still need to correctly extract hashtags from free-form text.

The key challenge is that a single tweet may contain multiple hashtags. A naive approach that treats each tweet as containing only one hashtag would fail.

Important edge cases include:

Tweets containing multiple hashtags
Multiple tweets containing the same hashtag
Hashtags with identical counts, requiring secondary sorting
Tweets with no hashtags at all
Repeated hashtags appearing across many different tweets

The problem is fundamentally a text parsing and aggregation problem.

Approaches

Brute Force Approach

The brute-force approach would manually scan every tweet character by character, attempting to identify every hashtag occurrence. Once a # character is found, the algorithm would continue reading characters until it reaches a space or the end of the string. Every extracted hashtag would then be stored and counted.

This approach works because every hashtag is explicitly embedded in the tweet text. By examining every character, we guarantee that no hashtag is missed.

However, manually implementing parsing logic becomes unnecessarily complicated. It is easy to introduce bugs involving punctuation, multiple spaces, or malformed parsing rules. Additionally, repeated string manipulation can make the solution verbose and harder to maintain.

Optimal Approach

The better solution uses SQL string extraction functions together with recursive expansion or regular expression matching.

The key observation is that hashtags already follow a well-defined pattern:

#<non-space characters>

Instead of manually scanning characters, we can leverage SQL pattern matching and recursive extraction to isolate hashtags cleanly.

The algorithm works in three stages:

Extract every hashtag occurrence from every tweet
Group identical hashtags together and count frequencies
Sort by frequency descending and hashtag descending, then return the top 3

This approach is both simpler and more efficient because databases are optimized for grouping and aggregation operations.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(T × L)	O(H)	Manually scans each tweet character by character
Optimal	O(T × L)	O(H)	Uses SQL extraction and aggregation functions efficiently

Where:

T = number of tweets
L = average tweet length
H = number of distinct hashtags

Algorithm Walkthrough

Optimal SQL Algorithm

Start with the Tweets table and process every tweet individually.
Extract hashtags from each tweet text. Since a tweet may contain multiple hashtags, we need a mechanism that repeatedly finds hashtags until none remain.
For every extracted hashtag, create a separate row. This normalization step is important because aggregation works best when each hashtag occurrence occupies its own record.
Group rows by hashtag. This allows us to count how many times each hashtag appeared across all tweets.
Sort the grouped results:

Higher counts come first
If counts are equal, lexicographically larger hashtags come first

Return only the first 3 rows.

Why it works

The algorithm works because every hashtag occurrence is extracted exactly once and transformed into its own row. Grouping then correctly aggregates all identical hashtags together. Since the final sorting follows the exact ordering rules from the problem statement, the top 3 returned rows are guaranteed to be correct.

Python Solution

LeetCode database problems are solved using SQL rather than procedural Python code. The following SQL query is the correct submission.

# Write your MySQL query statement below

WITH RECURSIVE hashtag_extract AS (
    SELECT
        tweet_id,
        SUBSTRING_INDEX(
            SUBSTRING(tweet, LOCATE('#', tweet)),
            ' ',
            1
        ) AS hashtag,
        SUBSTRING(
            tweet,
            LOCATE('#', tweet) + 1
        ) AS remaining_text
    FROM Tweets
    WHERE tweet LIKE '%#%'

    UNION ALL

    SELECT
        tweet_id,
        SUBSTRING_INDEX(
            SUBSTRING(remaining_text, LOCATE('#', remaining_text)),
            ' ',
            1
        ) AS hashtag,
        SUBSTRING(
            remaining_text,
            LOCATE('#', remaining_text) + 1
        ) AS remaining_text
    FROM hashtag_extract
    WHERE remaining_text LIKE '%#%'
)

SELECT
    hashtag,
    COUNT(*) AS count
FROM hashtag_extract
GROUP BY hashtag
ORDER BY count DESC, hashtag DESC
LIMIT 3;

Implementation Explanation

The solution uses a recursive common table expression named hashtag_extract.

The base query processes every tweet containing at least one hashtag. It locates the first # symbol and extracts the hashtag until the next space.

The recursive section continues searching through the remaining portion of the tweet. Each recursive iteration extracts one additional hashtag.

Eventually, all hashtags from all tweets are converted into individual rows.

After extraction is complete:

GROUP BY hashtag aggregates identical hashtags
COUNT(*) computes frequency
ORDER BY count DESC, hashtag DESC applies the required sorting
LIMIT 3 returns only the top three hashtags

Go Solution

LeetCode database problems do not require Go implementations because submissions are written directly in SQL. However, the following Go code demonstrates how the same logic could be implemented procedurally.

package main

import (
	"fmt"
	"sort"
	"strings"
)

type Tweet struct {
	UserID   int
	TweetID  int
	Tweet    string
	TweetDate string
}

type Result struct {
	Hashtag string
	Count   int
}

func topTrendingHashtags(tweets []Tweet) []Result {
	hashtagCount := make(map[string]int)

	for _, tweet := range tweets {
		words := strings.Split(tweet.Tweet, " ")

		for _, word := range words {
			if len(word) > 0 && word[0] == '#' {
				hashtagCount[word]++
			}
		}
	}

	results := make([]Result, 0)

	for hashtag, count := range hashtagCount {
		results = append(results, Result{
			Hashtag: hashtag,
			Count:   count,
		})
	}

	sort.Slice(results, func(i, j int) bool {
		if results[i].Count == results[j].Count {
			return results[i].Hashtag > results[j].Hashtag
		}
		return results[i].Count > results[j].Count
	})

	if len(results) > 3 {
		results = results[:3]
	}

	return results
}

func main() {
	tweets := []Tweet{
		{135, 13, "Enjoying a great start to the day. #HappyDay #MorningVibes", "2024-02-01"},
		{136, 14, "Another #HappyDay with good vibes! #FeelGood", "2024-02-03"},
		{137, 15, "Productivity peaks! #WorkLife #ProductiveDay", "2024-02-04"},
	}

	results := topTrendingHashtags(tweets)

	for _, r := range results {
		fmt.Println(r.Hashtag, r.Count)
	}
}

Go Implementation Notes

The Go implementation uses:

A hash map for frequency counting
strings.Split to tokenize tweet text
sort.Slice for custom sorting

Unlike SQL, Go requires explicit parsing and aggregation logic. The SQL solution delegates these operations to the database engine.

Worked Examples

Example 1

Input:

tweet_id	tweet
13	`Enjoying a great start to the day. #HappyDay #MorningVibes`
14	`Another #HappyDay with good vibes! #FeelGood`
15	`Productivity peaks! #WorkLife #ProductiveDay`
16	`Exploring new tech frontiers. #TechLife #Innovation`
17	`Gratitude for today's moments. #HappyDay #Thankful`
18	`Innovation drives us. #TechLife #FutureTech`
19	`Connecting with nature's serenity. #Nature #Peaceful`

Step 1: Extract hashtags

tweet_id	Extracted hashtags
13	`#HappyDay`, `#MorningVibes`
14	`#HappyDay`, `#FeelGood`
15	`#WorkLife`, `#ProductiveDay`
16	`#TechLife`, `#Innovation`
17	`#HappyDay`, `#Thankful`
18	`#TechLife`, `#FutureTech`
19	`#Nature`, `#Peaceful`

Step 2: Count frequencies

Hashtag	Count
`#HappyDay`	3
`#TechLife`	2
`#WorkLife`	1
`#MorningVibes`	1
`#FeelGood`	1
`#ProductiveDay`	1
`#Innovation`	1
`#Thankful`	1
`#FutureTech`	1
`#Nature`	1
`#Peaceful`	1

Step 3: Sort results

Sorted by:

Count descending
Hashtag descending

Final top 3:

hashtag	count
`#HappyDay`	3
`#TechLife`	2
`#WorkLife`	1

Complexity Analysis

Measure	Complexity	Explanation
Time	O(T × L)	Each tweet and character is processed at most once
Space	O(H)	Stores counts for distinct hashtags

The dominant cost comes from scanning tweet text to extract hashtags. Aggregation operations are linear relative to the number of extracted hashtags.

The recursive SQL expansion still processes each hashtag occurrence exactly once, making the solution efficient for the problem constraints.

Test Cases

from collections import Counter

def solve(tweets):
    counts = Counter()

    for tweet in tweets:
        for word in tweet.split():
            if word.startswith("#"):
                counts[word] += 1

    return sorted(
        counts.items(),
        key=lambda x: (-x[1], -ord(x[0][1]))
    )[:3]

# Provided example
tweets1 = [
    "Enjoying a great start to the day. #HappyDay #MorningVibes",
    "Another #HappyDay with good vibes! #FeelGood",
    "Productivity peaks! #WorkLife #ProductiveDay",
    "Exploring new tech frontiers. #TechLife #Innovation",
    "Gratitude for today's moments. #HappyDay #Thankful",
    "Innovation drives us. #TechLife #FutureTech",
    "Connecting with nature's serenity. #Nature #Peaceful"
]

assert solve(tweets1)[0][0] == "#HappyDay"  # Main example

# Single hashtag repeated many times
tweets2 = [
    "#A",
    "#A",
    "#A"
]

assert solve(tweets2) == [("#A", 3)]  # Repeated hashtag

# Multiple hashtags in one tweet
tweets3 = [
    "#A #B #C"
]

result3 = solve(tweets3)
assert len(result3) == 3  # Multiple hashtags extracted

# Tie-breaking by hashtag descending
tweets4 = [
    "#Apple",
    "#Banana"
]

result4 = solve(tweets4)
assert result4[0][0] == "#Banana"  # Descending lexicographic order

# Tweets without hashtags
tweets5 = [
    "hello world",
    "good morning"
]

assert solve(tweets5) == []  # No hashtags

# More than three unique hashtags
tweets6 = [
    "#A #B #C #D #E"
]

assert len(solve(tweets6)) == 3  # Only top 3 returned

Test Case Summary

Test	Why
Provided example	Validates standard functionality
Single repeated hashtag	Ensures counting works correctly
Multiple hashtags in one tweet	Verifies extraction logic
Tie-breaking case	Validates secondary sorting rule
Tweets without hashtags	Ensures empty handling
More than three hashtags	Confirms top-3 truncation

Edge Cases

Tweets Without Any Hashtags

Some tweets may contain plain text without a single hashtag. A naive implementation might incorrectly attempt extraction and produce invalid rows.

The SQL solution prevents this by filtering with:

WHERE tweet LIKE '%#%'

Only tweets containing hashtags enter the recursive extraction pipeline.

Multiple Hashtags in a Single Tweet

A single tweet may contain many hashtags. An incorrect solution might extract only the first hashtag and ignore the rest.

The recursive common table expression solves this by repeatedly processing the remaining portion of the tweet until no # symbol remains.

Equal Frequency Hashtags

Several hashtags may share the same count. The problem explicitly requires sorting by hashtag descending in this situation.

The implementation correctly handles this with:

ORDER BY count DESC, hashtag DESC

This guarantees deterministic ordering even when frequencies match.