LeetCode 3150 - Invalid Tweets II
The problem gives us a database table named Tweets with two columns: | Column | Description | | --- | --- | | tweetid | Unique identifier for each tweet | | content | The text content of the tweet | We need to identify all tweets that are considered invalid.
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
The problem gives us a database table named Tweets with two columns:
| Column | Description |
|---|---|
tweet_id |
Unique identifier for each tweet |
content |
The text content of the tweet |
We need to identify all tweets that are considered invalid. A tweet becomes invalid if at least one of the following conditions is true:
- The tweet content length is greater than
140characters. - The tweet contains more than
3mentions. - The tweet contains more than
3hashtags.
A mention is represented by a word beginning with the @ symbol, such as @John.
A hashtag is represented by a word beginning with the # symbol, such as #Coding.
The output should contain only the tweet_id values of invalid tweets, sorted in ascending order.
The key part of this problem is accurately counting:
- Total characters in the tweet
- Number of mentions
- Number of hashtags
Since this is a database problem, the intended solution is an SQL query rather than procedural code. We must use SQL string functions and pattern matching techniques to compute these counts efficiently.
The input size is not explicitly stated, but database interview problems generally expect set based operations rather than row by row procedural logic. This means we should avoid unnecessarily complicated parsing strategies.
There are several important edge cases:
- Tweets exactly
140characters long are valid. - Tweets with exactly
3mentions are valid. - Tweets with exactly
3hashtags are valid. - A tweet may violate multiple conditions simultaneously.
- Tweets may contain no mentions or no hashtags.
- Mentions and hashtags can appear anywhere in the string.
- Multiple spaces or punctuation should not affect counting if we count symbol occurrences directly.
A naive implementation might incorrectly parse mentions using word splitting or regular expressions that miss edge cases. Fortunately, the problem only asks for counts of @ and # symbols associated with mentions and hashtags, so counting occurrences of these characters is sufficient.
Approaches
Brute Force Approach
A brute force solution would process every tweet character by character. For each tweet, we would:
- Count the total number of characters.
- Scan the string one character at a time.
- Increment a mention counter whenever we encounter
@. - Increment a hashtag counter whenever we encounter
#. - Mark the tweet invalid if any condition exceeds the allowed limit.
This approach is straightforward and guaranteed to work because it explicitly inspects every character in the tweet.
However, in SQL, manually iterating through characters is cumbersome and inefficient. Procedural loops are not ideal for relational databases because SQL engines are optimized for declarative set based operations.
Optimal Approach
The better solution uses SQL string functions directly.
We can:
- Use
CHAR_LENGTH(content)to compute the tweet length. - Count mentions by comparing the original string length with the string length after removing all
@characters. - Count hashtags similarly using
#.
For example:
CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, '@', ''))
This expression equals the number of @ symbols in the string because each removed @ decreases the string length by one.
The same logic applies to hashtags.
This approach is concise, efficient, and fully leverages built in SQL functionality.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N × L) | O(1) | Scans every character manually |
| Optimal | O(N × L) | O(1) | Uses optimized SQL string functions |
Here:
Nis the number of tweetsLis the average tweet length
Algorithm Walkthrough
Optimal SQL Algorithm
- Start by selecting rows from the
Tweetstable.
We need to evaluate every tweet independently because each tweet may or may not violate the rules. 2. Compute the total tweet length.
Use:
CHAR_LENGTH(content)
This gives the number of characters in the tweet. 3. Count mentions.
Use:
CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, '@', ''))
The REPLACE function removes all @ symbols. The difference in length tells us how many mentions exist.
4. Count hashtags.
Use:
CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, '#', ''))
This works exactly like the mention counting logic. 5. Filter invalid tweets.
A tweet is invalid if:
- length > 140
- mentions > 3
- hashtags > 3
We combine these conditions with OR.
6. Sort results.
Use:
ORDER BY tweet_id
to satisfy the output requirement.
Why it works
The algorithm works because every mention contains an @ symbol and every hashtag contains a # symbol. Removing those symbols and comparing string lengths gives the exact count of occurrences. Since the problem defines invalidity purely through these counts and total character length, checking these three conditions completely determines whether a tweet is invalid.
Python Solution
Although this is a database problem whose official answer is SQL, the following Python implementation demonstrates the same logic programmatically.
from typing import List
class Solution:
def invalidTweets(self, tweets: List[List[str]]) -> List[int]:
invalid_ids = []
for tweet_id, content in tweets:
content_length = len(content)
mention_count = content.count('@')
hashtag_count = content.count('#')
if (
content_length > 140
or mention_count > 3
or hashtag_count > 3
):
invalid_ids.append(tweet_id)
return sorted(invalid_ids)
The implementation iterates through every tweet and computes three values:
- The total content length
- The number of
@characters - The number of
#characters
Python's built in count() method efficiently counts occurrences of a character inside a string. After computing these values, we check whether any invalid condition is satisfied.
If the tweet violates at least one rule, its tweet_id is added to the result list. Finally, the list is sorted before returning.
Go Solution
package main
import (
"sort"
"strings"
)
type Solution struct{}
func (s Solution) invalidTweets(tweets [][]string) []int {
var invalidIDs []int
for _, tweet := range tweets {
tweetID := 0
content := tweet[1]
// Convert tweet ID from string to int manually
for _, ch := range tweet[0] {
tweetID = tweetID*10 + int(ch-'0')
}
contentLength := len(content)
mentionCount := strings.Count(content, "@")
hashtagCount := strings.Count(content, "#")
if contentLength > 140 ||
mentionCount > 3 ||
hashtagCount > 3 {
invalidIDs = append(invalidIDs, tweetID)
}
}
sort.Ints(invalidIDs)
return invalidIDs
}
The Go implementation follows the same overall logic as the Python version. The primary difference is that Go uses the strings.Count() function for counting character occurrences.
Go strings are byte based, so len(content) measures bytes rather than Unicode characters. For standard ASCII tweets, this matches the intended behavior. If Unicode handling were required, we would need to count runes instead.
Another difference is explicit slice handling. Go uses slices for dynamic arrays, while Python uses lists.
Worked Examples
Example 1
Input:
tweet_id = 1
content =
"Traveling, exploring, and living my best life
@JaneSmith @SaraJohnson @LisaTaylor @MikeBrown
#Foodie #Fitness #Learning"
We evaluate the tweet step by step.
| Step | Value |
|---|---|
| Content length | Less than 140 |
| Mention count | 4 |
| Hashtag count | 3 |
Since the mention count exceeds 3, the tweet is invalid.
Result:
1 is included
Example 2
Input:
tweet_id = 2
content =
"Just had the best dinner with friends!
#Foodie #Friends #Fun"
| Step | Value |
|---|---|
| Content length | Less than 140 |
| Mention count | 0 |
| Hashtag count | 3 |
All constraints are satisfied.
Result:
2 is NOT included
Example 3
Input:
tweet_id = 4
content =
"Working hard on my new project
#Work #Goals #Productivity #Fun"
| Step | Value |
|---|---|
| Content length | Less than 140 |
| Mention count | 0 |
| Hashtag count | 4 |
The hashtag count exceeds 3, so the tweet is invalid.
Result:
4 is included
Final output:
[1, 4]
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N × L) | Each tweet string is scanned several times |
| Space | O(1) | Only constant extra variables are used |
The algorithm processes each tweet independently. Operations such as count() or REPLACE() require scanning the tweet content, which takes linear time relative to the tweet length.
Since we do not allocate auxiliary data structures proportional to the input size, the extra space usage remains constant.
Test Cases
from typing import List
class Solution:
def invalidTweets(self, tweets: List[List[str]]) -> List[int]:
invalid_ids = []
for tweet_id, content in tweets:
if (
len(content) > 140
or content.count('@') > 3
or content.count('#') > 3
):
invalid_ids.append(tweet_id)
return sorted(invalid_ids)
solution = Solution()
# Example case from problem statement
assert solution.invalidTweets([
[1, "Hello @a @b @c @d #x #y #z"],
[2, "Dinner with friends #Fun #Food #Life"],
[4, "#a #b #c #d"]
]) == [1, 4]
# Exactly 140 characters, should be valid
assert solution.invalidTweets([
[1, "a" * 140]
]) == []
# More than 140 characters, invalid
assert solution.invalidTweets([
[1, "a" * 141]
]) == [1]
# Exactly 3 mentions, valid
assert solution.invalidTweets([
[1, "@a @b @c"]
]) == []
# More than 3 mentions, invalid
assert solution.invalidTweets([
[1, "@a @b @c @d"]
]) == [1]
# Exactly 3 hashtags, valid
assert solution.invalidTweets([
[1, "#a #b #c"]
]) == []
# More than 3 hashtags, invalid
assert solution.invalidTweets([
[1, "#a #b #c #d"]
]) == [1]
# Multiple violations simultaneously
assert solution.invalidTweets([
[1, "@a @b @c @d #a #b #c #d " + ("x" * 150)]
]) == [1]
# Empty tweet content
assert solution.invalidTweets([
[1, ""]
]) == []
# Multiple tweets mixed together
assert solution.invalidTweets([
[1, "@a @b @c @d"],
[2, "#a #b"],
[3, "a" * 141],
[4, "valid tweet"]
]) == [1, 3]
| Test | Why |
|---|---|
| Problem example | Verifies standard functionality |
| Exactly 140 characters | Confirms boundary condition |
| 141 characters | Confirms length overflow detection |
| Exactly 3 mentions | Ensures limit is inclusive |
| 4 mentions | Verifies mention overflow |
| Exactly 3 hashtags | Ensures hashtag boundary correctness |
| 4 hashtags | Verifies hashtag overflow |
| Multiple violations | Ensures OR condition works correctly |
| Empty tweet | Confirms empty input handling |
| Mixed dataset | Verifies multiple row processing |
Edge Cases
Tweets Exactly at the Limit
A very common source of bugs is using >= instead of >. The problem states that tweets are invalid only when they exceed the limit. Therefore:
- Length
140is valid 3mentions are valid3hashtags are valid
The implementation correctly uses strict greater than comparisons.
Tweets Violating Multiple Conditions
A tweet may simultaneously:
- Exceed
140characters - Contain too many mentions
- Contain too many hashtags
A buggy implementation might stop checking after the first condition or accidentally require all conditions to be true. Our implementation combines conditions using logical OR, ensuring that any single violation marks the tweet invalid.
Empty or Minimal Tweets
Tweets may contain empty strings or very short content with no mentions or hashtags. Some implementations incorrectly assume non empty content or fail when counting symbols in empty strings.
Our solution handles this naturally because:
len("")returns0count('@')returns0count('#')returns0
Thus empty tweets are treated as valid unless constraints are violated, which they are not.
Consecutive Symbols
A tweet like:
"@@@ ###"
still contains multiple @ and # symbols. Since the problem only asks for counting mentions and hashtags based on symbol occurrences, directly counting symbols works correctly even in unusual formatting situations.