LeetCode 3150 - Invalid Tweets II

The problem gives us a database table named Tweets with two columns: | Column | Description | | --- | --- | | tweetid | Unique identifier for each tweet | | content | The text content of the tweet | We need to identify all tweets that are considered invalid.

LeetCode Problem 3150

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem gives us a database table named Tweets with two columns:

Column Description
tweet_id Unique identifier for each tweet
content The text content of the tweet

We need to identify all tweets that are considered invalid. A tweet becomes invalid if at least one of the following conditions is true:

  1. The tweet content length is greater than 140 characters.
  2. The tweet contains more than 3 mentions.
  3. The tweet contains more than 3 hashtags.

A mention is represented by a word beginning with the @ symbol, such as @John.

A hashtag is represented by a word beginning with the # symbol, such as #Coding.

The output should contain only the tweet_id values of invalid tweets, sorted in ascending order.

The key part of this problem is accurately counting:

  • Total characters in the tweet
  • Number of mentions
  • Number of hashtags

Since this is a database problem, the intended solution is an SQL query rather than procedural code. We must use SQL string functions and pattern matching techniques to compute these counts efficiently.

The input size is not explicitly stated, but database interview problems generally expect set based operations rather than row by row procedural logic. This means we should avoid unnecessarily complicated parsing strategies.

There are several important edge cases:

  • Tweets exactly 140 characters long are valid.
  • Tweets with exactly 3 mentions are valid.
  • Tweets with exactly 3 hashtags are valid.
  • A tweet may violate multiple conditions simultaneously.
  • Tweets may contain no mentions or no hashtags.
  • Mentions and hashtags can appear anywhere in the string.
  • Multiple spaces or punctuation should not affect counting if we count symbol occurrences directly.

A naive implementation might incorrectly parse mentions using word splitting or regular expressions that miss edge cases. Fortunately, the problem only asks for counts of @ and # symbols associated with mentions and hashtags, so counting occurrences of these characters is sufficient.

Approaches

Brute Force Approach

A brute force solution would process every tweet character by character. For each tweet, we would:

  1. Count the total number of characters.
  2. Scan the string one character at a time.
  3. Increment a mention counter whenever we encounter @.
  4. Increment a hashtag counter whenever we encounter #.
  5. Mark the tweet invalid if any condition exceeds the allowed limit.

This approach is straightforward and guaranteed to work because it explicitly inspects every character in the tweet.

However, in SQL, manually iterating through characters is cumbersome and inefficient. Procedural loops are not ideal for relational databases because SQL engines are optimized for declarative set based operations.

Optimal Approach

The better solution uses SQL string functions directly.

We can:

  • Use CHAR_LENGTH(content) to compute the tweet length.
  • Count mentions by comparing the original string length with the string length after removing all @ characters.
  • Count hashtags similarly using #.

For example:

CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, '@', ''))

This expression equals the number of @ symbols in the string because each removed @ decreases the string length by one.

The same logic applies to hashtags.

This approach is concise, efficient, and fully leverages built in SQL functionality.

Approach Time Complexity Space Complexity Notes
Brute Force O(N × L) O(1) Scans every character manually
Optimal O(N × L) O(1) Uses optimized SQL string functions

Here:

  • N is the number of tweets
  • L is the average tweet length

Algorithm Walkthrough

Optimal SQL Algorithm

  1. Start by selecting rows from the Tweets table.

We need to evaluate every tweet independently because each tweet may or may not violate the rules. 2. Compute the total tweet length.

Use:

CHAR_LENGTH(content)

This gives the number of characters in the tweet. 3. Count mentions.

Use:

CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, '@', ''))

The REPLACE function removes all @ symbols. The difference in length tells us how many mentions exist. 4. Count hashtags.

Use:

CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, '#', ''))

This works exactly like the mention counting logic. 5. Filter invalid tweets.

A tweet is invalid if:

  • length > 140
  • mentions > 3
  • hashtags > 3

We combine these conditions with OR. 6. Sort results.

Use:

ORDER BY tweet_id

to satisfy the output requirement.

Why it works

The algorithm works because every mention contains an @ symbol and every hashtag contains a # symbol. Removing those symbols and comparing string lengths gives the exact count of occurrences. Since the problem defines invalidity purely through these counts and total character length, checking these three conditions completely determines whether a tweet is invalid.

Python Solution

Although this is a database problem whose official answer is SQL, the following Python implementation demonstrates the same logic programmatically.

from typing import List

class Solution:
    def invalidTweets(self, tweets: List[List[str]]) -> List[int]:
        invalid_ids = []

        for tweet_id, content in tweets:
            content_length = len(content)
            mention_count = content.count('@')
            hashtag_count = content.count('#')

            if (
                content_length > 140
                or mention_count > 3
                or hashtag_count > 3
            ):
                invalid_ids.append(tweet_id)

        return sorted(invalid_ids)

The implementation iterates through every tweet and computes three values:

  • The total content length
  • The number of @ characters
  • The number of # characters

Python's built in count() method efficiently counts occurrences of a character inside a string. After computing these values, we check whether any invalid condition is satisfied.

If the tweet violates at least one rule, its tweet_id is added to the result list. Finally, the list is sorted before returning.

Go Solution

package main

import (
	"sort"
	"strings"
)

type Solution struct{}

func (s Solution) invalidTweets(tweets [][]string) []int {
	var invalidIDs []int

	for _, tweet := range tweets {
		tweetID := 0
		content := tweet[1]

		// Convert tweet ID from string to int manually
		for _, ch := range tweet[0] {
			tweetID = tweetID*10 + int(ch-'0')
		}

		contentLength := len(content)
		mentionCount := strings.Count(content, "@")
		hashtagCount := strings.Count(content, "#")

		if contentLength > 140 ||
			mentionCount > 3 ||
			hashtagCount > 3 {
			invalidIDs = append(invalidIDs, tweetID)
		}
	}

	sort.Ints(invalidIDs)

	return invalidIDs
}

The Go implementation follows the same overall logic as the Python version. The primary difference is that Go uses the strings.Count() function for counting character occurrences.

Go strings are byte based, so len(content) measures bytes rather than Unicode characters. For standard ASCII tweets, this matches the intended behavior. If Unicode handling were required, we would need to count runes instead.

Another difference is explicit slice handling. Go uses slices for dynamic arrays, while Python uses lists.

Worked Examples

Example 1

Input:

tweet_id = 1
content =
"Traveling, exploring, and living my best life
 @JaneSmith @SaraJohnson @LisaTaylor @MikeBrown
 #Foodie #Fitness #Learning"

We evaluate the tweet step by step.

Step Value
Content length Less than 140
Mention count 4
Hashtag count 3

Since the mention count exceeds 3, the tweet is invalid.

Result:

1 is included

Example 2

Input:

tweet_id = 2
content =
"Just had the best dinner with friends!
 #Foodie #Friends #Fun"
Step Value
Content length Less than 140
Mention count 0
Hashtag count 3

All constraints are satisfied.

Result:

2 is NOT included

Example 3

Input:

tweet_id = 4
content =
"Working hard on my new project
 #Work #Goals #Productivity #Fun"
Step Value
Content length Less than 140
Mention count 0
Hashtag count 4

The hashtag count exceeds 3, so the tweet is invalid.

Result:

4 is included

Final output:

[1, 4]

Complexity Analysis

Measure Complexity Explanation
Time O(N × L) Each tweet string is scanned several times
Space O(1) Only constant extra variables are used

The algorithm processes each tweet independently. Operations such as count() or REPLACE() require scanning the tweet content, which takes linear time relative to the tweet length.

Since we do not allocate auxiliary data structures proportional to the input size, the extra space usage remains constant.

Test Cases

from typing import List

class Solution:
    def invalidTweets(self, tweets: List[List[str]]) -> List[int]:
        invalid_ids = []

        for tweet_id, content in tweets:
            if (
                len(content) > 140
                or content.count('@') > 3
                or content.count('#') > 3
            ):
                invalid_ids.append(tweet_id)

        return sorted(invalid_ids)

solution = Solution()

# Example case from problem statement
assert solution.invalidTweets([
    [1, "Hello @a @b @c @d #x #y #z"],
    [2, "Dinner with friends #Fun #Food #Life"],
    [4, "#a #b #c #d"]
]) == [1, 4]

# Exactly 140 characters, should be valid
assert solution.invalidTweets([
    [1, "a" * 140]
]) == []

# More than 140 characters, invalid
assert solution.invalidTweets([
    [1, "a" * 141]
]) == [1]

# Exactly 3 mentions, valid
assert solution.invalidTweets([
    [1, "@a @b @c"]
]) == []

# More than 3 mentions, invalid
assert solution.invalidTweets([
    [1, "@a @b @c @d"]
]) == [1]

# Exactly 3 hashtags, valid
assert solution.invalidTweets([
    [1, "#a #b #c"]
]) == []

# More than 3 hashtags, invalid
assert solution.invalidTweets([
    [1, "#a #b #c #d"]
]) == [1]

# Multiple violations simultaneously
assert solution.invalidTweets([
    [1, "@a @b @c @d #a #b #c #d " + ("x" * 150)]
]) == [1]

# Empty tweet content
assert solution.invalidTweets([
    [1, ""]
]) == []

# Multiple tweets mixed together
assert solution.invalidTweets([
    [1, "@a @b @c @d"],
    [2, "#a #b"],
    [3, "a" * 141],
    [4, "valid tweet"]
]) == [1, 3]
Test Why
Problem example Verifies standard functionality
Exactly 140 characters Confirms boundary condition
141 characters Confirms length overflow detection
Exactly 3 mentions Ensures limit is inclusive
4 mentions Verifies mention overflow
Exactly 3 hashtags Ensures hashtag boundary correctness
4 hashtags Verifies hashtag overflow
Multiple violations Ensures OR condition works correctly
Empty tweet Confirms empty input handling
Mixed dataset Verifies multiple row processing

Edge Cases

Tweets Exactly at the Limit

A very common source of bugs is using >= instead of >. The problem states that tweets are invalid only when they exceed the limit. Therefore:

  • Length 140 is valid
  • 3 mentions are valid
  • 3 hashtags are valid

The implementation correctly uses strict greater than comparisons.

Tweets Violating Multiple Conditions

A tweet may simultaneously:

  • Exceed 140 characters
  • Contain too many mentions
  • Contain too many hashtags

A buggy implementation might stop checking after the first condition or accidentally require all conditions to be true. Our implementation combines conditions using logical OR, ensuring that any single violation marks the tweet invalid.

Empty or Minimal Tweets

Tweets may contain empty strings or very short content with no mentions or hashtags. Some implementations incorrectly assume non empty content or fail when counting symbols in empty strings.

Our solution handles this naturally because:

  • len("") returns 0
  • count('@') returns 0
  • count('#') returns 0

Thus empty tweets are treated as valid unless constraints are violated, which they are not.

Consecutive Symbols

A tweet like:

"@@@ ###"

still contains multiple @ and # symbols. Since the problem only asks for counting mentions and hashtags based on symbol occurrences, directly counting symbols works correctly even in unusual formatting situations.