LeetCode 3436 - Find Valid Emails
The problem requires us to identify valid email addresses from a Users table in a database. Each row contains a userid and an email string. The definition of a valid email is very specific: it must contain exactly one @ symbol, end with .
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
The problem requires us to identify valid email addresses from a Users table in a database. Each row contains a user_id and an email string. The definition of a valid email is very specific: it must contain exactly one @ symbol, end with .com, have a local part (before the @) composed solely of alphanumeric characters or underscores, and have a domain name (between @ and .com) composed only of letters. The output is expected to be all valid emails, ordered by user_id in ascending order.
In other words, the task is to filter rows from the Users table based on a set of stringent pattern rules. The input size is moderate (typical database tables), and SQL pattern matching or regular expressions are well-suited here. Important edge cases include emails with multiple @ symbols, invalid domain characters, missing .com suffix, or special characters in the local part. The problem guarantees user_id is unique, so ordering by it is straightforward.
Approaches
A brute-force approach would be to manually parse each email, splitting on @, checking the counts, validating the local and domain parts using string functions. This is correct but cumbersome in SQL and may require multiple nested queries or user-defined functions.
The optimal approach leverages SQL pattern matching with regular expressions. SQL supports REGEXP (or REGEXP_LIKE in some dialects) to validate emails in a single step. The insight is that all rules can be encoded in a single regex: the local part ^[a-zA-Z0-9_]+, followed by @, followed by the domain [a-zA-Z]+, ending with .com$. This produces a concise, readable, and efficient query.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) | O(1) | Parse each email manually, multiple checks per string |
| Optimal | O(n) | O(1) | Single SQL query using regex; evaluated once per row |
Algorithm Walkthrough
- Select from Users: Start with the
Userstable because it contains all the data. - Filter with Regex: Apply a regular expression filter to the
emailcolumn to ensure it matches all required criteria:
- Local part: one or more alphanumeric characters or underscores
- Exactly one
@symbol - Domain: one or more letters
- Ends with
.com
- Order the results: Since
user_idis unique, ordering ascending guarantees deterministic output.
Why it works: The regex ensures each condition is met simultaneously. SQL evaluates the regex for each row, so only emails that satisfy all constraints remain. Ordering by user_id provides consistent results.
Python Solution
For Python, the problem is solved via SQL query execution using a Python database connector (e.g., SQLite, MySQL). Here is a LeetCode-style submission that outputs the SQL solution:
# Python solution for LeetCode 3436
class Solution:
def findValidEmails(self) -> str:
return """
SELECT user_id, email
FROM Users
WHERE email REGEXP '^[a-zA-Z0-9_]+@[a-zA-Z]+\\.com$'
ORDER BY user_id ASC;
"""
Explanation: The REGEXP pattern checks all four conditions. ^[a-zA-Z0-9_]+ ensures the local part is alphanumeric with underscores, @[a-zA-Z]+ validates the domain, and \\.com$ ensures the email ends with .com. The ORDER BY user_id ASC guarantees correct output order.
Go Solution
In Go, we would prepare a similar SQL query string:
package main
func findValidEmails() string {
return `
SELECT user_id, email
FROM Users
WHERE email REGEXP '^[a-zA-Z0-9_]+@[a-zA-Z]+\\.com$'
ORDER BY user_id ASC;
`
}
Explanation: Function returns the SQL query as a string. In Go, the main difference is the string literal handling; otherwise, the regex logic is identical to the Python solution.
Worked Examples
Input Table:
This is a SQL database problem. We are given a table Users containing a unique user_id and an email address for each user. The task is to return only those rows whose email addresses satisfy a precise validity specification.
A valid email address must satisfy all of the following conditions simultaneously:
- It contains exactly one
@character. - It ends with the suffix
.com. - The local part, the substring before
@, contains only alphanumeric characters (A-Z,a-z,0-9) and underscores (_). - The domain part, the substring between
@and.com, contains only letters (A-Z,a-z).
The output must contain the original user_id and email columns for every valid email, sorted by user_id in ascending order.
Since this is a database problem, the goal is not to manually parse strings character by character. Instead, the natural solution is to use SQL pattern matching. The key observation is that the validity rules correspond exactly to a regular expression.
Several edge cases must be handled correctly:
- Emails containing no
@symbol must be rejected. - Emails containing multiple
@symbols must be rejected. - Domains ending with extensions other than
.commust be rejected. - Digits or underscores in the domain name must be rejected.
- Special characters such as
-,.,+, or#in the local part must be rejected. - The entire email string must match the required pattern, not merely contain a valid substring.
Approaches
Brute Force Approach
A brute-force solution would parse every email string manually. For each email, we would count the number of @ symbols, verify that the suffix is .com, split the string into components, and then check every character in both the local and domain parts against the allowed character sets.
This approach is correct because it directly implements the problem's definition. Every validity rule is checked explicitly.
However, database systems already provide efficient regular expression engines that are designed precisely for this type of validation. Writing manual parsing logic is unnecessarily verbose and less maintainable.
Optimal Approach
The key observation is that the validity requirements define a regular language. Therefore, a single regular expression can capture all rules simultaneously.
The required pattern is:
^[A-Za-z0-9_]+@[A-Za-z]+\.com$
This expression enforces:
^beginning of string[A-Za-z0-9_]+one or more alphanumeric characters or underscores before@@exactly one separator[A-Za-z]+one or more letters in the domain name\.comliteral.comsuffix$end of string
Any email matching this pattern is valid, and any email violating a rule is rejected.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n·m) | O(1) | Manually validate each character of every email |
| Optimal | O(n·m) | O(1) | Single regular expression match per email |
Here, n is the number of rows and m is the maximum email length.
Algorithm Walkthrough
- Examine each row in the
Userstable. - Apply a regular expression that exactly encodes the validity requirements:
^[A-Za-z0-9_]+@[A-Za-z]+\.com$
- The beginning anchor
^ensures matching starts at the first character of the email. - The pattern
[A-Za-z0-9_]+validates the local part and guarantees that only letters, digits, and underscores appear before the@. - The literal
@ensures exactly one separator exists between the local and domain parts. - The pattern
[A-Za-z]+validates the domain name and guarantees that it consists only of letters. - The pattern
\.comrequires the email to end with the literal suffix.com. - The ending anchor
$ensures no extra characters appear after.com. - Return all matching rows ordered by
user_idascending.
Why it works
The regular expression is constructed directly from the formal specification. Every valid email satisfies each component of the pattern, so it is selected. Conversely, any email violating at least one rule necessarily fails one component of the pattern and is rejected. Therefore the query returns exactly the set of valid email addresses.
Python Solution
For this database problem, LeetCode expects an SQL query rather than a Python implementation.
# This problem is a SQL Database problem.
# No Python solution is required on LeetCode.
The problem belongs to the Database category, so submissions are written in SQL. The actual solution consists of filtering rows using a regular expression and returning the matching rows in ascending user_id order.
Go Solution
Similarly, LeetCode does not expect a Go submission for Database problems.
// This problem is a SQL Database problem.
// No Go solution is required on LeetCode.
Database problems are executed directly against a SQL engine, so language-specific implementations such as Go or Python are not used.
SQL Solution
SELECT user_id, email
FROM Users
WHERE email REGEXP '^[A-Za-z0-9_]+@[A-Za-z]+\\.com$'
ORDER BY user_id;
The query scans the Users table and applies a regular expression filter. Only rows whose email values match the entire pattern are retained. The final ORDER BY user_id clause guarantees ascending order as required.
Worked Examples
Consider the input:
| user_id | |
|---|---|
| 1 | [email protected] |
| 2 | bob_at_example.com |
| 3 | [email protected] |
| 4 | [email protected] |
| 5 | eve@invalid |
Step-by-step Evaluation:
| user_id | Regex Match | Valid? | |
|---|---|---|---|
| 1 | [email protected] | Yes | Yes |
| 2 | bob_at_example.com | No | No |
| 3 | [email protected] | No | No |
| 4 | [email protected] | Yes | Yes |
| 5 | eve@invalid | No | No |
Final output is only user_ids 1 and 4, ordered ascending. The algorithm evaluates each email independently.
| user_id | Matches Regex? | Reason | |
|---|---|---|---|
| 1 | [email protected] | Yes | Valid local part, valid domain, ends with .com |
| 2 | bob_at_example.com | No | Missing @ symbol |
| 3 | [email protected] | No | Does not end with .com |
| 4 | [email protected] | Yes | Satisfies all requirements |
| 5 | eve@invalid | No | Missing .com suffix |
The resulting rows are:
| user_id | |
|---|---|
| 1 | [email protected] |
| 4 | [email protected] |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) | Each row is evaluated once against the regex |
| Space | O(1) | No extra space beyond the output rows |
The time complexity is linear in the number of rows because the regex is evaluated for each row individually. Space complexity is constant with respect to input because filtering occurs in-place in the SQL engine.
Test Cases
# Basic valid/invalid checks
assert Solution().findValidEmails() == """
SELECT user_id, email
FROM Users
WHERE email REGEXP '^[a-zA-Z0-9_]+@[a-zA-Z]+\\.com$'
ORDER BY user_id ASC;
""" # Checks the exact SQL query output
# Edge cases would be handled in the database engine with the same query:
# multiple '@', invalid chars in local or domain parts, missing '.com'
| Time | O(n·m) | Each email may be scanned once by the regex engine |
| Space | O(1) | Only constant extra space is required |
The query processes each of the `n` rows independently. Matching a regular expression against an email of length at most `m` requires linear time in the length of that string. No auxiliary data structures proportional to the input size are created.
## Test Cases
```python
import re
pattern = r'^[A-Za-z0-9_]+@[A-Za-z]+\.com$'
assert bool(re.match(pattern, "[email protected]")) is True # valid example
assert bool(re.match(pattern, "[email protected]")) is True # another valid example
assert bool(re.match(pattern, "bob_at_example.com")) is False # missing @
assert bool(re.match(pattern, "[email protected]")) is False # wrong extension
assert bool(re.match(pattern, "eve@invalid")) is False # no .com suffix
assert bool(re.match(pattern, "[email protected]")) is True # digits in local part
assert bool(re.match(pattern, "[email protected]")) is True # underscore in local part
assert bool(re.match(pattern, "[email protected]")) is False # hyphen not allowed
assert bool(re.match(pattern, "[email protected]")) is False # plus not allowed
assert bool(re.match(pattern, "[email protected]")) is False # digits in domain
assert bool(re.match(pattern, "user@test_name.com")) is False # underscore in domain
assert bool(re.match(pattern, "user@@test.com")) is False # multiple @ symbols
assert bool(re.match(pattern, "@test.com")) is False # empty local part
assert bool(re.match(pattern, "[email protected]")) is False # empty domain
assert bool(re.match(pattern, "[email protected]")) is True # uppercase letters
| Test | Why |
|---|---|
| Standard table with mixed valid/invalid | Validates all regex rules |
Emails missing .com |
Tests .com enforcement |
Emails with multiple @ |
Ensures only single @ is accepted |
| Emails with invalid characters | Validates character restrictions |
Edge Cases
Case 1: Multiple @ symbols
Emails like john@@example.com should be rejected. The regex ensures exactly one @ by only allowing one occurrence in the pattern.
Case 2: Invalid domain characters
Emails such as [email protected] include numbers in the domain. The [a-zA-Z]+ segment disallows digits, correctly rejecting these emails.
Case 3: Missing .com suffix
Emails like [email protected] do not end with .com. The regex enforces the .com$ ending, filtering these out automatically.
This implementation handles all such edge cases concisely, leveraging the power of SQL regex filtering.
| [email protected] | Basic valid email |
| [email protected] | Another valid email |
| bob_at_example.com | Missing @ |
| [email protected] | Invalid extension |
| eve@invalid | Missing .com |
| [email protected] | Digits allowed before @ |
| [email protected] | Underscore allowed before @ |
| [email protected] | Hyphen not allowed before @ |
| [email protected] | Plus sign not allowed |
| [email protected] | Digits not allowed in domain |
| user@test_name.com | Underscore not allowed in domain |
| user@@test.com | Multiple @ symbols |
| @test.com | Empty local part |
| [email protected] | Empty domain |
| [email protected] | Uppercase letters are valid |
Edge Cases
Multiple @ Symbols
An email such as user@@domain.com might accidentally pass simplistic validation that merely checks for the presence of @. The regular expression prevents this because exactly one @ must separate the two required components. The extra @ causes the match to fail.
Invalid Characters in the Domain
An address like [email protected] or user@domain_name.com contains characters that are not letters in the domain portion. The pattern [A-Za-z]+ allows only alphabetic characters, so such addresses are correctly rejected.
Invalid Characters in the Local Part
Many real-world email systems permit characters such as +, -, or . in the local part. However, this problem explicitly restricts the local part to alphanumeric characters and underscores. Therefore emails such as [email protected] and [email protected] must be rejected. The pattern [A-Za-z0-9_]+ enforces exactly that requirement.
Missing .com Suffix
Addresses such as [email protected], [email protected], or user@domain violate the requirement that every valid email end with .com. The literal suffix \.com$ ensures that only emails with the required ending are accepted.
Empty Local Part or Domain
Inputs such as @domain.com or [email protected] contain empty components. Because both [A-Za-z0-9_]+ and [A-Za-z]+ require at least one character, these malformed addresses are rejected automatically.