LeetCode 1517 - Find Users With Valid E-Mails
This problem asks us to query a database table named Users and return only the rows corresponding to users whose e-mail
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
This problem asks us to query a database table named Users and return only the rows corresponding to users whose e-mail addresses are considered valid according to a very specific format.
Each row in the Users table contains three columns:
user_id, a unique integer identifier for each username, the user's namemail, the user's e-mail address
The task is to filter the table and return only those users whose mail field satisfies the validity rules.
A valid e-mail consists of two parts:
- A prefix name, which appears before the
@symbol. - A domain, which must be exactly
@leetcode.com.
The prefix name has strict requirements:
-
It must start with a letter, either uppercase or lowercase.
-
After the first character, it may contain:
-
letters (
a-z,A-Z) -
digits (
0-9) -
underscore (
_) -
period (
.) -
dash (
-)
The domain requirement is equally strict:
- The e-mail must end with exactly
@leetcode.com.
This means addresses like [email protected] are invalid because the domain is wrong, and addresses like [email protected] are invalid because the prefix does not start with a letter.
The expected output is simply the filtered table containing all rows whose e-mails match these rules. The order does not matter.
Since this is a database problem, the most natural approach is to use pattern matching with regular expressions, because the validation rule is essentially a string pattern specification.
Important Edge Cases
There are several cases that can easily cause mistakes in a naive implementation.
An e-mail might be missing the domain entirely, such as jonathanisgreat. A simple substring check would fail to validate the full structure.
The prefix might contain invalid symbols such as #, %, or spaces. For example, quarz#[email protected] is invalid because # is not an allowed character.
The prefix might start with an invalid character. For example, [email protected] is invalid even though periods are allowed later in the prefix.
The domain must match exactly. An address like [email protected] or [email protected] must not be accepted.
The problem guarantees that user_id is unique, so duplicate row handling is unnecessary.
Approaches
Brute Force Approach
A brute-force solution would manually inspect every e-mail character by character.
For each row, we could:
- Verify the string contains exactly one
@. - Split the e-mail into prefix and domain.
- Check that the domain equals
leetcode.com. - Verify the prefix starts with a letter.
- Iterate through every remaining character to ensure each belongs to the allowed set.
This approach is correct because it explicitly checks every rule one by one. However, it becomes verbose and error-prone. Implementing string parsing logic manually increases complexity and makes the query harder to read and maintain.
In SQL, this is especially awkward because character-by-character validation is not the strength of relational databases.
Optimal Approach
The key observation is that the entire validation rule can be expressed naturally as a regular expression.
We want a pattern that enforces:
- Start of string
- First character is a letter
- Remaining prefix characters are letters, digits,
_,., or- - Exact domain
@leetcode.com - End of string
The corresponding regex is:
^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$
Let us break it down:
^means start of string.[A-Za-z]ensures the first character is a letter.[A-Za-z0-9_.-]*allows zero or more valid characters afterward.@leetcode\.comenforces the exact domain.$means end of string.
We can then use SQL's regex matching capability with REGEXP.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N × M) | O(1) | Manually validates every character of every e-mail |
| Optimal | O(N × M) | O(1) | Uses regex matching, cleaner and more maintainable |
Here:
Nis the number of rows inUsersMis the average e-mail length
Although both approaches have similar theoretical complexity, the regex solution is significantly simpler and more practical for SQL.
Algorithm Walkthrough
Optimal Algorithm
- Read each row from the
Userstable.
We need to examine every user's e-mail because validity depends entirely on the contents of the mail column.
2. Apply a regular expression match to the mail column.
The regex pattern is:
^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$
This ensures the prefix starts with a letter, contains only valid characters afterward, and ends with the exact required domain. 3. Keep only matching rows.
If the e-mail satisfies the regex, the row is included in the result set. Otherwise, it is discarded. 4. Return the filtered table.
The problem allows results in any order, so no sorting is necessary.
Why it works
The correctness comes from the fact that the regex encodes exactly the rules described in the problem statement.
The first character constraint is enforced by [A-Za-z], the allowed character set is enforced by [A-Za-z0-9_.-]*, and the required domain is enforced by @leetcode\.com. The anchors ^ and $ guarantee the entire string must match the pattern, preventing partial matches or extra characters.
Therefore, a row is included if and only if its e-mail is valid.
Python Solution
Although this is a database problem and LeetCode expects SQL, below is a Python equivalent implementation to demonstrate the logic clearly.
import re
from typing import List
class Solution:
def findValidEmails(self, users: List[dict]) -> List[dict]:
pattern = re.compile(
r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'
)
valid_users = []
for user in users:
email = user["mail"]
if pattern.match(email):
valid_users.append(user)
return valid_users
The implementation begins by compiling the regular expression pattern. This improves efficiency because the regex engine does not need to re-parse the pattern for every e-mail.
We then iterate through every user record and extract the mail field. If the e-mail matches the regex pattern, the user is added to the result list.
Finally, the filtered list is returned.
For the actual LeetCode submission, the SQL query is:
SELECT *
FROM Users
WHERE mail REGEXP '^[A-Za-z][A-Za-z0-9_.-]*@leetcode\\.com$';
This query directly applies the regex condition to filter valid e-mails.
Go Solution
Below is a Go implementation that mirrors the Python logic.
package main
import "regexp"
func findValidEmails(users []map[string]string) []map[string]string {
pattern := regexp.MustCompile(
`^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$`,
)
validUsers := []map[string]string{}
for _, user := range users {
email := user["mail"]
if pattern.MatchString(email) {
validUsers = append(validUsers, user)
}
}
return validUsers
}
The Go version uses the regexp package and compiles the regex once using regexp.MustCompile.
Unlike Python, Go uses slices instead of lists. We initialize an empty slice validUsers and append matching users as we iterate.
There are no integer overflow concerns here because we are only processing strings. Similarly, nil handling is straightforward because an empty result simply returns an empty slice.
For the actual LeetCode database submission, use SQL:
SELECT *
FROM Users
WHERE mail REGEXP '^[A-Za-z][A-Za-z0-9_.-]*@leetcode\\.com$';
Worked Examples
Example 1
Input table:
| user_id | name | |
|---|---|---|
| 1 | Winston | [email protected] |
| 2 | Jonathan | jonathanisgreat |
| 3 | Annabelle | [email protected] |
| 4 | Sally | [email protected] |
| 5 | Marwan | quarz#[email protected] |
| 6 | David | [email protected] |
| 7 | Shapiro | [email protected] |
We process each row one at a time.
| User | Regex Match | Result | |
|---|---|---|---|
| Winston | [email protected] |
Yes | Keep |
| Jonathan | jonathanisgreat |
No, missing domain | Remove |
| Annabelle | [email protected] |
Yes | Keep |
| Sally | [email protected] |
Yes | Keep |
| Marwan | quarz#[email protected] |
No, invalid # |
Remove |
| David | [email protected] |
No, wrong domain | Remove |
| Shapiro | [email protected] |
No, starts with . |
Remove |
Final output:
| user_id | name | |
|---|---|---|
| 1 | Winston | [email protected] |
| 3 | Annabelle | [email protected] |
| 4 | Sally | [email protected] |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N × M) | Each e-mail is scanned once during regex matching |
| Space | O(1) | Only constant extra memory is used |
The algorithm processes every row once, and regex matching examines characters in the e-mail string. If M is the average e-mail length and N is the number of users, the total runtime is O(N × M).
The additional memory usage is constant because we only store a regex pattern and a few temporary variables.
Test Cases
import re
def is_valid(email):
pattern = re.compile(
r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'
)
return bool(pattern.match(email))
assert is_valid("[email protected]") is True # valid example
assert is_valid("jonathanisgreat") is False # missing domain
assert is_valid("[email protected]") is True # dash allowed
assert is_valid("[email protected]") is True # period allowed
assert is_valid("quarz#[email protected]") is False # invalid character
assert is_valid("[email protected]") is False # wrong domain
assert is_valid("[email protected]") is False # starts with period
assert is_valid("[email protected]") is True # single letter prefix
assert is_valid("[email protected]") is True # underscore and digits
assert is_valid("[email protected]") is False # starts with digit
assert is_valid("[email protected]") is False # starts with dash
assert is_valid("[email protected]") is False # incorrect domain
assert is_valid("abc@@leetcode.com") is False # malformed email
assert is_valid("abc [email protected]") is False # spaces invalid
Test Summary
| Test | Why |
|---|---|
[email protected] |
Valid standard case |
jonathanisgreat |
Missing domain |
[email protected] |
Dash allowed in prefix |
[email protected] |
Period allowed in prefix |
quarz#[email protected] |
Invalid symbol rejection |
[email protected] |
Wrong domain rejection |
[email protected] |
Prefix must start with letter |
[email protected] |
Minimum valid prefix |
[email protected] |
Mixed valid characters |
[email protected] |
Cannot start with digit |
[email protected] |
Cannot start with dash |
[email protected] |
Exact domain required |
abc@@leetcode.com |
Invalid formatting |
abc [email protected] |
Spaces are not allowed |
Edge Cases
One important edge case is a prefix that begins with an invalid character. Since periods, underscores, and dashes are allowed later in the prefix, it is easy to accidentally permit them at the start. For example, [email protected] should be invalid. Our regex handles this by explicitly requiring the first character to match [A-Za-z].
Another important edge case is an invalid domain. A naive implementation might only check whether the string contains "leetcode.com" somewhere. This would incorrectly allow values such as [email protected]. Our implementation enforces the exact suffix @leetcode\.com and anchors the pattern to the end of the string.
A third edge case involves illegal special characters inside the prefix. Characters such as #, %, or spaces are not allowed. Since the regex explicitly restricts valid characters to [A-Za-z0-9_.-], any unexpected symbol automatically causes the match to fail.
Finally, malformed e-mails with multiple @ symbols or missing domains are correctly rejected because the entire string must match the required format from beginning to end.