LeetCode 1517 - Find Users With Valid E-Mails

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem asks us to query a database table named Users and return only the rows corresponding to users whose e-mail addresses are considered valid according to a very specific format.

Each row in the Users table contains three columns:

user_id, a unique integer identifier for each user
name, the user's name
mail, the user's e-mail address

The task is to filter the table and return only those users whose mail field satisfies the validity rules.

A valid e-mail consists of two parts:

A prefix name, which appears before the @ symbol.
A domain, which must be exactly @leetcode.com.

The prefix name has strict requirements:

It must start with a letter, either uppercase or lowercase.
After the first character, it may contain:
letters (a-z, A-Z)
digits (0-9)
underscore (_)
period (.)
dash (-)

The domain requirement is equally strict:

The e-mail must end with exactly @leetcode.com.

This means addresses like [email protected] are invalid because the domain is wrong, and addresses like [email protected] are invalid because the prefix does not start with a letter.

The expected output is simply the filtered table containing all rows whose e-mails match these rules. The order does not matter.

Since this is a database problem, the most natural approach is to use pattern matching with regular expressions, because the validation rule is essentially a string pattern specification.

Important Edge Cases

There are several cases that can easily cause mistakes in a naive implementation.

An e-mail might be missing the domain entirely, such as jonathanisgreat. A simple substring check would fail to validate the full structure.

The prefix might contain invalid symbols such as #, %, or spaces. For example, quarz#[email protected] is invalid because # is not an allowed character.

The prefix might start with an invalid character. For example, [email protected] is invalid even though periods are allowed later in the prefix.

The domain must match exactly. An address like [email protected] or [email protected] must not be accepted.

The problem guarantees that user_id is unique, so duplicate row handling is unnecessary.

Approaches

Brute Force Approach

A brute-force solution would manually inspect every e-mail character by character.

For each row, we could:

Verify the string contains exactly one @.
Split the e-mail into prefix and domain.
Check that the domain equals leetcode.com.
Verify the prefix starts with a letter.
Iterate through every remaining character to ensure each belongs to the allowed set.

This approach is correct because it explicitly checks every rule one by one. However, it becomes verbose and error-prone. Implementing string parsing logic manually increases complexity and makes the query harder to read and maintain.

In SQL, this is especially awkward because character-by-character validation is not the strength of relational databases.

Optimal Approach

The key observation is that the entire validation rule can be expressed naturally as a regular expression.

We want a pattern that enforces:

Start of string
First character is a letter
Remaining prefix characters are letters, digits, _, ., or -
Exact domain @leetcode.com
End of string

The corresponding regex is:

^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$

Let us break it down:

^ means start of string.
[A-Za-z] ensures the first character is a letter.
[A-Za-z0-9_.-]* allows zero or more valid characters afterward.
@leetcode\.com enforces the exact domain.
$ means end of string.

We can then use SQL's regex matching capability with REGEXP.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(N × M)	O(1)	Manually validates every character of every e-mail
Optimal	O(N × M)	O(1)	Uses regex matching, cleaner and more maintainable

Here:

N is the number of rows in Users
M is the average e-mail length

Although both approaches have similar theoretical complexity, the regex solution is significantly simpler and more practical for SQL.

Algorithm Walkthrough

Optimal Algorithm

Read each row from the Users table.

We need to examine every user's e-mail because validity depends entirely on the contents of the mail column. 2. Apply a regular expression match to the mail column.

The regex pattern is:

^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$

This ensures the prefix starts with a letter, contains only valid characters afterward, and ends with the exact required domain. 3. Keep only matching rows.

If the e-mail satisfies the regex, the row is included in the result set. Otherwise, it is discarded. 4. Return the filtered table.

The problem allows results in any order, so no sorting is necessary.

Why it works

The correctness comes from the fact that the regex encodes exactly the rules described in the problem statement.

The first character constraint is enforced by [A-Za-z], the allowed character set is enforced by [A-Za-z0-9_.-]*, and the required domain is enforced by @leetcode\.com. The anchors ^ and $ guarantee the entire string must match the pattern, preventing partial matches or extra characters.

Therefore, a row is included if and only if its e-mail is valid.

Python Solution

Although this is a database problem and LeetCode expects SQL, below is a Python equivalent implementation to demonstrate the logic clearly.

import re
from typing import List

class Solution:
    def findValidEmails(self, users: List[dict]) -> List[dict]:
        pattern = re.compile(
            r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'
        )

        valid_users = []

        for user in users:
            email = user["mail"]

            if pattern.match(email):
                valid_users.append(user)

        return valid_users

The implementation begins by compiling the regular expression pattern. This improves efficiency because the regex engine does not need to re-parse the pattern for every e-mail.

We then iterate through every user record and extract the mail field. If the e-mail matches the regex pattern, the user is added to the result list.

Finally, the filtered list is returned.

For the actual LeetCode submission, the SQL query is:

SELECT *
FROM Users
WHERE mail REGEXP '^[A-Za-z][A-Za-z0-9_.-]*@leetcode\\.com$';

This query directly applies the regex condition to filter valid e-mails.

Go Solution

Below is a Go implementation that mirrors the Python logic.

package main

import "regexp"

func findValidEmails(users []map[string]string) []map[string]string {
	pattern := regexp.MustCompile(
		`^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$`,
	)

	validUsers := []map[string]string{}

	for _, user := range users {
		email := user["mail"]

		if pattern.MatchString(email) {
			validUsers = append(validUsers, user)
		}
	}

	return validUsers
}

The Go version uses the regexp package and compiles the regex once using regexp.MustCompile.

Unlike Python, Go uses slices instead of lists. We initialize an empty slice validUsers and append matching users as we iterate.

There are no integer overflow concerns here because we are only processing strings. Similarly, nil handling is straightforward because an empty result simply returns an empty slice.

For the actual LeetCode database submission, use SQL:

SELECT *
FROM Users
WHERE mail REGEXP '^[A-Za-z][A-Za-z0-9_.-]*@leetcode\\.com$';

Worked Examples

Example 1

Input table:

user_id	name	mail
1	Winston	[email protected]
2	Jonathan	jonathanisgreat
3	Annabelle	[email protected]
4	Sally	[email protected]
5	Marwan	quarz#[email protected]
6	David	[email protected]
7	Shapiro	[email protected]

We process each row one at a time.

User	Email	Regex Match	Result
Winston	`[email protected]`	Yes	Keep
Jonathan	`jonathanisgreat`	No, missing domain	Remove
Annabelle	`[email protected]`	Yes	Keep
Sally	`[email protected]`	Yes	Keep
Marwan	`quarz#[email protected]`	No, invalid `#`	Remove
David	`[email protected]`	No, wrong domain	Remove
Shapiro	`[email protected]`	No, starts with `.`	Remove

Final output:

user_id	name	mail
1	Winston	[email protected]
3	Annabelle	[email protected]
4	Sally	[email protected]

Complexity Analysis

Measure	Complexity	Explanation
Time	O(N × M)	Each e-mail is scanned once during regex matching
Space	O(1)	Only constant extra memory is used

The algorithm processes every row once, and regex matching examines characters in the e-mail string. If M is the average e-mail length and N is the number of users, the total runtime is O(N × M).

The additional memory usage is constant because we only store a regex pattern and a few temporary variables.

Test Cases

import re

def is_valid(email):
    pattern = re.compile(
        r'^[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com$'
    )
    return bool(pattern.match(email))

assert is_valid("[email protected]") is True  # valid example
assert is_valid("jonathanisgreat") is False  # missing domain
assert is_valid("[email protected]") is True  # dash allowed
assert is_valid("[email protected]") is True  # period allowed
assert is_valid("quarz#[email protected]") is False  # invalid character
assert is_valid("[email protected]") is False  # wrong domain
assert is_valid("[email protected]") is False  # starts with period
assert is_valid("[email protected]") is True  # single letter prefix
assert is_valid("[email protected]") is True  # underscore and digits
assert is_valid("[email protected]") is False  # starts with digit
assert is_valid("[email protected]") is False  # starts with dash
assert is_valid("[email protected]") is False  # incorrect domain
assert is_valid("abc@@leetcode.com") is False  # malformed email
assert is_valid("abc [email protected]") is False  # spaces invalid

Test Summary

Test	Why
`[email protected]`	Valid standard case
`jonathanisgreat`	Missing domain
`[email protected]`	Dash allowed in prefix
`[email protected]`	Period allowed in prefix
`quarz#[email protected]`	Invalid symbol rejection
`[email protected]`	Wrong domain rejection
`[email protected]`	Prefix must start with letter
`[email protected]`	Minimum valid prefix
`[email protected]`	Mixed valid characters
`[email protected]`	Cannot start with digit
`[email protected]`	Cannot start with dash
`[email protected]`	Exact domain required
`abc@@leetcode.com`	Invalid formatting
`abc [email protected]`	Spaces are not allowed

Edge Cases

One important edge case is a prefix that begins with an invalid character. Since periods, underscores, and dashes are allowed later in the prefix, it is easy to accidentally permit them at the start. For example, [email protected] should be invalid. Our regex handles this by explicitly requiring the first character to match [A-Za-z].

Another important edge case is an invalid domain. A naive implementation might only check whether the string contains "leetcode.com" somewhere. This would incorrectly allow values such as [email protected]. Our implementation enforces the exact suffix @leetcode\.com and anchors the pattern to the end of the string.

A third edge case involves illegal special characters inside the prefix. Characters such as #, %, or spaces are not allowed. Since the regex explicitly restricts valid characters to [A-Za-z0-9_.-], any unexpected symbol automatically causes the match to fail.

Finally, malformed e-mails with multiple @ symbols or missing domains are correctly rejected because the entire string must match the required format from beginning to end.