LeetCode 3436 - Find Valid Emails

The problem requires us to identify valid email addresses from a Users table in a database. Each row contains a userid and an email string. The definition of a valid email is very specific: it must contain exactly one @ symbol, end with .

LeetCode Problem 3436

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem requires us to identify valid email addresses from a Users table in a database. Each row contains a user_id and an email string. The definition of a valid email is very specific: it must contain exactly one @ symbol, end with .com, have a local part (before the @) composed solely of alphanumeric characters or underscores, and have a domain name (between @ and .com) composed only of letters. The output is expected to be all valid emails, ordered by user_id in ascending order.

In other words, the task is to filter rows from the Users table based on a set of stringent pattern rules. The input size is moderate (typical database tables), and SQL pattern matching or regular expressions are well-suited here. Important edge cases include emails with multiple @ symbols, invalid domain characters, missing .com suffix, or special characters in the local part. The problem guarantees user_id is unique, so ordering by it is straightforward.

Approaches

A brute-force approach would be to manually parse each email, splitting on @, checking the counts, validating the local and domain parts using string functions. This is correct but cumbersome in SQL and may require multiple nested queries or user-defined functions.

The optimal approach leverages SQL pattern matching with regular expressions. SQL supports REGEXP (or REGEXP_LIKE in some dialects) to validate emails in a single step. The insight is that all rules can be encoded in a single regex: the local part ^[a-zA-Z0-9_]+, followed by @, followed by the domain [a-zA-Z]+, ending with .com$. This produces a concise, readable, and efficient query.

Approach Time Complexity Space Complexity Notes
Brute Force O(n) O(1) Parse each email manually, multiple checks per string
Optimal O(n) O(1) Single SQL query using regex; evaluated once per row

Algorithm Walkthrough

  1. Select from Users: Start with the Users table because it contains all the data.
  2. Filter with Regex: Apply a regular expression filter to the email column to ensure it matches all required criteria:
  • Local part: one or more alphanumeric characters or underscores
  • Exactly one @ symbol
  • Domain: one or more letters
  • Ends with .com
  1. Order the results: Since user_id is unique, ordering ascending guarantees deterministic output.

Why it works: The regex ensures each condition is met simultaneously. SQL evaluates the regex for each row, so only emails that satisfy all constraints remain. Ordering by user_id provides consistent results.

Python Solution

For Python, the problem is solved via SQL query execution using a Python database connector (e.g., SQLite, MySQL). Here is a LeetCode-style submission that outputs the SQL solution:

# Python solution for LeetCode 3436
class Solution:
    def findValidEmails(self) -> str:
        return """
        SELECT user_id, email
        FROM Users
        WHERE email REGEXP '^[a-zA-Z0-9_]+@[a-zA-Z]+\\.com$'
        ORDER BY user_id ASC;
        """

Explanation: The REGEXP pattern checks all four conditions. ^[a-zA-Z0-9_]+ ensures the local part is alphanumeric with underscores, @[a-zA-Z]+ validates the domain, and \\.com$ ensures the email ends with .com. The ORDER BY user_id ASC guarantees correct output order.

Go Solution

In Go, we would prepare a similar SQL query string:

package main

func findValidEmails() string {
    return `
        SELECT user_id, email
        FROM Users
        WHERE email REGEXP '^[a-zA-Z0-9_]+@[a-zA-Z]+\\.com$'
        ORDER BY user_id ASC;
    `
}

Explanation: Function returns the SQL query as a string. In Go, the main difference is the string literal handling; otherwise, the regex logic is identical to the Python solution.

Worked Examples

Input Table: This is a SQL database problem. We are given a table Users containing a unique user_id and an email address for each user. The task is to return only those rows whose email addresses satisfy a precise validity specification.

A valid email address must satisfy all of the following conditions simultaneously:

  1. It contains exactly one @ character.
  2. It ends with the suffix .com.
  3. The local part, the substring before @, contains only alphanumeric characters (A-Z, a-z, 0-9) and underscores (_).
  4. The domain part, the substring between @ and .com, contains only letters (A-Z, a-z).

The output must contain the original user_id and email columns for every valid email, sorted by user_id in ascending order.

Since this is a database problem, the goal is not to manually parse strings character by character. Instead, the natural solution is to use SQL pattern matching. The key observation is that the validity rules correspond exactly to a regular expression.

Several edge cases must be handled correctly:

  • Emails containing no @ symbol must be rejected.
  • Emails containing multiple @ symbols must be rejected.
  • Domains ending with extensions other than .com must be rejected.
  • Digits or underscores in the domain name must be rejected.
  • Special characters such as -, ., +, or # in the local part must be rejected.
  • The entire email string must match the required pattern, not merely contain a valid substring.

Approaches

Brute Force Approach

A brute-force solution would parse every email string manually. For each email, we would count the number of @ symbols, verify that the suffix is .com, split the string into components, and then check every character in both the local and domain parts against the allowed character sets.

This approach is correct because it directly implements the problem's definition. Every validity rule is checked explicitly.

However, database systems already provide efficient regular expression engines that are designed precisely for this type of validation. Writing manual parsing logic is unnecessarily verbose and less maintainable.

Optimal Approach

The key observation is that the validity requirements define a regular language. Therefore, a single regular expression can capture all rules simultaneously.

The required pattern is:

^[A-Za-z0-9_]+@[A-Za-z]+\.com$

This expression enforces:

  • ^ beginning of string
  • [A-Za-z0-9_]+ one or more alphanumeric characters or underscores before @
  • @ exactly one separator
  • [A-Za-z]+ one or more letters in the domain name
  • \.com literal .com suffix
  • $ end of string

Any email matching this pattern is valid, and any email violating a rule is rejected.

Approach Time Complexity Space Complexity Notes
Brute Force O(n·m) O(1) Manually validate each character of every email
Optimal O(n·m) O(1) Single regular expression match per email

Here, n is the number of rows and m is the maximum email length.

Algorithm Walkthrough

  1. Examine each row in the Users table.
  2. Apply a regular expression that exactly encodes the validity requirements:
^[A-Za-z0-9_]+@[A-Za-z]+\.com$
  1. The beginning anchor ^ ensures matching starts at the first character of the email.
  2. The pattern [A-Za-z0-9_]+ validates the local part and guarantees that only letters, digits, and underscores appear before the @.
  3. The literal @ ensures exactly one separator exists between the local and domain parts.
  4. The pattern [A-Za-z]+ validates the domain name and guarantees that it consists only of letters.
  5. The pattern \.com requires the email to end with the literal suffix .com.
  6. The ending anchor $ ensures no extra characters appear after .com.
  7. Return all matching rows ordered by user_id ascending.

Why it works

The regular expression is constructed directly from the formal specification. Every valid email satisfies each component of the pattern, so it is selected. Conversely, any email violating at least one rule necessarily fails one component of the pattern and is rejected. Therefore the query returns exactly the set of valid email addresses.

Python Solution

For this database problem, LeetCode expects an SQL query rather than a Python implementation.

# This problem is a SQL Database problem.
# No Python solution is required on LeetCode.

The problem belongs to the Database category, so submissions are written in SQL. The actual solution consists of filtering rows using a regular expression and returning the matching rows in ascending user_id order.

Go Solution

Similarly, LeetCode does not expect a Go submission for Database problems.

// This problem is a SQL Database problem.
// No Go solution is required on LeetCode.

Database problems are executed directly against a SQL engine, so language-specific implementations such as Go or Python are not used.

SQL Solution

SELECT user_id, email
FROM Users
WHERE email REGEXP '^[A-Za-z0-9_]+@[A-Za-z]+\\.com$'
ORDER BY user_id;

The query scans the Users table and applies a regular expression filter. Only rows whose email values match the entire pattern are retained. The final ORDER BY user_id clause guarantees ascending order as required.

Worked Examples

Consider the input:

user_id email
1 [email protected]
2 bob_at_example.com
3 [email protected]
4 [email protected]
5 eve@invalid

Step-by-step Evaluation:

user_id email Regex Match Valid?
1 [email protected] Yes Yes
2 bob_at_example.com No No
3 [email protected] No No
4 [email protected] Yes Yes
5 eve@invalid No No

Final output is only user_ids 1 and 4, ordered ascending. The algorithm evaluates each email independently.

user_id email Matches Regex? Reason
1 [email protected] Yes Valid local part, valid domain, ends with .com
2 bob_at_example.com No Missing @ symbol
3 [email protected] No Does not end with .com
4 [email protected] Yes Satisfies all requirements
5 eve@invalid No Missing .com suffix

The resulting rows are:

user_id email
1 [email protected]
4 [email protected]

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each row is evaluated once against the regex
Space O(1) No extra space beyond the output rows

The time complexity is linear in the number of rows because the regex is evaluated for each row individually. Space complexity is constant with respect to input because filtering occurs in-place in the SQL engine.

Test Cases

# Basic valid/invalid checks
assert Solution().findValidEmails() == """
SELECT user_id, email
FROM Users
WHERE email REGEXP '^[a-zA-Z0-9_]+@[a-zA-Z]+\\.com$'
ORDER BY user_id ASC;
"""  # Checks the exact SQL query output

# Edge cases would be handled in the database engine with the same query:
# multiple '@', invalid chars in local or domain parts, missing '.com'
| Time | O(n·m) | Each email may be scanned once by the regex engine |
| Space | O(1) | Only constant extra space is required |

The query processes each of the `n` rows independently. Matching a regular expression against an email of length at most `m` requires linear time in the length of that string. No auxiliary data structures proportional to the input size are created.

## Test Cases

```python
import re

pattern = r'^[A-Za-z0-9_]+@[A-Za-z]+\.com$'

assert bool(re.match(pattern, "[email protected]")) is True      # valid example
assert bool(re.match(pattern, "[email protected]")) is True       # another valid example

assert bool(re.match(pattern, "bob_at_example.com")) is False    # missing @
assert bool(re.match(pattern, "[email protected]")) is False   # wrong extension
assert bool(re.match(pattern, "eve@invalid")) is False           # no .com suffix

assert bool(re.match(pattern, "[email protected]")) is True       # digits in local part
assert bool(re.match(pattern, "[email protected]")) is True     # underscore in local part

assert bool(re.match(pattern, "[email protected]")) is False    # hyphen not allowed
assert bool(re.match(pattern, "[email protected]")) is False    # plus not allowed

assert bool(re.match(pattern, "[email protected]")) is False      # digits in domain
assert bool(re.match(pattern, "user@test_name.com")) is False    # underscore in domain

assert bool(re.match(pattern, "user@@test.com")) is False        # multiple @ symbols
assert bool(re.match(pattern, "@test.com")) is False             # empty local part
assert bool(re.match(pattern, "[email protected]")) is False             # empty domain

assert bool(re.match(pattern, "[email protected]")) is True     # uppercase letters
Test Why
Standard table with mixed valid/invalid Validates all regex rules
Emails missing .com Tests .com enforcement
Emails with multiple @ Ensures only single @ is accepted
Emails with invalid characters Validates character restrictions

Edge Cases

Case 1: Multiple @ symbols

Emails like john@@example.com should be rejected. The regex ensures exactly one @ by only allowing one occurrence in the pattern.

Case 2: Invalid domain characters

Emails such as [email protected] include numbers in the domain. The [a-zA-Z]+ segment disallows digits, correctly rejecting these emails.

Case 3: Missing .com suffix

Emails like [email protected] do not end with .com. The regex enforces the .com$ ending, filtering these out automatically.

This implementation handles all such edge cases concisely, leveraging the power of SQL regex filtering. | [email protected] | Basic valid email | | [email protected] | Another valid email | | bob_at_example.com | Missing @ | | [email protected] | Invalid extension | | eve@invalid | Missing .com | | [email protected] | Digits allowed before @ | | [email protected] | Underscore allowed before @ | | [email protected] | Hyphen not allowed before @ | | [email protected] | Plus sign not allowed | | [email protected] | Digits not allowed in domain | | user@test_name.com | Underscore not allowed in domain | | user@@test.com | Multiple @ symbols | | @test.com | Empty local part | | [email protected] | Empty domain | | [email protected] | Uppercase letters are valid |

Edge Cases

Multiple @ Symbols

An email such as user@@domain.com might accidentally pass simplistic validation that merely checks for the presence of @. The regular expression prevents this because exactly one @ must separate the two required components. The extra @ causes the match to fail.

Invalid Characters in the Domain

An address like [email protected] or user@domain_name.com contains characters that are not letters in the domain portion. The pattern [A-Za-z]+ allows only alphabetic characters, so such addresses are correctly rejected.

Invalid Characters in the Local Part

Many real-world email systems permit characters such as +, -, or . in the local part. However, this problem explicitly restricts the local part to alphanumeric characters and underscores. Therefore emails such as [email protected] and [email protected] must be rejected. The pattern [A-Za-z0-9_]+ enforces exactly that requirement.

Missing .com Suffix

Addresses such as [email protected], [email protected], or user@domain violate the requirement that every valid email end with .com. The literal suffix \.com$ ensures that only emails with the required ending are accepted.

Empty Local Part or Domain

Inputs such as @domain.com or [email protected] contain empty components. Because both [A-Za-z0-9_]+ and [A-Za-z]+ require at least one character, these malformed addresses are rejected automatically.