LeetCode 3059 - Find All Unique Email Domains
Difficulty: 🟢 Easy
Topics: Database
Solution
Problem Understanding
This problem asks us to analyze email addresses stored in a database table and determine how many people belong to each unique email domain, but only for domains ending in .com.
The Emails table contains two columns:
| Column | Meaning |
|---|---|
| id | Unique identifier for each row |
| email | Full email address |
An email address has the general structure:
username@domain
For example:
adcmaf@outlook.com
Here:
- adcmaf is the username
- outlook.com is the domain
The task is to:
- Extract the domain portion of every email address.
- Keep only domains that end with .com.
- Group emails by domain.
- Count how many rows belong to each domain.
- Return the result ordered alphabetically by domain name.
The output should contain:
| Column | Meaning |
|---|---|
| email_domain | The extracted domain |
| count | Number of individuals using that domain |
The problem guarantees that:
- Emails contain only lowercase letters.
- Each row has a valid email structure.
- id is unique.
Since this is a database problem, the expected solution is written in SQL. The main operations involved are:
- String extraction
- Filtering
- Grouping
- Aggregation
- Sorting
An important detail is that we only count domains ending with .com. Domains like:
test.edu
example.org
school.net
must be ignored.
Another subtle point is that multiple users may share the same domain. We count all matching rows, not just unique domains.
Edge cases that could cause issues include:
- No .com domains at all
- Every email belonging to the same domain
- Multiple different .com domains
- Domains where one is a suffix of another, such as mail.com and gmail.com
- Emails containing multiple periods in the domain
A correct solution must reliably extract everything after the @ symbol and apply the .com filter accurately.
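To make the filtering requirement concrete, here is a minimal sketch in Python rather than SQL. It contrasts a loose substring test with a strict suffix test. The helper names `contains_dot_com` and `ends_with_dot_com` and the sample domains are invented for illustration and are not part of the problem.

```python
def contains_dot_com(domain: str) -> bool:
    """Too loose: matches '.com' anywhere in the domain."""
    return ".com" in domain


def ends_with_dot_com(domain: str) -> bool:
    """Strict: matches only when '.com' is the final four characters."""
    return domain.endswith(".com")


# Hypothetical domains: two lookalikes, one with '.com' in the middle, one non-.com.
for domain in ["gmail.com", "mail.com", "shop.com.au", "test.edu"]:
    print(domain, contains_dot_com(domain), ends_with_dot_com(domain))
```

Only the suffix test rejects a domain such as shop.com.au. Because grouping later uses the whole domain string as the key, mail.com and gmail.com stay in separate groups even though one is a suffix of the other.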
Approaches
Brute Force Approach
The brute force approach would process every email manually and perform repeated scans to count occurrences of domains.
Conceptually, the algorithm would:
- Iterate through every email.
- Extract the domain.
- Check whether the domain ends with .com.
- For each valid domain, scan the entire dataset again to count how many times it appears.
- Store results while avoiding duplicate outputs.
This approach is correct because every domain count is computed explicitly through repeated comparisons. However, it is inefficient because counting each domain independently causes unnecessary repeated work.
If there are n emails, and each valid domain requires another scan of the table, the total complexity can become quadratic.
Database systems are designed to avoid this inefficiency through aggregation operations like GROUP BY.
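For comparison, the brute force idea might look like the following sketch. The function name `count_com_domains_brute_force` is hypothetical, and the input is assumed to be a plain list of valid email strings rather than a table.

```python
from typing import List, Tuple


def count_com_domains_brute_force(emails: List[str]) -> List[Tuple[str, int]]:
    """Quadratic counting: rescan the whole list for every new .com domain."""
    results: List[Tuple[str, int]] = []
    seen: List[str] = []  # domains already reported, to avoid duplicate output
    for email in emails:
        domain = email.split("@")[1]
        if not domain.endswith(".com") or domain in seen:
            continue
        # Second full pass over the data just for this one domain.
        count = sum(1 for other in emails if other.split("@")[1] == domain)
        seen.append(domain)
        results.append((domain, count))
    # Tuples sort by domain first, which gives the alphabetical order.
    return sorted(results)
```

The nested pass over `emails` is what makes the cost grow quadratically when most rows have distinct domains.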
Optimal Approach
The key observation is that SQL databases already provide highly optimized grouping and counting operations.
Instead of repeatedly scanning the dataset, we can:
- Extract domains once using string functions.
- Filter only .com domains.
- Group rows by domain.
- Use COUNT(*) to compute totals efficiently.
The main SQL techniques used are:
- SUBSTRING_INDEX(email, '@', -1) to extract the domain (a MySQL function)
- LIKE '%.com' to filter .com domains
- GROUP BY to aggregate identical domains
- ORDER BY to sort results alphabetically
This approach processes each row only once before aggregation, making it much more efficient and cleaner.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Repeated scans for counting domains |
| Optimal | O(n log n) | O(n) | Uses SQL grouping and sorting efficiently |
The log n factor comes from sorting: ORDER BY must sort the grouped domains, and some engines also implement GROUP BY by sorting rather than hashing.
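One way the pieces listed above could be assembled into a single query is sketched below. The article does not spell out the full statement, so `MYSQL_QUERY` is an assumed assembly and is shown for reference only. To keep the example runnable without a MySQL server, the sketch executes an equivalent query against an in-memory SQLite database through Python's `sqlite3` module. SQLite has no SUBSTRING_INDEX, so `substr` and `instr` stand in for it. The helper name `run_query` and the sample rows are invented.

```python
import sqlite3

# Assumed MySQL form built from the techniques above (not executed here).
MYSQL_QUERY = """
SELECT SUBSTRING_INDEX(email, '@', -1) AS email_domain,
       COUNT(*) AS count
FROM Emails
WHERE email LIKE '%.com'
GROUP BY email_domain
ORDER BY email_domain;
"""

# SQLite equivalent: take the substring that starts one past the '@'.
SQLITE_QUERY = """
SELECT substr(email, instr(email, '@') + 1) AS email_domain,
       COUNT(*) AS count
FROM Emails
WHERE email LIKE '%.com'
GROUP BY email_domain
ORDER BY email_domain;
"""


def run_query(rows):
    """Load (id, email) rows into an in-memory Emails table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Emails (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Emails VALUES (?, ?)", rows)
    result = conn.execute(SQLITE_QUERY).fetchall()
    conn.close()
    return result
```

Filtering on the full email is equivalent to filtering on the extracted domain here, because the domain is always the tail of the address.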
Algorithm Walkthrough
1. Read every row from the Emails table.
Each row contains a full email address. We need to isolate the domain portion after the @ symbol.
2. Extract the domain from each email.
We use:
SUBSTRING_INDEX(email, '@', -1)
This returns everything after the final @.
Example:
adcmaf@outlook.com -> outlook.com
3. Filter domains ending with .com.
We only keep domains matching:
LIKE '%.com'
This removes domains such as:
test.edu
example.org
4. Group rows by extracted domain.
Emails sharing the same domain should belong to the same group.
Example:
outlook.com:
- [email protected]
- [email protected]
5. Count rows inside each group.
We use:
COUNT(*)
This gives the number of individuals associated with the domain.
6. Rename output columns appropriately.
The problem requires:
- email_domain
- count
7. Sort the result alphabetically.
We use:
ORDER BY email_domain
so the output appears in ascending lexicographical order.
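The walkthrough can also be mirrored stage by stage outside the database. The sketch below is an illustration in Python, not the SQL engine's actual execution plan. It groups by sorting first and then collapsing adjacent equal values with `itertools.groupby`, which is one common way grouping is implemented. The function name `walkthrough` is hypothetical.

```python
from itertools import groupby


def walkthrough(emails):
    # Extract the domain: everything after the last '@'.
    domains = [email.rsplit("@", 1)[-1] for email in emails]
    # Filter: keep only domains ending with .com.
    com_domains = [d for d in domains if d.endswith(".com")]
    # Sorting puts equal domains next to each other, so grouping and the
    # final alphabetical order both come from this one step.
    com_domains.sort()
    # Count the rows in each group and name the output columns.
    return [
        {"email_domain": domain, "count": sum(1 for _ in group)}
        for domain, group in groupby(com_domains)
    ]
```

Because the grouping here is sort-based, this variant also shows where an n log n term can come from even before the final ordering.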
Why it works
The algorithm works because every email belongs to exactly one domain, and the extraction step deterministically isolates that domain. Filtering ensures only .com domains remain. Grouping combines identical domains together, and counting the rows in each group correctly computes how many individuals use that domain.
Since every valid email is processed exactly once and every matching domain is grouped consistently, the final counts are correct.
Python Solution
Although LeetCode database problems are normally solved in SQL, the following Python implementation demonstrates the same logic programmatically.

    from collections import defaultdict
    from typing import Dict, List, Union


    class Solution:
        def findUniqueEmailDomains(self, emails: List[str]) -> List[Dict[str, Union[str, int]]]:
            # Map each .com domain to the number of emails that use it.
            domain_count = defaultdict(int)
            for email in emails:
                # Valid emails contain one '@'; the domain is the part after it.
                domain = email.split("@")[1]
                if domain.endswith(".com"):
                    domain_count[domain] += 1
            # Emit one row per domain in alphabetical order.
            result = []
            for domain in sorted(domain_count.keys()):
                result.append({
                    "email_domain": domain,
                    "count": domain_count[domain]
                })
            return result

The implementation begins by creating a hash map called domain_count. This dictionary stores how many times each valid domain appears.
For every email address, the code splits the string at the @ symbol and extracts the domain portion. The endswith(".com") check ensures only .com domains are counted.
Whenever a valid domain is found, its count is incremented inside the hash map.
After processing all emails, the domains are sorted alphabetically to satisfy the required output ordering. The final result is constructed as a list of dictionaries containing the required column names.
This implementation mirrors the SQL aggregation process closely:
- Domain extraction corresponds to SUBSTRING_INDEX
- Filtering corresponds to WHERE
- Hash map counting corresponds to GROUP BY + COUNT(*)
- Sorting corresponds to ORDER BY
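As a more compact alternative, the counting step could be delegated to `collections.Counter` from the standard library. This is only a sketch of the same idea. The function name `count_com_domains` is hypothetical and is not part of the solution above.

```python
from collections import Counter
from typing import Dict, List, Union


def count_com_domains(emails: List[str]) -> List[Dict[str, Union[str, int]]]:
    # Lazily extract each domain (the part after the '@').
    domains = (email.split("@")[1] for email in emails)
    # Counter plays the role of GROUP BY + COUNT(*) over the filtered domains.
    counts = Counter(d for d in domains if d.endswith(".com"))
    # Iterating sorted(counts) visits the keys in ascending order.
    return [{"email_domain": d, "count": counts[d]} for d in sorted(counts)]
```

The behavior matches the explicit loop. The trade-off is brevity against showing each step separately.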
Go Solution

    package main

    import (
        "sort"
        "strings"
    )

    // Result is one output row: a .com domain and how many emails use it.
    type Result struct {
        EmailDomain string
        Count       int
    }

    func findUniqueEmailDomains(emails []string) []Result {
        // Count emails per .com domain.
        domainCount := make(map[string]int)
        for _, email := range emails {
            // Valid emails contain one '@'; the domain is the part after it.
            parts := strings.Split(email, "@")
            domain := parts[1]
            if strings.HasSuffix(domain, ".com") {
                domainCount[domain]++
            }
        }

        // Map iteration order is not defined, so sort the keys explicitly.
        domains := make([]string, 0, len(domainCount))
        for domain := range domainCount {
            domains = append(domains, domain)
        }
        sort.Strings(domains)

        result := make([]Result, 0, len(domains))
        for _, domain := range domains {
            result = append(result, Result{
                EmailDomain: domain,
                Count:       domainCount[domain],
            })
        }
        return result
    }

The Go implementation follows the same overall algorithm as the Python version, but uses Go-specific data structures and utilities.
A map[string]int stores domain frequencies efficiently. The strings.Split function extracts the domain, while strings.HasSuffix checks whether the domain ends with .com.
Since Go maps are unordered, we first collect all domains into a slice and then sort them using sort.Strings.
Where the Python version returns dictionaries, the Go version uses a dedicated struct type, which is the idiomatic way to return structured results in Go.
Worked Examples
Example 1
Input:
[
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]"
]
Step-by-step Trace
| Email | Extracted Domain | Ends With .com | Domain Count State |
|---|---|---|---|
| [email protected] | test.edu | No | {} |
| [email protected] | outlook.com | Yes | {outlook.com: 1} |
| [email protected] | yahoo.com | Yes | {outlook.com: 1, yahoo.com: 1} |
| [email protected] | test.edu | No | {outlook.com: 1, yahoo.com: 1} |
| [email protected] | example.org | No | {outlook.com: 1, yahoo.com: 1} |
| [email protected] | outlook.com | Yes | {outlook.com: 2, yahoo.com: 1} |
After sorting domains alphabetically:
| email_domain | count |
|---|---|
| outlook.com | 2 |
| yahoo.com | 1 |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n + k log k) | Processing emails is O(n); sorting the k unique domains adds O(k log k), which is at most O(n log n) |
| Space | O(k) | Hash map stores one entry per unique .com domain, at most O(n) |
Here, n is the number of emails and k is the number of unique .com domains.
The algorithm scans each email exactly once, making the counting step linear. The additional sorting step depends on the number of unique domains.
The space complexity comes from storing domain counts in the hash map.
Test Cases

    from collections import defaultdict


    class Solution:
        def findUniqueEmailDomains(self, emails):
            domain_count = defaultdict(int)
            for email in emails:
                domain = email.split("@")[1]
                if domain.endswith(".com"):
                    domain_count[domain] += 1
            result = []
            for domain in sorted(domain_count.keys()):
                result.append({
                    "email_domain": domain,
                    "count": domain_count[domain]
                })
            return result


    sol = Solution()

    # Basic example from problem statement
    assert sol.findUniqueEmailDomains([
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]"
    ]) == [
        {"email_domain": "outlook.com", "count": 2},
        {"email_domain": "yahoo.com", "count": 1}
    ]

    # No .com domains
    assert sol.findUniqueEmailDomains([
        "[email protected]",
        "[email protected]"
    ]) == []

    # Single .com domain repeated many times
    assert sol.findUniqueEmailDomains([
        "[email protected]",
        "[email protected]",
        "[email protected]"
    ]) == [
        {"email_domain": "gmail.com", "count": 3}
    ]

    # Multiple distinct .com domains
    assert sol.findUniqueEmailDomains([
        "[email protected]",
        "[email protected]",
        "[email protected]"
    ]) == [
        {"email_domain": "gmail.com", "count": 1},
        {"email_domain": "hotmail.com", "count": 1},
        {"email_domain": "yahoo.com", "count": 1}
    ]

    # Domains with multiple periods
    assert sol.findUniqueEmailDomains([
        "[email protected]",
        "[email protected]"
    ]) == [
        {"email_domain": "mail.google.com", "count": 2}
    ]

    # Mixed valid and invalid domains
    assert sol.findUniqueEmailDomains([
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]"
    ]) == [
        {"email_domain": "test.com", "count": 2}
    ]

| Test | Why |
|---|---|
| Problem statement example | Verifies standard behavior |
| No .com domains | Ensures filtering works correctly |
| Repeated single domain | Validates aggregation |
| Multiple unique domains | Verifies grouping and sorting |
| Multiple periods in domain | Ensures extraction logic remains correct |
| Mixed valid and invalid domains | Tests filtering and counting together |
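The same scenarios can be exercised with a small table-driven check. This is a sketch under stated assumptions: the function name `find_unique_email_domains` is hypothetical, and every address (u1@..., u2@..., and so on) is invented for illustration rather than taken from the problem statement.

```python
from collections import defaultdict


def find_unique_email_domains(emails):
    """Same counting logic as the solution above, written as a plain function."""
    domain_count = defaultdict(int)
    for email in emails:
        domain = email.split("@")[1]
        if domain.endswith(".com"):
            domain_count[domain] += 1
    return [{"email_domain": d, "count": domain_count[d]} for d in sorted(domain_count)]


# (description, input, expected) triples; all addresses are made up.
CASES = [
    ("no .com domains", ["u1@test.edu", "u2@example.org"], []),
    ("repeated domain", ["u1@gmail.com", "u2@gmail.com", "u3@gmail.com"],
     [{"email_domain": "gmail.com", "count": 3}]),
    ("suffix lookalikes", ["u1@mail.com", "u2@gmail.com"],
     [{"email_domain": "gmail.com", "count": 1},
      {"email_domain": "mail.com", "count": 1}]),
    ("multiple periods", ["u1@mail.google.com", "u2@mail.google.com"],
     [{"email_domain": "mail.google.com", "count": 2}]),
]

for description, emails, expected in CASES:
    # The description is used as the assertion message to identify a failing case.
    assert find_unique_email_domains(emails) == expected, description
print("all cases passed")
```

Keeping the cases in one list makes it easy to add a scenario without duplicating the assertion boilerplate.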
Edge Cases
No Valid .com Domains
An input may contain only domains like .edu or .org. A buggy implementation might still include them accidentally if the filtering condition is incorrect.
The implementation handles this safely by explicitly checking:
domain.endswith(".com")
Only matching domains are counted, so the result becomes an empty list.
Multiple Emails Sharing the Same Domain
Many users may belong to the same email provider. A naive implementation could mistakenly overwrite counts instead of incrementing them.
The hash map approach avoids this issue by incrementing existing counts:
domain_count[domain] += 1
This guarantees accurate aggregation.
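The difference between overwriting and incrementing can be seen side by side in the following sketch. Both function names are hypothetical, and the buggy variant is written deliberately to show the failure mode.

```python
def count_with_overwrite(domains):
    """Buggy variant: assignment resets the count to 1 on every occurrence."""
    counts = {}
    for domain in domains:
        counts[domain] = 1  # bug: discards the previous total
    return counts


def count_with_increment(domains):
    """Correct variant: add one to whatever total is already stored."""
    counts = {}
    for domain in domains:
        counts[domain] = counts.get(domain, 0) + 1
    return counts


domains = ["gmail.com", "gmail.com", "yahoo.com", "gmail.com"]
print(count_with_overwrite(domains))  # every domain stays at 1
print(count_with_increment(domains))  # gmail.com accumulates to 3
```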
Domains Containing Multiple Periods
Some domains contain additional subdomains, such as:
mail.google.com
A fragile parser might incorrectly extract only google.com.
The implementation avoids this problem by splitting only at the @ symbol and keeping everything afterward unchanged. This preserves the full domain exactly as required.
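A short sketch makes the contrast concrete. The `fragile_domain` function below is a hypothetical example of the kind of parser to avoid, and the address passed to it is invented.

```python
def fragile_domain(email: str) -> str:
    """Hypothetical buggy parser: keeps only the last two dot-separated labels."""
    host = email.split("@")[1]
    return ".".join(host.split(".")[-2:])


def full_domain(email: str) -> str:
    """Keeps everything after the '@' unchanged."""
    return email.split("@")[1]


email = "user@mail.google.com"
print(fragile_domain(email))  # subdomain is lost
print(full_domain(email))     # full domain is preserved
```

With the fragile version, addresses at mail.google.com and google.com would be merged into one group, which would change the counts.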