LeetCode 3059 - Find All Unique Email Domains

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze email addresses stored in a database table and determine how many people belong to each unique email domain, but only for domains ending in .com.

The Emails table contains two columns:

Column	Meaning
`id`	Unique identifier for each row
`email`	Full email address

An email address has the general structure:

username@domain

For example:

adcmaf@outlook.com

Here:

adcmaf is the username
outlook.com is the domain

The task is to:

Extract the domain portion of every email address.
Keep only domains that end with .com.
Group emails by domain.
Count how many rows belong to each domain.
Return the result ordered alphabetically by domain name.

The output should contain:

Column	Meaning
`email_domain`	The extracted domain
`count`	Number of individuals using that domain

The problem guarantees that:

Emails contain no uppercase letters.
Each row has a valid email structure.
id is unique.

Since this is a database problem, the expected solution is written in SQL. The main operations involved are:

String extraction
Filtering
Grouping
Aggregation
Sorting

An important detail is that we only count domains ending with .com. Domains like:

test.edu
example.org
school.net

must be ignored.

Another subtle point is that multiple users may share the same domain. We count all matching rows, not just unique domains.

Edge cases that could cause issues include:

No .com domains at all
Every email belonging to the same domain
Multiple different .com domains
Domains with similar prefixes, such as mail.com and gmail.com
Emails containing multiple periods in the domain

A correct solution must reliably extract everything after the @ symbol and apply the .com filter accurately.

Approaches

Brute Force Approach

The brute force approach would process every email manually and perform repeated scans to count occurrences of domains.

Conceptually, the algorithm would:

Iterate through every email.
Extract the domain.
Check whether the domain ends with .com.
For each valid domain, scan the entire dataset again to count how many times it appears.
Store results while avoiding duplicate outputs.

This approach is correct because every domain count is computed explicitly through repeated comparisons. However, it is inefficient because counting each domain independently causes unnecessary repeated work.

If there are n emails, and each valid domain requires another scan of the table, the total complexity can become quadratic.

Database systems are designed to avoid this inefficiency through aggregation operations like GROUP BY.
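
For comparison, here is a minimal Python sketch of the brute-force idea, assuming the emails arrive as a plain list of strings. The function name count_domains_bruteforce and the sample addresses are hypothetical and used only for illustration.

```python
def count_domains_bruteforce(emails):
    """Count .com domains by rescanning the whole list for each one (O(n^2))."""
    # Extract the domain of every email once up front.
    domains = [email.split("@", 1)[1] for email in emails]

    result = []
    seen = set()  # domains already reported, to avoid duplicate output rows
    for domain in domains:
        if not domain.endswith(".com") or domain in seen:
            continue
        seen.add(domain)
        # Full rescan of the dataset for this one domain.
        count = sum(1 for other in domains if other == domain)
        result.append((domain, count))

    # Alphabetical order by domain, as the problem requires.
    return sorted(result)


# Hypothetical sample data.
sample = ["a@outlook.com", "b@test.edu", "c@outlook.com", "d@yahoo.com"]
print(count_domains_bruteforce(sample))
```

The inner sum is the repeated scan that makes this quadratic when most domains are distinct.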

Optimal Approach

The key observation is that SQL databases already provide highly optimized grouping and counting operations.

Instead of repeatedly scanning the dataset, we can:

Extract domains once using string functions.
Filter only .com domains.
Group rows by domain.
Use COUNT(*) to compute totals efficiently.

The main SQL techniques used are:

SUBSTRING_INDEX(email, '@', -1) to extract the domain
LIKE '%.com' to filter .com domains
GROUP BY to aggregate identical domains
ORDER BY to sort results alphabetically

This approach processes each row only once before aggregation, making it much more efficient and cleaner.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(n)	Repeated scans for counting domains
Optimal	O(n log n)	O(n)	Uses SQL grouping and sorting efficiently

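The log n factor comes from sorting: a database may sort rows to form the groups, and ORDER BY then sorts the distinct domains. O(n log n) is therefore a safe upper bound; with hash-based grouping the cost is closer to O(n) plus the sort of the distinct domains.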
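
The techniques listed above can be combined into a single query. The sketch below runs it from Python against an in-memory SQLite database so it can be tried locally. Two assumptions to note: SQLite has no SUBSTRING_INDEX, so substr and instr stand in for it here, and the helper name run_query and the sample addresses are hypothetical. The MySQL form appears only as a comment and is not executed.

```python
import sqlite3

# MySQL form of the query described above (for reference, not executed here):
#   SELECT SUBSTRING_INDEX(email, '@', -1) AS email_domain, COUNT(*) AS count
#   FROM Emails
#   WHERE email LIKE '%.com'
#   GROUP BY email_domain
#   ORDER BY email_domain;

# SQLite stand-in: substr/instr take everything after the first '@'.
QUERY = """
SELECT substr(email, instr(email, '@') + 1) AS email_domain,
       COUNT(*) AS count
FROM Emails
WHERE email LIKE '%.com'
GROUP BY email_domain
ORDER BY email_domain
"""


def run_query(emails):
    """Load the emails into a temporary Emails table and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Emails (id INTEGER PRIMARY KEY, email TEXT)")
    conn.executemany("INSERT INTO Emails (email) VALUES (?)", [(e,) for e in emails])
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


# Hypothetical sample data.
print(run_query(["a@outlook.com", "b@test.edu", "c@yahoo.com", "d@outlook.com"]))
```

Filtering on the full email with LIKE '%.com' is equivalent to filtering on the extracted domain, because the domain is always the tail of the address.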

Algorithm Walkthrough

Read every row from the Emails table.

Each row contains a full email address. We need to isolate the domain portion after the @ symbol.

Extract the domain from each email.

We use:

SUBSTRING_INDEX(email, '@', -1)

This returns everything after the final @.

Example:

[email protected] -> outlook.com

Filter domains ending with .com.

We only keep domains matching:

LIKE '%.com'

This removes domains such as:

test.edu
example.org

Group rows by extracted domain.

Emails sharing the same domain should belong to the same group.

Example:

outlook.com:
- [email protected]
- [email protected]

Count rows inside each group.

We use:

COUNT(*)

This gives the number of individuals associated with the domain.

Rename output columns appropriately.

The problem requires:

email_domain
count

Sort the result alphabetically.

We use:

ORDER BY email_domain

so the output appears in ascending lexicographical order.
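
The extraction step relies on the negative count passed to SUBSTRING_INDEX. To make that behaviour concrete, here is a small Python emulation of the function's semantics. It is a sketch that assumes a non-empty delimiter; the helper name substring_index is ours, and the email address is a hypothetical sample.

```python
def substring_index(text, delim, count):
    """Emulate MySQL's SUBSTRING_INDEX(text, delim, count) for a non-empty delim.

    count > 0: everything to the left of the count-th delimiter from the left.
    count < 0: everything to the right of the |count|-th delimiter from the right.
    If there are fewer delimiters than |count|, the whole string is returned.
    """
    if count == 0:
        return ""
    parts = text.split(delim)
    if count > 0:
        return delim.join(parts[:count])
    return delim.join(parts[count:])


print(substring_index("www.mysql.com", ".", 2))      # www.mysql
print(substring_index("www.mysql.com", ".", -2))     # mysql.com
print(substring_index("user@outlook.com", "@", -1))  # outlook.com
```

With count = -1 and '@' as the delimiter, the result is the text after the last @, which is the domain for any valid address.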

Why it works

The algorithm works because every email belongs to exactly one domain, and the extraction step deterministically isolates that domain. Filtering ensures only .com domains remain. Grouping combines identical domains together, and counting the rows in each group correctly computes how many individuals use that domain.

Since every valid email is processed exactly once and every matching domain is grouped consistently, the final counts are correct.

Python Solution

Although LeetCode database problems are normally solved in SQL, the following Python implementation demonstrates the same logic programmatically.

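from collections import defaultdict
from typing import Dict, List, Union

class Solution:
    def findUniqueEmailDomains(self, emails: List[str]) -> List[Dict[str, Union[str, int]]]:
        # Maps each .com domain to the number of emails that use it.
        domain_count = defaultdict(int)

        for email in emails:
            # Everything after the @ is the domain (emails are guaranteed valid).
            domain = email.split("@")[1]

            # Count only domains ending in .com.
            if domain.endswith(".com"):
                domain_count[domain] += 1

        result = []

        # Emit one row per domain in alphabetical order.
        for domain in sorted(domain_count):
            result.append({
                "email_domain": domain,
                "count": domain_count[domain]
            })

        return result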

The implementation begins by creating a hash map called domain_count. This dictionary stores how many times each valid domain appears.

For every email address, the code splits the string at the @ symbol and extracts the domain portion. The endswith(".com") check ensures only .com domains are counted.

Whenever a valid domain is found, its count is incremented inside the hash map.

After processing all emails, the domains are sorted alphabetically to satisfy the required output ordering. The final result is constructed as a list of dictionaries containing the required column names.

This implementation mirrors the SQL aggregation process closely:

Domain extraction corresponds to SUBSTRING_INDEX
Filtering corresponds to WHERE
Hash map counting corresponds to GROUP BY + COUNT(*)
Sorting corresponds to ORDER BY
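
The correspondence can be made even more literal with a sort-then-group variant. This is an alternative sketch and not the solution above: it uses itertools.groupby, which groups adjacent equal items, so the list is sorted first. The function name count_com_domains_grouped and the sample addresses are hypothetical.

```python
from itertools import groupby


def count_com_domains_grouped(emails):
    # SUBSTRING_INDEX(email, '@', -1): keep everything after the last '@'.
    domains = (email.rpartition("@")[2] for email in emails)

    # WHERE ... LIKE '%.com'
    com_domains = [d for d in domains if d.endswith(".com")]

    # ORDER BY email_domain (sorting also makes equal domains adjacent).
    com_domains.sort()

    # GROUP BY email_domain + COUNT(*)
    return [
        {"email_domain": domain, "count": sum(1 for _ in group)}
        for domain, group in groupby(com_domains)
    ]


# Hypothetical sample data.
print(count_com_domains_grouped(["a@yahoo.com", "b@outlook.com", "c@test.edu", "d@outlook.com"]))
```

This version sorts all matching rows instead of only the distinct domains, so it trades the hash map for an O(m log m) sort over the m matching emails.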

Go Solution

package main

import (
	"sort"
	"strings"
)

type Result struct {
	EmailDomain string
	Count       int
}

func findUniqueEmailDomains(emails []string) []Result {
	// Maps each .com domain to the number of emails that use it.
	domainCount := make(map[string]int)

	for _, email := range emails {
		// Emails are guaranteed valid, so parts[1] is the domain.
		parts := strings.Split(email, "@")
		domain := parts[1]

		if strings.HasSuffix(domain, ".com") {
			domainCount[domain]++
		}
	}

	// Map iteration order is random, so collect and sort the keys.
	domains := make([]string, 0, len(domainCount))

	for domain := range domainCount {
		domains = append(domains, domain)
	}

	sort.Strings(domains)

	// Build the output rows in alphabetical order.
	result := make([]Result, 0, len(domains))

	for _, domain := range domains {
		result = append(result, Result{
			EmailDomain: domain,
			Count:       domainCount[domain],
		})
	}

	return result
}

The Go implementation follows the same overall algorithm as the Python version, but uses Go-specific data structures and utilities.

A map[string]int stores domain frequencies efficiently. The strings.Split function extracts the domain, while strings.HasSuffix checks whether the domain ends with .com.

Since Go maps are unordered, we first collect all domains into a slice and then sort them using sort.Strings.

Where the Python version returns dictionaries, the Go version defines a dedicated Result struct, which is the idiomatic way to return structured rows in Go.

Worked Examples

Example 1

Input:

[
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]

Step-by-step Trace

Email	Extracted Domain	Ends With .com	Domain Count State
[email protected]	test.edu	No	{}
[email protected]	outlook.com	Yes	{outlook.com: 1}
[email protected]	yahoo.com	Yes	{outlook.com: 1, yahoo.com: 1}
[email protected]	test.edu	No	{outlook.com: 1, yahoo.com: 1}
[email protected]	example.org	No	{outlook.com: 1, yahoo.com: 1}
[email protected]	outlook.com	Yes	{outlook.com: 2, yahoo.com: 1}

After sorting domains alphabetically:

email_domain	count
outlook.com	2
yahoo.com	1

Complexity Analysis

Measure	Complexity	Explanation
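Time	O(n + k log k)	Processing emails is O(n); sorting the k unique domains adds O(k log k), which is at most O(n log n) when every domain is distinct
Space	O(k)	Hash map stores one counter per unique .com domain (at most n entries)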

Here, n is the number of emails and k is the number of unique .com domains.

The algorithm scans each email exactly once, making the counting step linear. The additional sorting step depends on the number of unique domains.

The space complexity comes from storing domain counts in the hash map.

Test Cases

from collections import defaultdict

class Solution:
    def findUniqueEmailDomains(self, emails):
        domain_count = defaultdict(int)

        for email in emails:
            domain = email.split("@")[1]

            if domain.endswith(".com"):
                domain_count[domain] += 1

        result = []

        for domain in sorted(domain_count.keys()):
            result.append({
                "email_domain": domain,
                "count": domain_count[domain]
            })

        return result

sol = Solution()

# Basic example from problem statement
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "outlook.com", "count": 2},
    {"email_domain": "yahoo.com", "count": 1}
]

# No .com domains
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]"
]) == []

# Single .com domain repeated many times
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "gmail.com", "count": 3}
]

# Multiple distinct .com domains
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "gmail.com", "count": 1},
    {"email_domain": "hotmail.com", "count": 1},
    {"email_domain": "yahoo.com", "count": 1}
]

# Domains with multiple periods
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "mail.google.com", "count": 2}
]

# Mixed valid and invalid domains
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "test.com", "count": 2}
]

Test	Why
Problem statement example	Verifies standard behavior
No `.com` domains	Ensures filtering works correctly
Repeated single domain	Validates aggregation
Multiple unique domains	Verifies grouping and sorting
Multiple periods in domain	Ensures extraction logic remains correct
Mixed valid and invalid domains	Tests filtering and counting together

Edge Cases

No Valid `.com` Domains

An input may contain only domains like .edu or .org. A buggy implementation might still include them accidentally if the filtering condition is incorrect.

The implementation handles this safely by explicitly checking:

domain.endswith(".com")

Only matching domains are counted, so the result becomes an empty list.

Multiple Emails Sharing the Same Domain

Many users may belong to the same email provider. A naive implementation could mistakenly overwrite counts instead of incrementing them.

The hash map approach avoids this issue by incrementing existing counts:

domain_count[domain] += 1

This guarantees accurate aggregation.
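
The difference between overwriting and incrementing shows up in a small side-by-side sketch. Both helper names are hypothetical and exist only for this comparison.

```python
def count_overwriting(domains):
    counts = {}
    for domain in domains:
        counts[domain] = 1  # bug: resets the count on every occurrence
    return counts


def count_incrementing(domains):
    counts = {}
    for domain in domains:
        counts[domain] = counts.get(domain, 0) + 1  # adds to the running total
    return counts


domains = ["gmail.com", "gmail.com", "gmail.com"]
print(count_overwriting(domains))   # {'gmail.com': 1}
print(count_incrementing(domains))  # {'gmail.com': 3}
```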

Domains Containing Multiple Periods

Some domains contain additional subdomains, such as:

mail.google.com

A fragile parser might incorrectly extract only google.com.

The implementation avoids this problem by splitting only at the @ symbol and keeping everything afterward unchanged. This preserves the full domain exactly as required.
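
To illustrate the pitfall, the sketch below contrasts a hypothetical fragile parser, which keeps only the last two dot-separated labels, with the split-at-@ extraction used in the solution. The function names and the sample address are assumptions made for this example.

```python
def fragile_domain(email):
    # Hypothetical buggy parser: keeps only the last two labels of the host.
    host = email.split("@")[1]
    return ".".join(host.split(".")[-2:])


def full_domain(email):
    # Extraction used in the solution: everything after the @, unchanged.
    return email.split("@")[1]


email = "user@mail.google.com"  # hypothetical address
print(fragile_domain(email))  # google.com
print(full_domain(email))     # mail.google.com
```

The fragile version would merge mail.google.com into google.com and report a count under the wrong domain.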
