LeetCode 3059 - Find All Unique Email Domains

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

This problem asks us to analyze email addresses stored in a database table and determine how many people belong to each unique email domain, but only for domains ending in .com.

The Emails table contains two columns:

| Column | Meaning |
|--------|---------|
| id | Unique identifier for each row |
| email | Full email address |

An email address has the general structure:

username@domain

For example:

adcmaf@outlook.com

Here:

  • adcmaf is the username
  • outlook.com is the domain

The task is to:

  1. Extract the domain portion of every email address.
  2. Keep only domains that end with .com.
  3. Group emails by domain.
  4. Count how many rows belong to each domain.
  5. Return the result ordered alphabetically by domain name.

The output should contain:

| Column | Meaning |
|--------|---------|
| email_domain | The extracted domain |
| count | Number of individuals using that domain |

The problem guarantees that:

  • Emails contain only lowercase letters.
  • Each row has a valid email structure.
  • id is unique.

Since this is a database problem, the expected solution is written in SQL. The main operations involved are:

  • String extraction
  • Filtering
  • Grouping
  • Aggregation
  • Sorting

An important detail is that we only count domains ending with .com. Domains like:

test.edu
example.org
school.net

must be ignored.

Another subtle point is that multiple users may share the same domain. We count all matching rows, not just unique domains.

Edge cases that could cause issues include:

  • No .com domains at all
  • Every email belonging to the same domain
  • Multiple different .com domains
  • Domains with similar prefixes, such as mail.com and gmail.com
  • Emails containing multiple periods in the domain

A correct solution must reliably extract everything after the @ symbol and apply the .com filter accurately.
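To make the extraction and the filter concrete, here is a minimal Python sketch. The helper names extract_domain and is_dot_com and the sample addresses are hypothetical, chosen only for illustration. The sketch assumes every address contains at least one @, as the problem guarantees.

```python
def extract_domain(email: str) -> str:
    """Return everything after the final '@' (assumes a valid address)."""
    return email.rpartition("@")[2]


def is_dot_com(domain: str) -> bool:
    """True only when the domain ends with the literal suffix '.com'."""
    return domain.endswith(".com")


# Hypothetical sample addresses covering the edge cases listed above.
samples = [
    "alice@mail.com",         # similar to gmail.com, but a different domain
    "bob@gmail.com",
    "carol@mail.google.com",  # several periods in the domain
    "dave@example.org",       # not a .com domain
    "erin@dotcom.net",        # contains "com" but does not end with ".com"
]

kept = [extract_domain(e) for e in samples if is_dot_com(extract_domain(e))]
print(kept)  # ['mail.com', 'gmail.com', 'mail.google.com']
```

mail.com and gmail.com remain separate groups because grouping compares the whole extracted string, not a suffix of it.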

Approaches

Brute Force Approach

The brute force approach would process every email manually and perform repeated scans to count occurrences of domains.

Conceptually, the algorithm would:

  1. Iterate through every email.
  2. Extract the domain.
  3. Check whether the domain ends with .com.
  4. For each valid domain, scan the entire dataset again to count how many times it appears.
  5. Store results while avoiding duplicate outputs.

This approach is correct because every domain count is computed explicitly through repeated comparisons. However, it is inefficient because counting each domain independently causes unnecessary repeated work.

If there are n emails, and each valid domain requires another scan of the table, the total complexity can become quadratic.

Database systems are designed to avoid this inefficiency through aggregation operations like GROUP BY.
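The brute force idea can be sketched in Python as follows. The function name count_com_domains_brute_force and the sample addresses are hypothetical. The point is only to show where the repeated scan occurs, not to suggest this is how the problem should be solved.

```python
from typing import List, Tuple


def count_com_domains_brute_force(emails: List[str]) -> List[Tuple[str, int]]:
    """Count .com domains by rescanning the whole input for each new domain."""
    results = []
    reported = set()  # domains already emitted, to avoid duplicate outputs

    for email in emails:
        domain = email.rpartition("@")[2]
        if not domain.endswith(".com") or domain in reported:
            continue

        # Full rescan of the input for this domain: this is the quadratic part.
        count = sum(1 for other in emails if other.rpartition("@")[2] == domain)

        reported.add(domain)
        results.append((domain, count))

    # Sort alphabetically by domain to match the required output order.
    return sorted(results)


# Hypothetical sample input.
emails = ["a@outlook.com", "b@yahoo.com", "c@test.edu", "d@outlook.com"]
print(count_com_domains_brute_force(emails))
# [('outlook.com', 2), ('yahoo.com', 1)]
```

When every address has a distinct .com domain, the inner scan runs once per email, which is where the O(n²) bound comes from.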

Optimal Approach

The key observation is that SQL databases already provide highly optimized grouping and counting operations.

Instead of repeatedly scanning the dataset, we can:

  1. Extract domains once using string functions.
  2. Filter only .com domains.
  3. Group rows by domain.
  4. Use COUNT(*) to compute totals efficiently.

The main SQL techniques used are:

  • SUBSTRING_INDEX(email, '@', -1) to extract the domain (a MySQL function; other SQL dialects use different string functions)
  • LIKE '%.com' to filter .com domains
  • GROUP BY to aggregate identical domains
  • ORDER BY to sort results alphabetically

This approach processes each row only once before aggregation, making it much more efficient and cleaner.
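The techniques listed above combine into a single query. The sketch below is an assumption-laden illustration rather than a reference solution: MYSQL_QUERY is assembled from the functions named above and is not executed here, while the runnable part uses Python's built-in sqlite3 module. SQLite has no SUBSTRING_INDEX, so the sketch swaps in substr and instr, which take everything after the first @ (equivalent for valid addresses with exactly one @). The function name count_com_domains and the sample addresses are hypothetical.

```python
import sqlite3

# MySQL form, assembled from the techniques listed above (not executed here).
MYSQL_QUERY = """
SELECT SUBSTRING_INDEX(email, '@', -1) AS email_domain,
       COUNT(*) AS count
FROM Emails
WHERE email LIKE '%.com'
GROUP BY email_domain
ORDER BY email_domain;
"""

# SQLite form: substr/instr stand in for SUBSTRING_INDEX.
SQLITE_QUERY = """
SELECT substr(email, instr(email, '@') + 1) AS email_domain,
       COUNT(*) AS count
FROM Emails
WHERE email LIKE '%.com'
GROUP BY email_domain
ORDER BY email_domain;
"""


def count_com_domains(emails):
    """Load emails into an in-memory table and run the aggregation query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE Emails (id INTEGER PRIMARY KEY, email TEXT)")
        conn.executemany(
            "INSERT INTO Emails (email) VALUES (?)", [(e,) for e in emails]
        )
        return conn.execute(SQLITE_QUERY).fetchall()
    finally:
        conn.close()


# Hypothetical sample input.
rows = count_com_domains(
    ["a@outlook.com", "b@yahoo.com", "c@test.edu", "d@outlook.com"]
)
print(rows)  # [('outlook.com', 2), ('yahoo.com', 1)]
```

Filtering in WHERE before grouping means rows for other domains never reach the aggregation step.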

| Approach | Time Complexity | Space Complexity | Notes |
|----------|-----------------|------------------|-------|
| Brute Force | O(n²) | O(n) | Repeated scans for counting domains |
| Optimal | O(n log n) | O(n) | Uses SQL grouping and sorting efficiently |

The sorting step contributes the log n factor in most database implementations.

Algorithm Walkthrough

  1. Read every row from the Emails table.

Each row contains a full email address. We need to isolate the domain portion after the @ symbol.

  2. Extract the domain from each email.

We use:

SUBSTRING_INDEX(email, '@', -1)

This returns everything after the final @.

Example:

adcmaf@outlook.com -> outlook.com

  3. Filter domains ending with .com.

We only keep domains matching:

LIKE '%.com'

This removes domains such as:

test.edu
example.org

  4. Group rows by extracted domain.

Emails sharing the same domain should belong to the same group.

Example:

outlook.com:
- [email protected]
- [email protected]

  5. Count rows inside each group.

We use:

COUNT(*)

This gives the number of individuals associated with the domain.

  6. Rename output columns appropriately.

The problem requires:

  • email_domain
  • count

  7. Sort the result alphabetically.

We use:

ORDER BY email_domain

so the output appears in ascending lexicographical order.

Why it works

The algorithm works because every email belongs to exactly one domain, and the extraction step deterministically isolates that domain. Filtering ensures only .com domains remain. Grouping combines identical domains together, and counting the rows in each group correctly computes how many individuals use that domain.

Since every valid email is processed exactly once and every matching domain is grouped consistently, the final counts are correct.
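The walkthrough can also be traced stage by stage in Python. This sketch deliberately uses a sort-then-group strategy with itertools.groupby instead of a hash map, purely to show that each stage is independent. The function name count_by_stages and the sample addresses are hypothetical.

```python
from itertools import groupby
from typing import Dict, List, Union


def count_by_stages(emails: List[str]) -> List[Dict[str, Union[str, int]]]:
    # Steps 1-2: read every email and keep the part after the final '@'.
    domains = [email.rpartition("@")[2] for email in emails]

    # Step 3: keep only domains ending in ".com".
    com_domains = [d for d in domains if d.endswith(".com")]

    # Step 7 is done early: sorting places equal domains next to each other,
    # which groupby needs, and also yields the final alphabetical order.
    com_domains.sort()

    # Steps 4-6: group adjacent equal domains, count each group,
    # and name the output columns.
    return [
        {"email_domain": domain, "count": sum(1 for _ in group)}
        for domain, group in groupby(com_domains)
    ]


# Hypothetical sample input.
print(count_by_stages(["a@yahoo.com", "b@outlook.com", "c@test.edu", "d@outlook.com"]))
# [{'email_domain': 'outlook.com', 'count': 2}, {'email_domain': 'yahoo.com', 'count': 1}]
```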

Python Solution

Although LeetCode database problems are normally solved in SQL, the following Python implementation demonstrates the same logic programmatically.

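from collections import defaultdict
from typing import Dict, List, Union

class Solution:
    def findUniqueEmailDomains(self, emails: List[str]) -> List[Dict[str, Union[str, int]]]:
        # Map each .com domain to the number of emails that use it.
        domain_count = defaultdict(int)

        for email in emails:
            # Everything after the @ is the domain (input is guaranteed valid).
            domain = email.split("@")[1]

            # Count only domains that end with the literal suffix ".com".
            if domain.endswith(".com"):
                domain_count[domain] += 1

        result = []

        # Emit one record per domain in ascending alphabetical order.
        for domain in sorted(domain_count.keys()):
            result.append({
                "email_domain": domain,
                "count": domain_count[domain]
            })

        return result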

The implementation begins by creating a hash map called domain_count. This dictionary stores how many times each valid domain appears.

For every email address, the code splits the string at the @ symbol and extracts the domain portion. The endswith(".com") check ensures only .com domains are counted.

Whenever a valid domain is found, its count is incremented inside the hash map.

After processing all emails, the domains are sorted alphabetically to satisfy the required output ordering. The final result is constructed as a list of dictionaries containing the required column names.

This implementation mirrors the SQL aggregation process closely:

  • Domain extraction corresponds to SUBSTRING_INDEX
  • Filtering corresponds to WHERE
  • Hash map counting corresponds to GROUP BY + COUNT(*)
  • Sorting corresponds to ORDER BY
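As a design alternative, the same mapping can be written more compactly with collections.Counter. This is a sketch of an equivalent variant, not a replacement for the solution above. The standalone function name find_unique_email_domains and the sample addresses are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Union


def find_unique_email_domains(emails: List[str]) -> List[Dict[str, Union[str, int]]]:
    # Extraction and filtering happen inside the generator;
    # Counter plays the role of GROUP BY + COUNT(*).
    counts = Counter(
        domain
        for domain in (email.split("@")[1] for email in emails)
        if domain.endswith(".com")
    )

    # sorted() on (domain, count) pairs orders by domain, like ORDER BY.
    return [
        {"email_domain": domain, "count": count}
        for domain, count in sorted(counts.items())
    ]


# Hypothetical sample input.
print(find_unique_email_domains(["a@gmail.com", "b@test.edu", "c@gmail.com"]))
# [{'email_domain': 'gmail.com', 'count': 2}]
```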

Go Solution

package main

import (
	"sort"
	"strings"
)

type Result struct {
	EmailDomain string
	Count       int
}

func findUniqueEmailDomains(emails []string) []Result {
	domainCount := make(map[string]int)

	for _, email := range emails {
		parts := strings.Split(email, "@")
		domain := parts[1]

		if strings.HasSuffix(domain, ".com") {
			domainCount[domain]++
		}
	}

	domains := make([]string, 0, len(domainCount))

	for domain := range domainCount {
		domains = append(domains, domain)
	}

	sort.Strings(domains)

	result := make([]Result, 0, len(domains))

	for _, domain := range domains {
		result = append(result, Result{
			EmailDomain: domain,
			Count:       domainCount[domain],
		})
	}

	return result
}

The Go implementation follows the same overall algorithm as the Python version, but uses Go-specific data structures and utilities.

A map[string]int stores domain frequencies efficiently. The strings.Split function extracts the domain, while strings.HasSuffix checks whether the domain ends with .com.

Since Go maps are unordered, we first collect all domains into a slice and then sort them using sort.Strings.

Where the Python version returns a list of dictionaries, the Go version defines a small Result struct, which is the idiomatic way to return structured records in Go.

Worked Examples

Example 1

Input:

[
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]

Step-by-step Trace

| Email | Extracted Domain | Ends With .com | Domain Count State |
|-------|------------------|----------------|--------------------|
| [email protected] | test.edu | No | {} |
| [email protected] | outlook.com | Yes | {outlook.com: 1} |
| [email protected] | yahoo.com | Yes | {outlook.com: 1, yahoo.com: 1} |
| [email protected] | test.edu | No | {outlook.com: 1, yahoo.com: 1} |
| [email protected] | example.org | No | {outlook.com: 1, yahoo.com: 1} |
| [email protected] | outlook.com | Yes | {outlook.com: 2, yahoo.com: 1} |

After sorting domains alphabetically:

| email_domain | count |
|--------------|-------|
| outlook.com | 2 |
| yahoo.com | 1 |

Complexity Analysis

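| Measure | Complexity | Explanation |
|---------|------------|-------------|
| Time | O(n + k log k), at most O(n log n) | Processing emails is O(n), sorting domains adds O(k log k) |
| Space | O(k), at most O(n) | Hash map stores one count per unique .com domain |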

Here, n is the number of emails and k is the number of unique .com domains.

The algorithm scans each email exactly once, making the counting step linear. The additional sorting step depends on the number of unique domains.

The space complexity comes from storing domain counts in the hash map.

Test Cases

from collections import defaultdict

class Solution:
    def findUniqueEmailDomains(self, emails):
        domain_count = defaultdict(int)

        for email in emails:
            domain = email.split("@")[1]

            if domain.endswith(".com"):
                domain_count[domain] += 1

        result = []

        for domain in sorted(domain_count.keys()):
            result.append({
                "email_domain": domain,
                "count": domain_count[domain]
            })

        return result

sol = Solution()

# Basic example from problem statement
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "outlook.com", "count": 2},
    {"email_domain": "yahoo.com", "count": 1}
]

# No .com domains
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]"
]) == []

# Single .com domain repeated many times
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "gmail.com", "count": 3}
]

# Multiple distinct .com domains
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "gmail.com", "count": 1},
    {"email_domain": "hotmail.com", "count": 1},
    {"email_domain": "yahoo.com", "count": 1}
]

# Domains with multiple periods
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "mail.google.com", "count": 2}
]

# Mixed valid and invalid domains
assert sol.findUniqueEmailDomains([
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]) == [
    {"email_domain": "test.com", "count": 2}
]

| Test | Why |
|------|-----|
| Problem statement example | Verifies standard behavior |
| No .com domains | Ensures filtering works correctly |
| Repeated single domain | Validates aggregation |
| Multiple unique domains | Verifies grouping and sorting |
| Multiple periods in domain | Ensures extraction logic remains correct |
| Mixed valid and invalid domains | Tests filtering and counting together |

Edge Cases

No Valid .com Domains

An input may contain only domains like .edu or .org. A buggy implementation might still include them accidentally if the filtering condition is incorrect.

The implementation handles this safely by explicitly checking:

domain.endswith(".com")

Only matching domains are counted, so the result becomes an empty list.

Multiple Emails Sharing the Same Domain

Many users may belong to the same email provider. A naive implementation could mistakenly overwrite counts instead of incrementing them.

The hash map approach avoids this issue by incrementing existing counts:

domain_count[domain] += 1

This guarantees accurate aggregation.
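The difference between overwriting and incrementing is easy to see side by side. Both functions below are hypothetical illustrations, and count_overwriting is intentionally wrong.

```python
from typing import Dict, List


def count_overwriting(domains: List[str]) -> Dict[str, int]:
    """Buggy: every occurrence resets the count to 1."""
    counts: Dict[str, int] = {}
    for domain in domains:
        counts[domain] = 1  # overwrites instead of accumulating
    return counts


def count_incrementing(domains: List[str]) -> Dict[str, int]:
    """Correct: each occurrence adds 1 to the existing count."""
    counts: Dict[str, int] = {}
    for domain in domains:
        counts[domain] = counts.get(domain, 0) + 1
    return counts


domains = ["gmail.com", "gmail.com", "gmail.com"]
print(count_overwriting(domains))   # {'gmail.com': 1}
print(count_incrementing(domains))  # {'gmail.com': 3}
```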

Domains Containing Multiple Periods

Some domains contain additional subdomains, such as:

mail.google.com

A fragile parser might incorrectly extract only google.com.

The implementation avoids this problem by splitting only at the @ symbol and keeping everything afterward unchanged. This preserves the full domain exactly as required.
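For contrast, here is a sketch of what such a fragile parser might look like next to the approach used in this article. Both function names and the sample address are hypothetical. fragile_domain represents one plausible mistake, not code from any particular source.

```python
def fragile_domain(email: str) -> str:
    """Buggy: keeps only the last two dot-separated labels of the domain."""
    labels = email.split("@")[1].split(".")
    return ".".join(labels[-2:])


def full_domain(email: str) -> str:
    """Keeps everything after the @ unchanged, as the solution does."""
    return email.split("@")[1]


address = "user@mail.google.com"  # hypothetical address
print(fragile_domain(address))  # google.com
print(full_domain(address))     # mail.google.com
```

With the fragile version, addresses at mail.google.com and google.com would be merged into one group and the counts would be wrong.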