LeetCode 929 - Unique Email Addresses

The problem gives us a list of email addresses and asks how many unique destinations actually receive emails after applying Gmail-like normalization rules.

LeetCode Problem 929

Difficulty: 🟢 Easy
Topics: Array, Hash Table, String

Solution

Problem Understanding

The problem gives us a list of email addresses and asks how many unique destinations actually receive emails after applying Gmail-like normalization rules.

Each email consists of two parts separated by the '@' symbol:

  • The local name, which appears before '@'
  • The domain name, which appears after '@'

The domain name is always preserved exactly as written. The transformations only apply to the local name.

There are two special rules:

  1. Periods '.' inside the local name are ignored.
  2. If a plus sign '+' appears in the local name, everything after the first '+' is ignored.

For example:

The task is to normalize every email according to these rules and then count how many distinct normalized email addresses remain.

The input is an array of email strings. The output is a single integer representing the number of unique normalized addresses.

The constraints are small:

  • At most 100 emails
  • Each email length is at most 100 characters

These limits tell us that performance is not a major concern. Even relatively inefficient solutions would work within limits. However, the problem is mainly about careful string manipulation and correct rule handling.

Several edge cases are important:

  • Emails may contain both '.' and '+'
  • Multiple periods may appear in the local name
  • The plus sign only affects the local name
  • The domain must remain unchanged
  • Different domains are always considered different addresses, even if local names normalize identically
  • There is guaranteed to be exactly one '@' character
  • Local and domain parts are guaranteed to be non-empty

A naive implementation can easily make mistakes by accidentally modifying the domain name or by processing periods after the plus sign incorrectly.

Approaches

Brute Force Approach

A brute-force solution would compare every normalized email against every other normalized email manually.

For each email, we would:

  1. Normalize it according to the rules
  2. Compare it against all previously processed normalized emails
  3. Only add it if it does not already exist

This works because every email is converted into its canonical form before comparison. If two emails normalize to the same string, they represent the same actual mailbox.

However, this approach becomes inefficient because every new email may require scanning all previously processed emails. With n emails, this leads to quadratic behavior.

Although the constraints are small enough that this would still pass, it is unnecessary work.

Optimal Approach

The key observation is that once an email is normalized, uniqueness checking becomes a standard hash set problem.

A hash set allows:

  • Fast insertion
  • Fast duplicate detection
  • Constant average-time membership checks

The algorithm becomes:

  1. Normalize each email
  2. Insert the normalized version into a set
  3. Return the size of the set

This is efficient because sets automatically eliminate duplicates.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n² × m) O(n × m) Compares each normalized email with all previous ones
Optimal O(n × m) O(n × m) Uses a hash set for constant-time uniqueness checking

Here:

  • n is the number of emails
  • m is the maximum email length

Algorithm Walkthrough

Optimal Algorithm

  1. Create an empty hash set to store normalized email addresses.

A hash set is ideal because it automatically removes duplicates and provides fast lookup and insertion operations. 2. Iterate through every email in the input array.

Each email must be processed independently because normalization rules apply per address. 3. Split the email into local name and domain name using '@'.

For example:

"[email protected]"

becomes:

local = "test.email+alex"
domain = "leetcode.com"
  1. Process the local name.

First, check whether the local name contains '+'.

If it does, keep only the substring before the first '+'.

Example:

"test.email+alex"

becomes:

"test.email"
  1. Remove all periods '.' from the remaining local name.

Example:

"test.email"

becomes:

"testemail"
  1. Reconstruct the normalized email.

Combine the cleaned local name with the unchanged domain:

"[email protected]"
  1. Insert the normalized email into the hash set.

Duplicate normalized emails will automatically collapse into one entry. 8. After processing all emails, return the size of the hash set.

The set size equals the number of unique receiving addresses.

Why it works

The algorithm works because every email is transformed into a canonical representation according to the exact rules in the problem statement.

Two emails represent the same actual mailbox if and only if their normalized forms are identical. By storing normalized emails in a hash set, duplicates are automatically removed. Therefore, the final set size is exactly the number of unique addresses that receive mail.

Python Solution

from typing import List

class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        unique_emails = set()

        for email in emails:
            local_name, domain_name = email.split("@")

            if "+" in local_name:
                local_name = local_name.split("+")[0]

            local_name = local_name.replace(".", "")

            normalized_email = local_name + "@" + domain_name

            unique_emails.add(normalized_email)

        return len(unique_emails)

The implementation closely follows the algorithm described earlier.

The solution begins by creating a set called unique_emails. This set stores every normalized email address encountered during processing.

For each email, the code splits the string into two parts using split("@"). The left side becomes the local name, while the right side becomes the domain name.

The plus rule is handled first. If the local name contains '+', the code keeps only the portion before the first plus sign. This is done using:

local_name.split("+")[0]

Next, all periods are removed from the remaining local name using:

local_name.replace(".", "")

The normalized local name is then combined with the unchanged domain name to form the final canonical email address.

This normalized email is inserted into the set. Since sets only keep unique values, duplicates are discarded automatically.

Finally, the function returns the size of the set, which equals the number of unique receiving addresses.

Go Solution

package main

import "strings"

func numUniqueEmails(emails []string) int {
	uniqueEmails := make(map[string]bool)

	for _, email := range emails {
		parts := strings.Split(email, "@")

		localName := parts[0]
		domainName := parts[1]

		if plusIndex := strings.Index(localName, "+"); plusIndex != -1 {
			localName = localName[:plusIndex]
		}

		localName = strings.ReplaceAll(localName, ".", "")

		normalizedEmail := localName + "@" + domainName

		uniqueEmails[normalizedEmail] = true
	}

	return len(uniqueEmails)
}

The Go solution uses a map as a hash set equivalent because Go does not provide a built-in set type.

The map:

map[string]bool

stores normalized email strings as keys. The boolean values are not important, only key existence matters.

String processing differs slightly from Python:

  • strings.Split separates local and domain parts
  • strings.Index finds the first '+'
  • strings.ReplaceAll removes periods

Go slices strings using index ranges:

localName[:plusIndex]

This extracts everything before the plus sign.

No special handling for nil or empty input is required because the constraints guarantee at least one email.

Worked Examples

Example 1

Input:

[
  "[email protected]",
  "[email protected]",
  "[email protected]"
]

Step-by-step Trace

Original Email Local Name After '+' Rule After '.' Rule Domain Normalized Email Set State
[email protected] test.email+alex test.email testemail leetcode.com [email protected] {[email protected]}
[email protected] test.e.mail+bob.cathy test.e.mail testemail leetcode.com [email protected] {[email protected]}
[email protected] testemail+david testemail testemail lee.tcode.com [email protected] {[email protected], [email protected]}

Final set size:

2

Example 2

Input:

[
  "[email protected]",
  "[email protected]",
  "[email protected]"
]

Step-by-step Trace

Original Email Normalized Email Set State
[email protected] [email protected] {[email protected]}
[email protected] [email protected] {[email protected], [email protected]}
[email protected] [email protected] {[email protected], [email protected], [email protected]}

Final set size:

3

Complexity Analysis

Measure Complexity Explanation
Time O(n × m) Each email is scanned and normalized once
Space O(n × m) The set may store all normalized emails

Here:

  • n is the number of emails
  • m is the maximum email length

Each email requires processing characters in the local name and constructing a normalized string. String operations such as splitting and replacing periods take linear time relative to email length. Since every email is processed once, the total time complexity is O(n × m).

The space complexity is also O(n × m) because, in the worst case, every normalized email is unique and stored in the hash set.

Test Cases

from typing import List

class Solution:
    def numUniqueEmails(self, emails: List[str]) -> int:
        unique_emails = set()

        for email in emails:
            local_name, domain_name = email.split("@")

            if "+" in local_name:
                local_name = local_name.split("+")[0]

            local_name = local_