LeetCode 3860 - Unique Email Groups

This problem asks us to determine how many distinct email groups exist after applying a specific normalization process to every email address. Each email consists of two parts separated by the '@' character: - The local name appears before '@'.

LeetCode Problem 3860

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String

Solution

LeetCode 3860 - Unique Email Groups

Problem Understanding

This problem asks us to determine how many distinct email groups exist after applying a specific normalization process to every email address.

Each email consists of two parts separated by the '@' character:

  • The local name appears before '@'.
  • The domain name appears after '@'.

Before comparing emails, we must normalize both parts according to the given rules.

For the local name:

  • Remove all '.' characters.
  • If a '+' exists, ignore the '+' and everything after it.
  • Convert the result to lowercase.

For the domain name:

  • Convert it to lowercase.

After normalization, two emails belong to the same group if their normalized local names and normalized domain names are exactly identical.

The input is an array of email strings, and the output is the number of unique normalized email addresses.

The constraints are relatively small:

  • Up to 1000 emails.
  • Each email length is at most 100 characters.

Because the input size is modest, we do not need sophisticated optimization techniques. A linear scan combined with a hash set is more than sufficient.

Several edge cases are important:

  • Emails differing only by letter case should normalize to the same address.
  • Multiple dots in the local name should all be removed.
  • A plus sign may appear in the local name, and everything after the first plus sign must be ignored.
  • Emails without a plus sign should keep the entire local name.
  • Different domains must remain distinct even if the local names normalize identically.
  • The problem guarantees exactly one '@' in every email and non-empty local and domain parts, so validation is unnecessary.

Approaches

Brute Force

A straightforward approach is to normalize every email and store the resulting normalized strings in a list. Afterward, compare every normalized email against every other normalized email to count distinct values.

This works because normalization completely determines group membership. If two normalized emails are identical, they belong to the same group. Otherwise, they belong to different groups.

However, duplicate detection would require repeated comparisons among all normalized emails. With n emails, this leads to quadratic behavior.

Although the constraints are small enough that such a solution might pass, it is unnecessary because hash-based lookup provides a more efficient way to detect duplicates.

Key Insight

The key observation is that once an email is normalized, group membership is represented by a single string:

normalized_local + "@" + normalized_domain

We only need to know how many distinct normalized emails exist.

A hash set is ideal because:

  • Insertion is average-case O(1).
  • Duplicate detection is automatic.
  • The final set size equals the number of unique groups.

Thus, we can process each email exactly once, normalize it, insert it into a set, and return the set size.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Normalize all emails, then compare every pair to identify unique groups
Optimal O(n × m) O(n) Normalize each email once and store it in a hash set

Here, n is the number of emails and m is the maximum email length.

Algorithm Walkthrough

  1. Create an empty hash set to store normalized email addresses.
  2. Iterate through every email in the input array.
  3. Split the email into its local name and domain name using the '@' character.
  4. Convert the domain name to lowercase because domain comparisons are case-insensitive.
  5. Convert the local name to lowercase.
  6. Find the first '+' in the local name. If one exists, keep only the substring before it because everything after the first plus sign must be ignored.
  7. Remove all dots from the remaining local name.
  8. Construct the normalized email string by combining the normalized local name, '@', and normalized domain name.
  9. Insert the normalized email into the hash set.
  10. After processing all emails, return the size of the hash set.

Why it works

The normalization rules uniquely define how every email should be transformed. Two emails belong to the same group if and only if their normalized local names and normalized domain names are identical. By storing exactly these normalized representations in a hash set, duplicates automatically collapse into a single entry. Therefore, the number of elements in the set is precisely the number of unique email groups.

Python Solution

class Solution:
    def uniqueEmailGroups(self, emails: list[str]) -> int:
        unique_emails = set()

        for email in emails:
            local, domain = email.split("@")

            local = local.lower()
            domain = domain.lower()

            plus_index = local.find("+")
            if plus_index != -1:
                local = local[:plus_index]

            local = local.replace(".", "")

            normalized_email = f"{local}@{domain}"
            unique_emails.add(normalized_email)

        return len(unique_emails)

The implementation follows the algorithm directly.

A set named unique_emails stores all normalized email addresses. For each email, the code splits the string into local and domain portions. Both parts are converted to lowercase because normalization requires case-insensitive comparison.

The local name is then truncated at the first plus sign if one exists. After that, all dots are removed. The normalized local name and normalized domain name are combined into a single string representation.

Each normalized email is inserted into the set. Since sets automatically eliminate duplicates, the final set size equals the number of unique email groups.

Go Solution

import "strings"

func uniqueEmailGroups(emails []string) int {
	uniqueEmails := make(map[string]struct{})

	for _, email := range emails {
		parts := strings.Split(email, "@")

		local := strings.ToLower(parts[0])
		domain := strings.ToLower(parts[1])

		if idx := strings.Index(local, "+"); idx != -1 {
			local = local[:idx]
		}

		local = strings.ReplaceAll(local, ".", "")

		normalized := local + "@" + domain
		uniqueEmails[normalized] = struct{}{}
	}

	return len(uniqueEmails)
}

The Go implementation uses a map[string]struct{} to simulate a hash set. The empty struct occupies zero storage, making it a common idiom for set-like behavior in Go.

String operations are performed using the strings package. Since the problem constraints are small, there are no concerns regarding memory usage or integer overflow. The behavior matches the Python solution exactly.

Worked Examples

Example 1

Input:

[
    "[email protected]",
    "[email protected]",
    "[email protected]"
]
Email Normalized Local Normalized Domain Final Normalized Email Set After Insert
[email protected] testemail leetcode.com [email protected] {[email protected]}
[email protected] testemail leetcode.com [email protected] {[email protected]}
[email protected] testemail lee.tcode.com [email protected] {[email protected], [email protected]}

Final answer:

2

Example 2

Input:

[
    "[email protected]",
    "[email protected]",
    "[email protected]",
    "[email protected]"
]
Email Normalized Email
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]

Set evolution:

Step Set Contents
After [email protected] {[email protected]}
After [email protected] {[email protected]}
After [email protected] {[email protected], [email protected]}
After [email protected] {[email protected], [email protected]}

Final answer:

2

Example 3

Input:

[
    "[email protected]",
    "[email protected]",
    "[email protected]"
]
Email Normalized Email
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]

Set evolution:

Step Set Contents
First email {[email protected]}
Second email {[email protected]}
Third email {[email protected]}

Final answer:

1

Complexity Analysis

Measure Complexity Explanation
Time O(n × m) Each email is processed once, and normalization scans at most the email length
Space O(n) In the worst case every normalized email is unique and stored in the set

The normalization process involves splitting, lowercasing, removing dots, and possibly truncating at a plus sign. Each operation is linear in the length of the email. Since each email is processed exactly once, the total running time is O(n × m), where m is the maximum email length. The hash set may store all normalized emails, requiring O(n) space.

Test Cases

sol = Solution()

assert sol.uniqueEmailGroups(
    [
        "[email protected]",
        "[email protected]",
        "[email protected]",
    ]
) == 2  # Example 1

assert sol.uniqueEmailGroups(
    [
        "[email protected]",
        "[email protected]",
        "[email protected]",
        "[email protected]",
    ]
) == 2  # Example 2

assert sol.uniqueEmailGroups(
    [
        "[email protected]",
        "[email protected]",
        "[email protected]",
    ]
) == 1  # Example 3

assert sol.uniqueEmailGroups(
    ["[email protected]"]
) == 1  # Single email

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 1  # Case insensitivity

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 1  # Dot removal

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 1  # Plus handling

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 2  # Different domains remain distinct

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 1  # Different suffixes after plus ignored

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 1  # Many dots removed

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]"]
) == 1  # Combined normalization rules

assert sol.uniqueEmailGroups(
    ["[email protected]", "[email protected]", "[email protected]"]
) == 3  # All unique
Test Why
Example 1 Validates duplicate local normalization and distinct domains
Example 2 Validates case conversion, dot removal, and plus handling
Example 3 Validates all emails collapsing to one group
Single email Minimum valid input
[email protected] vs [email protected] Case-insensitive comparison
a.b.c vs abc Dot removal correctness
abc+test vs abc Plus-sign truncation
Same local, different domains Domain remains part of identity
a+b vs a+c Different plus suffixes ignored
Many dots Repeated dot removal
Uppercase with dots and plus Combined normalization behavior
Completely distinct locals Multiple unique groups

Edge Cases

Emails Differing Only by Case

A common mistake is forgetting that both the local name and domain name must be converted to lowercase. For example, "[email protected]" and "[email protected]" should belong to the same group. The implementation explicitly applies .lower() in Python and strings.ToLower() in Go before any comparison occurs.

Multiple Dots in the Local Name

The normalization rules state that all dots in the local name should be ignored. An implementation that removes only the first dot would produce incorrect results. For example, "[email protected]" and "[email protected]" must normalize identically. The solution removes every dot using replace(".", "") in Python and strings.ReplaceAll() in Go.

Multiple Plus Signs

The problem requires ignoring everything after the first plus sign. Consider "[email protected]". The correct normalized local name is "ab", not "abcd". The implementation finds the first plus sign and truncates the string immediately, ensuring all subsequent characters are discarded.

Identical Local Names Across Different Domains

Even when local names normalize to the same value, domains still matter. For example, "[email protected]" and "[email protected]" belong to different groups. The solution constructs a normalized email containing both the local and domain portions, ensuring domain distinctions are preserved.

All Emails Normalize to the Same Address

In some inputs, every email may collapse into a single normalized representation. The hash set naturally handles this situation because duplicate insertions do not increase the set size. Therefore, the algorithm correctly returns 1 when all emails belong to the same group.