LeetCode 3860 - Unique Email Groups
This problem asks us to determine how many distinct email groups exist after applying a specific normalization process to every email address. Each email consists of two parts separated by the '@' character: - The local name appears before '@'.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String
Solution
LeetCode 3860 - Unique Email Groups
Problem Understanding
This problem asks us to determine how many distinct email groups exist after applying a specific normalization process to every email address.
Each email consists of two parts separated by the '@' character:
- The local name appears before
'@'. - The domain name appears after
'@'.
Before comparing emails, we must normalize both parts according to the given rules.
For the local name:
- Remove all
'.'characters. - If a
'+'exists, ignore the'+'and everything after it. - Convert the result to lowercase.
For the domain name:
- Convert it to lowercase.
After normalization, two emails belong to the same group if their normalized local names and normalized domain names are exactly identical.
The input is an array of email strings, and the output is the number of unique normalized email addresses.
The constraints are relatively small:
- Up to 1000 emails.
- Each email length is at most 100 characters.
Because the input size is modest, we do not need sophisticated optimization techniques. A linear scan combined with a hash set is more than sufficient.
Several edge cases are important:
- Emails differing only by letter case should normalize to the same address.
- Multiple dots in the local name should all be removed.
- A plus sign may appear in the local name, and everything after the first plus sign must be ignored.
- Emails without a plus sign should keep the entire local name.
- Different domains must remain distinct even if the local names normalize identically.
- The problem guarantees exactly one
'@'in every email and non-empty local and domain parts, so validation is unnecessary.
Approaches
Brute Force
A straightforward approach is to normalize every email and store the resulting normalized strings in a list. Afterward, compare every normalized email against every other normalized email to count distinct values.
This works because normalization completely determines group membership. If two normalized emails are identical, they belong to the same group. Otherwise, they belong to different groups.
However, duplicate detection would require repeated comparisons among all normalized emails. With n emails, this leads to quadratic behavior.
Although the constraints are small enough that such a solution might pass, it is unnecessary because hash-based lookup provides a more efficient way to detect duplicates.
Key Insight
The key observation is that once an email is normalized, group membership is represented by a single string:
normalized_local + "@" + normalized_domain
We only need to know how many distinct normalized emails exist.
A hash set is ideal because:
- Insertion is average-case O(1).
- Duplicate detection is automatic.
- The final set size equals the number of unique groups.
Thus, we can process each email exactly once, normalize it, insert it into a set, and return the set size.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²) | O(n) | Normalize all emails, then compare every pair to identify unique groups |
| Optimal | O(n × m) | O(n) | Normalize each email once and store it in a hash set |
Here, n is the number of emails and m is the maximum email length.
Algorithm Walkthrough
- Create an empty hash set to store normalized email addresses.
- Iterate through every email in the input array.
- Split the email into its local name and domain name using the
'@'character. - Convert the domain name to lowercase because domain comparisons are case-insensitive.
- Convert the local name to lowercase.
- Find the first
'+'in the local name. If one exists, keep only the substring before it because everything after the first plus sign must be ignored. - Remove all dots from the remaining local name.
- Construct the normalized email string by combining the normalized local name,
'@', and normalized domain name. - Insert the normalized email into the hash set.
- After processing all emails, return the size of the hash set.
Why it works
The normalization rules uniquely define how every email should be transformed. Two emails belong to the same group if and only if their normalized local names and normalized domain names are identical. By storing exactly these normalized representations in a hash set, duplicates automatically collapse into a single entry. Therefore, the number of elements in the set is precisely the number of unique email groups.
Python Solution
class Solution:
def uniqueEmailGroups(self, emails: list[str]) -> int:
unique_emails = set()
for email in emails:
local, domain = email.split("@")
local = local.lower()
domain = domain.lower()
plus_index = local.find("+")
if plus_index != -1:
local = local[:plus_index]
local = local.replace(".", "")
normalized_email = f"{local}@{domain}"
unique_emails.add(normalized_email)
return len(unique_emails)
The implementation follows the algorithm directly.
A set named unique_emails stores all normalized email addresses. For each email, the code splits the string into local and domain portions. Both parts are converted to lowercase because normalization requires case-insensitive comparison.
The local name is then truncated at the first plus sign if one exists. After that, all dots are removed. The normalized local name and normalized domain name are combined into a single string representation.
Each normalized email is inserted into the set. Since sets automatically eliminate duplicates, the final set size equals the number of unique email groups.
Go Solution
import "strings"
func uniqueEmailGroups(emails []string) int {
uniqueEmails := make(map[string]struct{})
for _, email := range emails {
parts := strings.Split(email, "@")
local := strings.ToLower(parts[0])
domain := strings.ToLower(parts[1])
if idx := strings.Index(local, "+"); idx != -1 {
local = local[:idx]
}
local = strings.ReplaceAll(local, ".", "")
normalized := local + "@" + domain
uniqueEmails[normalized] = struct{}{}
}
return len(uniqueEmails)
}
The Go implementation uses a map[string]struct{} to simulate a hash set. The empty struct occupies zero storage, making it a common idiom for set-like behavior in Go.
String operations are performed using the strings package. Since the problem constraints are small, there are no concerns regarding memory usage or integer overflow. The behavior matches the Python solution exactly.
Worked Examples
Example 1
Input:
[
"[email protected]",
"[email protected]",
"[email protected]"
]
| Normalized Local | Normalized Domain | Final Normalized Email | Set After Insert | |
|---|---|---|---|---|
| [email protected] | testemail | leetcode.com | [email protected] | {[email protected]} |
| [email protected] | testemail | leetcode.com | [email protected] | {[email protected]} |
| [email protected] | testemail | lee.tcode.com | [email protected] | {[email protected], [email protected]} |
Final answer:
2
Example 2
Input:
[
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]"
]
| Normalized Email | |
|---|---|
| [email protected] | [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
Set evolution:
| Step | Set Contents |
|---|---|
| After [email protected] | {[email protected]} |
| After [email protected] | {[email protected]} |
| After [email protected] | {[email protected], [email protected]} |
| After [email protected] | {[email protected], [email protected]} |
Final answer:
2
Example 3
Input:
[
"[email protected]",
"[email protected]",
"[email protected]"
]
| Normalized Email | |
|---|---|
| [email protected] | [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
Set evolution:
| Step | Set Contents |
|---|---|
| First email | {[email protected]} |
| Second email | {[email protected]} |
| Third email | {[email protected]} |
Final answer:
1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n × m) | Each email is processed once, and normalization scans at most the email length |
| Space | O(n) | In the worst case every normalized email is unique and stored in the set |
The normalization process involves splitting, lowercasing, removing dots, and possibly truncating at a plus sign. Each operation is linear in the length of the email. Since each email is processed exactly once, the total running time is O(n × m), where m is the maximum email length. The hash set may store all normalized emails, requiring O(n) space.
Test Cases
sol = Solution()
assert sol.uniqueEmailGroups(
[
"[email protected]",
"[email protected]",
"[email protected]",
]
) == 2 # Example 1
assert sol.uniqueEmailGroups(
[
"[email protected]",
"[email protected]",
"[email protected]",
"[email protected]",
]
) == 2 # Example 2
assert sol.uniqueEmailGroups(
[
"[email protected]",
"[email protected]",
"[email protected]",
]
) == 1 # Example 3
assert sol.uniqueEmailGroups(
["[email protected]"]
) == 1 # Single email
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 1 # Case insensitivity
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 1 # Dot removal
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 1 # Plus handling
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 2 # Different domains remain distinct
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 1 # Different suffixes after plus ignored
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 1 # Many dots removed
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]"]
) == 1 # Combined normalization rules
assert sol.uniqueEmailGroups(
["[email protected]", "[email protected]", "[email protected]"]
) == 3 # All unique
| Test | Why |
|---|---|
| Example 1 | Validates duplicate local normalization and distinct domains |
| Example 2 | Validates case conversion, dot removal, and plus handling |
| Example 3 | Validates all emails collapsing to one group |
| Single email | Minimum valid input |
| [email protected] vs [email protected] | Case-insensitive comparison |
| a.b.c vs abc | Dot removal correctness |
| abc+test vs abc | Plus-sign truncation |
| Same local, different domains | Domain remains part of identity |
| a+b vs a+c | Different plus suffixes ignored |
| Many dots | Repeated dot removal |
| Uppercase with dots and plus | Combined normalization behavior |
| Completely distinct locals | Multiple unique groups |
Edge Cases
Emails Differing Only by Case
A common mistake is forgetting that both the local name and domain name must be converted to lowercase. For example, "[email protected]" and "[email protected]" should belong to the same group. The implementation explicitly applies .lower() in Python and strings.ToLower() in Go before any comparison occurs.
Multiple Dots in the Local Name
The normalization rules state that all dots in the local name should be ignored. An implementation that removes only the first dot would produce incorrect results. For example, "[email protected]" and "[email protected]" must normalize identically. The solution removes every dot using replace(".", "") in Python and strings.ReplaceAll() in Go.
Multiple Plus Signs
The problem requires ignoring everything after the first plus sign. Consider "[email protected]". The correct normalized local name is "ab", not "abcd". The implementation finds the first plus sign and truncates the string immediately, ensuring all subsequent characters are discarded.
Identical Local Names Across Different Domains
Even when local names normalize to the same value, domains still matter. For example, "[email protected]" and "[email protected]" belong to different groups. The solution constructs a normalized email containing both the local and domain portions, ensuring domain distinctions are preserved.
All Emails Normalize to the Same Address
In some inputs, every email may collapse into a single normalized representation. The hash set naturally handles this situation because duplicate insertions do not increase the set size. Therefore, the algorithm correctly returns 1 when all emails belong to the same group.