LeetCode 721 - Accounts Merge
This problem asks us to merge user accounts based on shared email addresses. Each account is represented as a list of strings. The first string is the user's name, and every remaining string is an email address associated with that account.
Difficulty: š” Medium
Topics: Array, Hash Table, String, Depth-First Search, Breadth-First Search, Union-Find, Sorting
Solution
Problem Understanding
This problem asks us to merge user accounts based on shared email addresses. Each account is represented as a list of strings. The first string is the user's name, and every remaining string is an email address associated with that account.
The important detail is that two accounts belong to the same person if they share at least one email address. This relationship is transitive. For example, if Account A shares an email with Account B, and Account B shares an email with Account C, then all three accounts belong to the same person, even if A and C do not directly share an email.
The output should group all connected accounts together. Each merged account must contain:
- The person's name
- All unique emails belonging to that person
- Emails sorted lexicographically
The order of the final merged accounts does not matter.
The constraints are relatively small:
- At most 1000 accounts
- Each account contains at most 10 strings
- Email strings are short
Even though the constraints are not massive, a naive comparison between every pair of accounts can still become inefficient because we may repeatedly scan overlapping email lists.
A key observation is that accounts form connected components. Each email can be viewed as a node in a graph, and accounts connect emails together. Once we identify connected groups of emails, we can reconstruct merged accounts.
Several edge cases are important:
- Two people may have the same name but be completely unrelated
- One account may contain only one email
- Multiple accounts may chain together indirectly through shared emails
- Duplicate emails across many accounts must appear only once in the output
- The same account name is guaranteed for all merged accounts belonging to the same person
Approaches
Brute Force Approach
A brute force solution would repeatedly compare accounts against one another to see whether they share any email addresses.
We could:
- Compare every pair of accounts
- If two accounts share an email, merge them
- Repeat until no more merges are possible
To determine whether two accounts overlap, we would need to compare their email sets. Since merging one pair may create new overlaps with other accounts, we may need multiple passes.
This approach is correct because eventually all connected accounts become merged into one group. However, it is inefficient because repeated pairwise comparisons create substantial overhead.
If there are N accounts and each account contains up to K emails, pairwise comparisons alone cost approximately O(N^2 * K) or worse after repeated merging passes.
Optimal Approach, Union-Find
The key insight is that this problem is fundamentally about connected components.
If two emails appear in the same account, they belong to the same connected group. Instead of comparing accounts repeatedly, we can directly connect emails using a Union-Find data structure, also called Disjoint Set Union, DSU.
The process becomes:
- Treat each email as a node
- Union all emails within the same account
- After processing all accounts, emails with the same root belong to the same merged account
- Group emails by root
- Sort emails and construct the final answer
Union-Find is ideal here because it efficiently handles dynamic grouping and connectivity queries.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N² à K) or worse | O(N à K) | Repeatedly compares and merges accounts |
| Optimal | O(E à α(E) + E log E) | O(E) | Uses Union-Find to group connected emails efficiently |
Here:
N= number of accountsK= maximum emails per accountE= total number of emailsα(E)= inverse Ackermann function, nearly constant
Algorithm Walkthrough
Step 1: Initialize Union-Find Structures
We create two hash maps:
parent[email]stores the representative parent of each emailemail_to_name[email]stores the owner name for each email
Each email initially becomes its own parent.
We use hash maps because emails are strings, not integer indices.
Step 2: Process Every Account
For each account:
- Extract the account name
- Take the first email as the base email
- Union every other email in the account with the base email
This connects all emails inside the same account into one component.
For example:
["John", "[email protected]", "[email protected]", "[email protected]"]
creates unions:
a <-> b
a <-> c
After these operations, all three emails share the same root.
Step 3: Apply Path Compression During Find
The find() operation recursively locates the root parent of an email.
Path compression optimizes future lookups by directly attaching nodes to the root.
Without path compression, trees can become tall and inefficient.
Step 4: Group Emails by Root
After all unions are complete:
- Iterate through every email
- Find its root representative
- Add the email into a group corresponding to that root
Now each group contains all emails belonging to one merged account.
Step 5: Build Final Answer
For each connected component:
- Sort the emails lexicographically
- Retrieve the account name using any email in the group
- Construct the merged account
The format becomes:
[name, sorted_email_1, sorted_email_2, ...]
Why it works
Union-Find guarantees that all emails connected directly or indirectly belong to the same component. Since every account unions all of its emails together, any shared email causes the connected accounts to merge transitively.
Grouping emails by their final root therefore produces exactly the connected components representing each person.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
parent = {}
email_to_name = {}
def find(email: str) -> str:
if parent[email] != email:
parent[email] = find(parent[email])
return parent[email]
def union(email1: str, email2: str) -> None:
root1 = find(email1)
root2 = find(email2)
if root1 != root2:
parent[root2] = root1
# Initialize parent pointers and map emails to names
for account in accounts:
name = account[0]
for email in account[1:]:
if email not in parent:
parent[email] = email
email_to_name[email] = name
# Union emails within the same account
for account in accounts:
first_email = account[1]
for email in account[2:]:
union(first_email, email)
# Group emails by root parent
groups = defaultdict(list)
for email in parent:
root = find(email)
groups[root].append(email)
# Build final result
result = []
for root_email, emails in groups.items():
name = email_to_name[root_email]
merged_account = [name] + sorted(emails)
result.append(merged_account)
return result
The implementation begins by initializing two dictionaries. The parent dictionary stores the Union-Find structure, while email_to_name remembers which user name belongs to each email.
The find() function implements path compression. Whenever we search for a root, we flatten the structure so future searches become faster.
The union() function merges two connected email groups by attaching one root to another.
The first loop initializes every email as its own parent and records the associated account name.
The second loop unions all emails within the same account using the first email as the representative anchor.
After all unions are complete, we iterate through every email and group them by root parent using a defaultdict(list).
Finally, we sort each group's emails and prepend the account name to create the final merged account structure.
Go Solution
package main
import (
"sort"
)
func accountsMerge(accounts [][]string) [][]string {
parent := make(map[string]string)
emailToName := make(map[string]string)
var find func(string) string
find = func(email string) string {
if parent[email] != email {
parent[email] = find(parent[email])
}
return parent[email]
}
union := func(email1, email2 string) {
root1 := find(email1)
root2 := find(email2)
if root1 != root2 {
parent[root2] = root1
}
}
// Initialize parent pointers
for _, account := range accounts {
name := account[0]
for i := 1; i < len(account); i++ {
email := account[i]
if _, exists := parent[email]; !exists {
parent[email] = email
}
emailToName[email] = name
}
}
// Union emails within the same account
for _, account := range accounts {
firstEmail := account[1]
for i := 2; i < len(account); i++ {
union(firstEmail, account[i])
}
}
// Group emails by root
groups := make(map[string][]string)
for email := range parent {
root := find(email)
groups[root] = append(groups[root], email)
}
// Build result
result := [][]string{}
for root, emails := range groups {
sort.Strings(emails)
account := []string{emailToName[root]}
account = append(account, emails...)
result = append(result, account)
}
return result
}
The Go implementation follows the same logic as the Python version, but uses Go maps and slices.
Go does not support nested named functions in the same way as Python, so the find function is declared as a variable first and then assigned recursively.
Slices are dynamically appended during grouping and result construction. Since Go maps return zero values for missing keys, explicit existence checks are used during initialization.
Integer overflow is not a concern because the problem sizes are small and no arithmetic grows significantly.
Worked Examples
Example 1
Input:
[
["John","[email protected]","[email protected]"],
["John","[email protected]","[email protected]"],
["Mary","[email protected]"],
["John","[email protected]"]
]
Step 1: Initialize Parents
| Parent | |
|---|---|
| [email protected] | [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
Step 2: Perform Unions
From first account:
union(johnsmith, john_newyork)
Parent structure:
| Parent | |
|---|---|
| [email protected] | [email protected] |
From second account:
union(johnsmith, john00)
Updated structure:
| Parent | |
|---|---|
| [email protected] | [email protected] |
Third and fourth accounts contain only one email, so no unions occur.
Step 3: Group by Root
| Root | Emails |
|---|---|
| [email protected] | [email protected], [email protected], [email protected] |
| [email protected] | [email protected] |
| [email protected] | [email protected] |
Step 4: Sort and Build Result
Output:
[
["John","[email protected]","[email protected]","[email protected]"],
["Mary","[email protected]"],
["John","[email protected]"]
]
Example 2
Input:
[
["Gabe","[email protected]","[email protected]","[email protected]"],
["Kevin","[email protected]","[email protected]","[email protected]"],
["Ethan","[email protected]","[email protected]","[email protected]"],
["Hanzo","[email protected]","[email protected]","[email protected]"],
["Fern","[email protected]","[email protected]","[email protected]"]
]
No accounts share emails, so every account remains independent.
Union Operations
| Account | Unions |
|---|---|
| Gabe | Gabe0 ā Gabe3, Gabe0 ā Gabe1 |
| Kevin | Kevin3 ā Kevin5, Kevin3 ā Kevin0 |
| Ethan | Ethan5 ā Ethan4, Ethan5 ā Ethan0 |
| Hanzo | Hanzo3 ā Hanzo1, Hanzo3 ā Hanzo0 |
| Fern | Fern5 ā Fern1, Fern5 ā Fern0 |
Each connected component contains only emails from its own account.
Final output contains five separate merged accounts.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(E à α(E) + E log E) | Union-Find operations are nearly constant, sorting dominates |
| Space | O(E) | Parent map, grouping map, and result storage |
Here, E represents the total number of emails across all accounts.
Union and find operations are effectively constant time because of path compression. The main additional cost comes from sorting emails within each merged component. Across all groups, sorting contributes up to O(E log E) total time.
The space complexity is linear because we store parent pointers, mappings, grouped emails, and the final output.
Test Cases
from typing import List
def normalize(result: List[List[str]]) -> List[List[str]]:
return sorted([ [x[0]] + sorted(x[1:]) for x in result ])
solution = Solution()
# Example 1
accounts = [
["John","[email protected]","[email protected]"],
["John","[email protected]","[email protected]"],
["Mary","[email protected]"],
["John","[email protected]"]
]
expected = [
["John","[email protected]","[email protected]","[email protected]"],
["Mary","[email protected]"],
["John","[email protected]"]
]
assert normalize(solution.accountsMerge(accounts)) == normalize(expected) # shared email merge
# Example 2
accounts = [
["Gabe","[email protected]","[email protected]","[email protected]"],
["Kevin","[email protected]","[email protected]","[email protected]"],
["Ethan","[email protected]","[email protected]","[email protected]"],
["Hanzo","[email protected]","[email protected]","[email protected]"],
["Fern","[email protected]","[email protected]","[email protected]"]
]
expected = [
["Ethan","[email protected]","[email protected]","[email protected]"],
["Gabe","[email protected]","[email protected]","[email protected]"],
["Hanzo","[email protected]","[email protected]","[email protected]"],
["Kevin","[email protected]","[email protected]","[email protected]"],
["Fern","[email protected]","[email protected]","[email protected]"]
]
assert normalize(solution.accountsMerge(accounts)) == normalize(expected) # no merges
# Single account
accounts = [
["Alice","[email protected]"]
]
expected = [
["Alice","[email protected]"]
]
assert normalize(solution.accountsMerge(accounts)) == normalize(expected) # minimal input
# Indirect transitive merge
accounts = [
["A","[email protected]","[email protected]"],
["A","[email protected]","[email protected]"],
["A","[email protected]","[email protected]"]
]
expected = [
["A","[email protected]","[email protected]","[email protected]","[email protected]"]
]
assert normalize(solution.accountsMerge(accounts)) == normalize(expected) # transitive connectivity
# Same name but different people
accounts = [
["John","[email protected]"],
["John","[email protected]"]
]
expected = [
["John","[email protected]"],
["John","[email protected]"]
]
assert normalize(solution.accountsMerge(accounts)) == normalize(expected) # same name without overlap
# Multiple disconnected groups
accounts = [
["A","[email protected]","[email protected]"],
["B","[email protected]","[email protected]"],
["A","[email protected]","[email protected]"]
]
expected = [
["A","[email protected]","[email protected]","[email protected]"],
["B","[email protected]","[email protected]"]
]
assert normalize(solution.accountsMerge(accounts)) == normalize(expected) # multiple components
| Test | Why |
|---|---|
| Example 1 | Validates basic merging with shared emails |
| Example 2 | Validates independent accounts remain separate |
| Single account | Tests smallest valid input |
| Indirect transitive merge | Ensures transitive connectivity works correctly |
| Same name but different people | Verifies names alone do not cause merges |
| Multiple disconnected groups | Tests handling of multiple connected components |
Edge Cases
Same Name, Different People
Two unrelated users may share the same account name. A naive solution might incorrectly merge them based only on the name.
For example:
["John", "[email protected]"]
["John", "[email protected]"]
These accounts must remain separate because they share no email addresses. The implementation correctly avoids this issue because connectivity is based entirely on emails, not names.
Transitive Connections
Accounts may connect indirectly through intermediate accounts.
For example:
A shares with B
B shares with C
Even if A and C do not directly overlap, they still belong to the same person.
Union-Find naturally handles this because all connected emails eventually share the same root representative.
Single Email Accounts
Some accounts contain only one email address. In these cases, no union operations occur.
A buggy implementation might accidentally skip these accounts during grouping because no merges happen.
This solution correctly initializes every email in the parent map before unions begin, ensuring isolated emails still appear in the final result.