LeetCode 721 - Accounts Merge

This problem asks us to merge user accounts based on shared email addresses. Each account is represented as a list of strings. The first string is the user's name, and every remaining string is an email address associated with that account.

LeetCode Problem 721

Difficulty: 🟔 Medium
Topics: Array, Hash Table, String, Depth-First Search, Breadth-First Search, Union-Find, Sorting

Solution

Problem Understanding

This problem asks us to merge user accounts based on shared email addresses. Each account is represented as a list of strings. The first string is the user's name, and every remaining string is an email address associated with that account.

The important detail is that two accounts belong to the same person if they share at least one email address. This relationship is transitive. For example, if Account A shares an email with Account B, and Account B shares an email with Account C, then all three accounts belong to the same person, even if A and C do not directly share an email.

The output should group all connected accounts together. Each merged account must contain:

  1. The person's name
  2. All unique emails belonging to that person
  3. Emails sorted lexicographically

The order of the final merged accounts does not matter.

The constraints are relatively small:

  • At most 1000 accounts
  • Each account contains at most 10 strings
  • Email strings are short

Even though the constraints are not massive, a naive comparison between every pair of accounts can still become inefficient because we may repeatedly scan overlapping email lists.

A key observation is that accounts form connected components. Each email can be viewed as a node in a graph, and accounts connect emails together. Once we identify connected groups of emails, we can reconstruct merged accounts.

Several edge cases are important:

  • Two people may have the same name but be completely unrelated
  • One account may contain only one email
  • Multiple accounts may chain together indirectly through shared emails
  • Duplicate emails across many accounts must appear only once in the output
  • The same account name is guaranteed for all merged accounts belonging to the same person

Approaches

Brute Force Approach

A brute force solution would repeatedly compare accounts against one another to see whether they share any email addresses.

We could:

  1. Compare every pair of accounts
  2. If two accounts share an email, merge them
  3. Repeat until no more merges are possible

To determine whether two accounts overlap, we would need to compare their email sets. Since merging one pair may create new overlaps with other accounts, we may need multiple passes.

This approach is correct because eventually all connected accounts become merged into one group. However, it is inefficient because repeated pairwise comparisons create substantial overhead.

If there are N accounts and each account contains up to K emails, pairwise comparisons alone cost approximately O(N^2 * K) or worse after repeated merging passes.

Optimal Approach, Union-Find

The key insight is that this problem is fundamentally about connected components.

If two emails appear in the same account, they belong to the same connected group. Instead of comparing accounts repeatedly, we can directly connect emails using a Union-Find data structure, also called Disjoint Set Union, DSU.

The process becomes:

  1. Treat each email as a node
  2. Union all emails within the same account
  3. After processing all accounts, emails with the same root belong to the same merged account
  4. Group emails by root
  5. Sort emails and construct the final answer

Union-Find is ideal here because it efficiently handles dynamic grouping and connectivity queries.

Approach Time Complexity Space Complexity Notes
Brute Force O(N² Ɨ K) or worse O(N Ɨ K) Repeatedly compares and merges accounts
Optimal O(E Ɨ α(E) + E log E) O(E) Uses Union-Find to group connected emails efficiently

Here:

  • N = number of accounts
  • K = maximum emails per account
  • E = total number of emails
  • α(E) = inverse Ackermann function, nearly constant

Algorithm Walkthrough

Step 1: Initialize Union-Find Structures

We create two hash maps:

  • parent[email] stores the representative parent of each email
  • email_to_name[email] stores the owner name for each email

Each email initially becomes its own parent.

We use hash maps because emails are strings, not integer indices.

Step 2: Process Every Account

For each account:

  1. Extract the account name
  2. Take the first email as the base email
  3. Union every other email in the account with the base email

This connects all emails inside the same account into one component.

For example:

["John", "[email protected]", "[email protected]", "[email protected]"]

creates unions:

a <-> b
a <-> c

After these operations, all three emails share the same root.

Step 3: Apply Path Compression During Find

The find() operation recursively locates the root parent of an email.

Path compression optimizes future lookups by directly attaching nodes to the root.

Without path compression, trees can become tall and inefficient.

Step 4: Group Emails by Root

After all unions are complete:

  1. Iterate through every email
  2. Find its root representative
  3. Add the email into a group corresponding to that root

Now each group contains all emails belonging to one merged account.

Step 5: Build Final Answer

For each connected component:

  1. Sort the emails lexicographically
  2. Retrieve the account name using any email in the group
  3. Construct the merged account

The format becomes:

[name, sorted_email_1, sorted_email_2, ...]

Why it works

Union-Find guarantees that all emails connected directly or indirectly belong to the same component. Since every account unions all of its emails together, any shared email causes the connected accounts to merge transitively.

Grouping emails by their final root therefore produces exactly the connected components representing each person.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def accountsMerge(self, accounts: List[List[str]]) -> List[List[str]]:
        parent = {}
        email_to_name = {}

        def find(email: str) -> str:
            if parent[email] != email:
                parent[email] = find(parent[email])
            return parent[email]

        def union(email1: str, email2: str) -> None:
            root1 = find(email1)
            root2 = find(email2)

            if root1 != root2:
                parent[root2] = root1

        # Initialize parent pointers and map emails to names
        for account in accounts:
            name = account[0]

            for email in account[1:]:
                if email not in parent:
                    parent[email] = email

                email_to_name[email] = name

        # Union emails within the same account
        for account in accounts:
            first_email = account[1]

            for email in account[2:]:
                union(first_email, email)

        # Group emails by root parent
        groups = defaultdict(list)

        for email in parent:
            root = find(email)
            groups[root].append(email)

        # Build final result
        result = []

        for root_email, emails in groups.items():
            name = email_to_name[root_email]
            merged_account = [name] + sorted(emails)
            result.append(merged_account)

        return result

The implementation begins by initializing two dictionaries. The parent dictionary stores the Union-Find structure, while email_to_name remembers which user name belongs to each email.

The find() function implements path compression. Whenever we search for a root, we flatten the structure so future searches become faster.

The union() function merges two connected email groups by attaching one root to another.

The first loop initializes every email as its own parent and records the associated account name.

The second loop unions all emails within the same account using the first email as the representative anchor.

After all unions are complete, we iterate through every email and group them by root parent using a defaultdict(list).

Finally, we sort each group's emails and prepend the account name to create the final merged account structure.

Go Solution

package main

import (
	"sort"
)

func accountsMerge(accounts [][]string) [][]string {
	parent := make(map[string]string)
	emailToName := make(map[string]string)

	var find func(string) string

	find = func(email string) string {
		if parent[email] != email {
			parent[email] = find(parent[email])
		}
		return parent[email]
	}

	union := func(email1, email2 string) {
		root1 := find(email1)
		root2 := find(email2)

		if root1 != root2 {
			parent[root2] = root1
		}
	}

	// Initialize parent pointers
	for _, account := range accounts {
		name := account[0]

		for i := 1; i < len(account); i++ {
			email := account[i]

			if _, exists := parent[email]; !exists {
				parent[email] = email
			}

			emailToName[email] = name
		}
	}

	// Union emails within the same account
	for _, account := range accounts {
		firstEmail := account[1]

		for i := 2; i < len(account); i++ {
			union(firstEmail, account[i])
		}
	}

	// Group emails by root
	groups := make(map[string][]string)

	for email := range parent {
		root := find(email)
		groups[root] = append(groups[root], email)
	}

	// Build result
	result := [][]string{}

	for root, emails := range groups {
		sort.Strings(emails)

		account := []string{emailToName[root]}
		account = append(account, emails...)

		result = append(result, account)
	}

	return result
}

The Go implementation follows the same logic as the Python version, but uses Go maps and slices.

Go does not support nested named functions in the same way as Python, so the find function is declared as a variable first and then assigned recursively.

Slices are dynamically appended during grouping and result construction. Since Go maps return zero values for missing keys, explicit existence checks are used during initialization.

Integer overflow is not a concern because the problem sizes are small and no arithmetic grows significantly.

Worked Examples

Example 1

Input:

[
  ["John","[email protected]","[email protected]"],
  ["John","[email protected]","[email protected]"],
  ["Mary","[email protected]"],
  ["John","[email protected]"]
]

Step 1: Initialize Parents

Email Parent
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]
[email protected] [email protected]

Step 2: Perform Unions

From first account:

union(johnsmith, john_newyork)

Parent structure:

Email Parent
[email protected] [email protected]

From second account:

union(johnsmith, john00)

Updated structure:

Email Parent
[email protected] [email protected]

Third and fourth accounts contain only one email, so no unions occur.

Step 3: Group by Root

Root Emails
[email protected] [email protected], [email protected], [email protected]
[email protected] [email protected]
[email protected] [email protected]

Step 4: Sort and Build Result

Output:

[
  ["John","[email protected]","[email protected]","[email protected]"],
  ["Mary","[email protected]"],
  ["John","[email protected]"]
]

Example 2

Input:

[
  ["Gabe","[email protected]","[email protected]","[email protected]"],
  ["Kevin","[email protected]","[email protected]","[email protected]"],
  ["Ethan","[email protected]","[email protected]","[email protected]"],
  ["Hanzo","[email protected]","[email protected]","[email protected]"],
  ["Fern","[email protected]","[email protected]","[email protected]"]
]

No accounts share emails, so every account remains independent.

Union Operations

Account Unions
Gabe Gabe0 ↔ Gabe3, Gabe0 ↔ Gabe1
Kevin Kevin3 ↔ Kevin5, Kevin3 ↔ Kevin0
Ethan Ethan5 ↔ Ethan4, Ethan5 ↔ Ethan0
Hanzo Hanzo3 ↔ Hanzo1, Hanzo3 ↔ Hanzo0
Fern Fern5 ↔ Fern1, Fern5 ↔ Fern0

Each connected component contains only emails from its own account.

Final output contains five separate merged accounts.

Complexity Analysis

Measure Complexity Explanation
Time O(E Ɨ α(E) + E log E) Union-Find operations are nearly constant, sorting dominates
Space O(E) Parent map, grouping map, and result storage

Here, E represents the total number of emails across all accounts.

Union and find operations are effectively constant time because of path compression. The main additional cost comes from sorting emails within each merged component. Across all groups, sorting contributes up to O(E log E) total time.

The space complexity is linear because we store parent pointers, mappings, grouped emails, and the final output.

Test Cases

from typing import List

def normalize(result: List[List[str]]) -> List[List[str]]:
    return sorted([ [x[0]] + sorted(x[1:]) for x in result ])

solution = Solution()

# Example 1
accounts = [
    ["John","[email protected]","[email protected]"],
    ["John","[email protected]","[email protected]"],
    ["Mary","[email protected]"],
    ["John","[email protected]"]
]

expected = [
    ["John","[email protected]","[email protected]","[email protected]"],
    ["Mary","[email protected]"],
    ["John","[email protected]"]
]

assert normalize(solution.accountsMerge(accounts)) == normalize(expected)  # shared email merge

# Example 2
accounts = [
    ["Gabe","[email protected]","[email protected]","[email protected]"],
    ["Kevin","[email protected]","[email protected]","[email protected]"],
    ["Ethan","[email protected]","[email protected]","[email protected]"],
    ["Hanzo","[email protected]","[email protected]","[email protected]"],
    ["Fern","[email protected]","[email protected]","[email protected]"]
]

expected = [
    ["Ethan","[email protected]","[email protected]","[email protected]"],
    ["Gabe","[email protected]","[email protected]","[email protected]"],
    ["Hanzo","[email protected]","[email protected]","[email protected]"],
    ["Kevin","[email protected]","[email protected]","[email protected]"],
    ["Fern","[email protected]","[email protected]","[email protected]"]
]

assert normalize(solution.accountsMerge(accounts)) == normalize(expected)  # no merges

# Single account
accounts = [
    ["Alice","[email protected]"]
]

expected = [
    ["Alice","[email protected]"]
]

assert normalize(solution.accountsMerge(accounts)) == normalize(expected)  # minimal input

# Indirect transitive merge
accounts = [
    ["A","[email protected]","[email protected]"],
    ["A","[email protected]","[email protected]"],
    ["A","[email protected]","[email protected]"]
]

expected = [
    ["A","[email protected]","[email protected]","[email protected]","[email protected]"]
]

assert normalize(solution.accountsMerge(accounts)) == normalize(expected)  # transitive connectivity

# Same name but different people
accounts = [
    ["John","[email protected]"],
    ["John","[email protected]"]
]

expected = [
    ["John","[email protected]"],
    ["John","[email protected]"]
]

assert normalize(solution.accountsMerge(accounts)) == normalize(expected)  # same name without overlap

# Multiple disconnected groups
accounts = [
    ["A","[email protected]","[email protected]"],
    ["B","[email protected]","[email protected]"],
    ["A","[email protected]","[email protected]"]
]

expected = [
    ["A","[email protected]","[email protected]","[email protected]"],
    ["B","[email protected]","[email protected]"]
]

assert normalize(solution.accountsMerge(accounts)) == normalize(expected)  # multiple components
Test Why
Example 1 Validates basic merging with shared emails
Example 2 Validates independent accounts remain separate
Single account Tests smallest valid input
Indirect transitive merge Ensures transitive connectivity works correctly
Same name but different people Verifies names alone do not cause merges
Multiple disconnected groups Tests handling of multiple connected components

Edge Cases

Same Name, Different People

Two unrelated users may share the same account name. A naive solution might incorrectly merge them based only on the name.

For example:

["John", "[email protected]"]
["John", "[email protected]"]

These accounts must remain separate because they share no email addresses. The implementation correctly avoids this issue because connectivity is based entirely on emails, not names.

Transitive Connections

Accounts may connect indirectly through intermediate accounts.

For example:

A shares with B
B shares with C

Even if A and C do not directly overlap, they still belong to the same person.

Union-Find naturally handles this because all connected emails eventually share the same root representative.

Single Email Accounts

Some accounts contain only one email address. In these cases, no union operations occur.

A buggy implementation might accidentally skip these accounts during grouping because no merges happen.

This solution correctly initializes every email in the parent map before unions begin, ensuring isolated emails still appear in the final result.