LeetCode 182 - Duplicate Emails

Difficulty: 🟢 Easy
Topics: Database

Solution

Problem Understanding

The problem gives us a database table named Person with two columns:

Column	Description
`id`	A unique integer identifier for each row
`email`	The email address associated with that row

The goal is to find all email addresses that appear more than once in the table. In other words, we need to identify duplicate email values.

The output should contain only the duplicated email addresses, with each duplicate email appearing exactly once in the result.

For example, if the table contains:

id	email
1	[email protected]
2	[email protected]
3	[email protected]

then [email protected] appears twice, so it should be included in the result.

The problem guarantees several important things:

id is unique, so each row is distinct.
email is never NULL, so we do not need special handling for missing values.
Emails contain only lowercase characters, meaning case sensitivity is not a concern.

This is fundamentally a grouping and counting problem. We need to count how many times each email appears, then return only the emails whose count is greater than 1.

Several edge cases are important to consider:

A table with no duplicate emails should return an empty result.
A table where the same email appears many times should still return that email only once.
A table with only one row cannot contain duplicates.
Multiple different emails may each have duplicates, and all of them should appear in the result.

Approaches

Brute Force Approach

A brute force solution would compare every email against every other email in the table.

For each row, we could scan the entire table again and count how many times the current email appears. If the count exceeds 1, we add it to the result set.

This works because every possible pair of rows is examined, so duplicates cannot be missed.

However, this approach is inefficient. If the table contains n rows, then for every row we scan all n rows again, leading to O(n²) time complexity. This becomes unnecessarily expensive for large datasets.

Optimal Approach

The key observation is that duplicate detection is naturally a counting problem.

SQL provides aggregation functions specifically designed for this kind of task. By grouping rows by email and counting how many rows belong to each group, we can directly identify duplicates.

The GROUP BY clause collects identical emails together, and COUNT(*) tells us how many times each email appears. Then the HAVING clause filters groups whose count is greater than 1.

This avoids repeated scanning and allows the database engine to efficiently compute the result.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n²)	O(1)	Compares every row against every other row
Optimal	O(n)	O(n)	Uses grouping and counting to detect duplicates efficiently

Algorithm Walkthrough

Optimal SQL Algorithm

Start with the Person table containing all rows.
Group all rows by the email column using GROUP BY email.

This creates one group for each unique email address. 3. For each group, compute the number of rows using COUNT(*).

This tells us how many times that email appears in the table. 4. Use the HAVING clause to keep only groups whose count is greater than 1.

These are the duplicated emails. 5. Return the email values from those filtered groups.

Why it works

The algorithm relies on the property that all identical email values are placed into the same group by GROUP BY. The count of each group exactly equals the number of occurrences of that email in the table. Therefore, any group with a count greater than 1 represents a duplicate email, and every duplicate email will be correctly identified.

Python Solution

SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;

The query begins by selecting the email column from the Person table.

The GROUP BY email clause combines all rows with the same email into a single group. Instead of processing rows individually, the database now works with groups of identical emails.

Next, COUNT(*) calculates how many rows exist in each group.

The HAVING COUNT(*) > 1 condition filters the grouped results. Only groups containing more than one row are kept, which means only duplicated emails remain.

Finally, email AS Email renames the output column to match the required format in the problem statement.

Go Solution

SELECT email AS Email
FROM Person
GROUP BY email
HAVING COUNT(*) > 1;

Since this is a SQL database problem, the solution is identical regardless of programming language. LeetCode expects a SQL query rather than executable Go code.

The logic remains the same:

GROUP BY creates one group per unique email.
COUNT(*) measures occurrences.
HAVING COUNT(*) > 1 filters only duplicates.

Unlike general Go implementations, there are no concerns here about slices, maps, nil handling, or integer overflow because the database engine handles aggregation internally.

Worked Examples

Example 1

Input table:

id	email
1	[email protected]
2	[email protected]
3	[email protected]

Step 1: Group by email

email	Rows in Group
[email protected]	rows 1 and 3
[email protected]	row 2

Step 2: Count each group

email	count
[email protected]	2
[email protected]	1

Step 3: Apply HAVING COUNT(*) > 1

email	count	Keep?
[email protected]	2	Yes
[email protected]	1	No

Final Output

Email
[email protected]

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n)	Each row is processed once during grouping and counting
Space	O(n)	The database may store grouping information for unique emails

The database engine scans the table and organizes rows into groups based on email. Each row contributes once to its corresponding group, making the operation effectively linear in the number of rows.

The space complexity depends on how many distinct email groups must be maintained internally during aggregation. In the worst case, every email could be unique, requiring storage proportional to the number of rows.

Test Cases

# Example case with one duplicate
person_table = [
    (1, "[email protected]"),
    (2, "[email protected]"),
    (3, "[email protected]")
]
assert {"[email protected]"} == {"[email protected]"}  # basic duplicate detection

# No duplicates
person_table = [
    (1, "[email protected]"),
    (2, "[email protected]"),
    (3, "[email protected]")
]
assert set() == set()  # should return empty result

# Multiple duplicate emails
person_table = [
    (1, "[email protected]"),
    (2, "[email protected]"),
    (3, "[email protected]"),
    (4, "[email protected]")
]
assert {"[email protected]", "[email protected]"} == {"[email protected]", "[email protected]"}  # multiple duplicates

# Same email repeated many times
person_table = [
    (1, "[email protected]"),
    (2, "[email protected]"),
    (3, "[email protected]"),
    (4, "[email protected]")
]
assert {"[email protected]"} == {"[email protected]"}  # repeated many times

# Single row table
person_table = [
    (1, "[email protected]")
]
assert set() == set()  # single row cannot have duplicates

# Mixed duplicates and unique emails
person_table = [
    (1, "[email protected]"),
    (2, "[email protected]"),
    (3, "[email protected]"),
    (4, "[email protected]"),
    (5, "[email protected]"),
    (6, "[email protected]")
]
assert {"[email protected]", "[email protected]"} == {"[email protected]", "[email protected]"}  # mixed case

Test	Why
One duplicate email	Validates the core functionality
No duplicates	Ensures empty output is handled correctly
Multiple duplicated emails	Confirms multiple results are returned
Same email repeated many times	Verifies repeated duplicates appear only once
Single row table	Tests minimum input size
Mixed duplicates and unique emails	Ensures filtering logic is correct

Edge Cases

No Duplicate Emails

A table may contain entirely unique email addresses. A naive implementation might accidentally return all emails or fail to handle an empty result properly.

This solution handles the case naturally because HAVING COUNT(*) > 1 removes every group whose count is exactly 1. The final result becomes empty.

One Email Repeated Many Times

An email may appear far more than twice. The problem only wants the email listed once in the output, not repeated for every duplicate occurrence.

Because GROUP BY creates exactly one group per unique email, the result automatically contains only one row for each duplicated email regardless of how many times it appears.

Single Row Input

A table with only one row cannot possibly contain duplicates. Some incorrect implementations may assume at least two rows exist or mishandle small datasets.

The grouping logic still works correctly. The single email forms one group with count 1, which fails the HAVING COUNT(*) > 1 condition, producing an empty result.

Multiple Independent Duplicate Groups

Several different emails may each appear multiple times. A flawed implementation might stop after finding the first duplicate.

This solution processes every group independently, so all duplicated emails are returned correctly.