LeetCode 49 - Group Anagrams

The problem asks us to group together all strings that are anagrams of each other. Two strings are considered anagrams if they contain exactly the same characters with the same frequencies, but possibly in a different order.

LeetCode Problem 49

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Sorting

Solution

Problem Understanding

The problem asks us to group together all strings that are anagrams of each other.

Two strings are considered anagrams if they contain exactly the same characters with the same frequencies, but possibly in a different order. For example, "eat", "tea", and "ate" are all anagrams because each string contains one 'a', one 'e', and one 't'.

The input is an array of strings called strs. Each string consists only of lowercase English letters. Our task is to return a list of groups, where each group contains strings that are anagrams of one another.

The order of the groups does not matter, and the order of strings inside each group also does not matter. That means multiple valid outputs are acceptable as long as the grouping is correct.

The constraints are important:

  • There can be up to 10^4 strings
  • Each string can have length up to 100

These limits tell us that an inefficient pairwise comparison approach will likely be too slow. We need a solution that can efficiently identify whether two strings belong to the same anagram group.

A few important edge cases stand out immediately:

  • Empty strings, such as [""]
  • Single character strings
  • Multiple identical strings
  • Strings with the same length but different character frequencies
  • Large inputs with many repeated patterns

The problem guarantees that all strings contain only lowercase English letters, which allows us to use fixed-size character frequency arrays if desired.

Approaches

Brute Force Approach

A straightforward approach is to compare every string against every existing group.

For each string, we could check whether it is an anagram of the representative string from a group. To determine whether two strings are anagrams, we could sort both strings and compare the results, or count character frequencies.

The algorithm would work like this:

  1. Maintain a list of groups.
  2. For each string:
  • Compare it with one representative string from every existing group.
  • If it is an anagram, place it into that group.
  • Otherwise, create a new group.

This approach is correct because every string is explicitly checked against previously formed groups. However, it becomes inefficient as the number of strings grows.

If there are n strings and each comparison requires sorting strings of length k, the total complexity becomes very large, especially in the worst case where many groups exist.

Optimal Approach

The key insight is that all anagrams share the same canonical representation.

If we sort the characters of a string, every anagram produces the same sorted result:

  • "eat""aet"
  • "tea""aet"
  • "ate""aet"

This sorted string can act as a unique key identifying the anagram group.

We can use a hash map where:

  • The key is the sorted version of the string
  • The value is the list of strings belonging to that anagram group

As we iterate through the input:

  1. Sort the current string
  2. Use the sorted string as the hash map key
  3. Append the original string to that group

This dramatically improves efficiency because hash map lookup is approximately O(1).

Approach Time Complexity Space Complexity Notes
Brute Force O(n² × k log k) O(n × k) Compares strings against existing groups repeatedly
Optimal O(n × k log k) O(n × k) Uses sorted strings as hash map keys

Here:

  • n is the number of strings
  • k is the maximum string length

Algorithm Walkthrough

  1. Create a hash map where:
  • The key represents the canonical form of an anagram group
  • The value is a list of strings belonging to that group
  1. Iterate through every string in the input array.
  2. For the current string, sort its characters alphabetically.
  • This creates a normalized representation.

  • All anagrams produce the same normalized form.

  • For example:

  • "tan" becomes "ant"

  • "nat" also becomes "ant"

  1. Use the sorted string as the key in the hash map.
  • If the key does not exist yet, create a new empty list.
  • Append the original string to the list associated with that key.
  1. Continue until every string has been processed.
  2. Return all values from the hash map.
  • Each value is one anagram group.

The reason a hash map works well here is that it provides fast grouping behavior. Once a canonical representation is computed, we can immediately locate the correct group in near constant time.

Why it works

The algorithm works because sorting transforms every anagram into the same canonical representation.

If two strings are anagrams, they contain exactly the same characters with the same frequencies. Sorting those characters produces identical sorted strings. Therefore, all anagrams map to the same hash map key and end up in the same group.

Conversely, non-anagrams produce different sorted strings, so they are placed into different groups.

Python Solution

from collections import defaultdict
from typing import List

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        anagram_groups = defaultdict(list)

        for word in strs:
            sorted_word = "".join(sorted(word))
            anagram_groups[sorted_word].append(word)

        return list(anagram_groups.values())

The implementation begins by creating a defaultdict(list). This automatically initializes an empty list whenever a new key is encountered, which simplifies the logic.

For every word in the input array, the code sorts the characters using sorted(word). Since sorted() returns a list of characters, "".join() converts it back into a string.

This sorted string becomes the canonical identifier for the anagram group.

The original word is then appended to the list associated with that key in the hash map.

Finally, the function returns all grouped values as a list of lists.

The implementation directly follows the algorithm described earlier:

  • Normalize each string
  • Use normalization as the grouping key
  • Store related strings together

Go Solution

package main

import (
	"sort"
)

func groupAnagrams(strs []string) [][]string {
	anagramGroups := make(map[string][]string)

	for _, word := range strs {
		chars := []rune(word)

		sort.Slice(chars, func(i, j int) bool {
			return chars[i] < chars[j]
		})

		sortedWord := string(chars)

		anagramGroups[sortedWord] = append(anagramGroups[sortedWord], word)
	}

	result := make([][]string, 0, len(anagramGroups))

	for _, group := range anagramGroups {
		result = append(result, group)
	}

	return result
}

The Go implementation follows the same overall strategy as the Python version.

One notable difference is that Go strings are immutable, so we convert the string into a slice of runes before sorting.

The sort.Slice function is used to sort the rune slice in ascending order.

Go maps do not automatically initialize slices like Python's defaultdict, but append works correctly even on a nil slice, so we can directly append without checking existence first.

The final result is constructed by iterating through the map values and appending them into a result slice.

Worked Examples

Example 1

Input:

["eat","tea","tan","ate","nat","bat"]

We process each word one at a time.

Current Word Sorted Key Hash Map State
"eat" "aet" { "aet": ["eat"] }
"tea" "aet" { "aet": ["eat", "tea"] }
"tan" "ant" { "aet": ["eat", "tea"], "ant": ["tan"] }
"ate" "aet" { "aet": ["eat", "tea", "ate"], "ant": ["tan"] }
"nat" "ant" { "aet": ["eat", "tea", "ate"], "ant": ["tan", "nat"] }
"bat" "abt" { "aet": ["eat", "tea", "ate"], "ant": ["tan", "nat"], "abt": ["bat"] }

Final result:

[
  ["eat","tea","ate"],
  ["tan","nat"],
  ["bat"]
]

Example 2

Input:

[""]

Processing steps:

Current Word Sorted Key Hash Map State
"" "" { "": [""] }

Final result:

[[""]]

The empty string sorts to itself, so it correctly forms its own group.

Example 3

Input:

["a"]

Processing steps:

Current Word Sorted Key Hash Map State
"a" "a" { "a": ["a"] }

Final result:

[["a"]]

Complexity Analysis

Measure Complexity Explanation
Time O(n × k log k) Each string is sorted individually
Space O(n × k) Hash map stores all strings and keys

The dominant operation is sorting each string.

If:

  • n is the number of strings
  • k is the maximum length of a string

Then sorting one string costs O(k log k), and we do this for all n strings.

The space complexity comes from storing:

  • The hash map keys
  • The grouped strings
  • Intermediate sorted strings

In total, this scales proportionally to the total input size.

Test Cases

def normalize(result):
    return sorted([sorted(group) for group in result])

solution = Solution()

# Basic example from the problem statement
assert normalize(
    solution.groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"])
) == normalize([["bat"], ["nat", "tan"], ["ate", "eat", "tea"]])

# Single empty string
assert normalize(
    solution.groupAnagrams([""])
) == normalize([[""]])

# Single character
assert normalize(
    solution.groupAnagrams(["a"])
) == normalize([["a"]])

# Multiple identical strings
assert normalize(
    solution.groupAnagrams(["abc", "abc", "abc"])
) == normalize([["abc", "abc", "abc"]])

# No anagrams at all
assert normalize(
    solution.groupAnagrams(["abc", "def", "ghi"])
) == normalize([["abc"], ["def"], ["ghi"]])

# Multiple groups with duplicates
assert normalize(
    solution.groupAnagrams(["ab", "ba", "abc", "cab", "bac", "xy"])
) == normalize([["ab", "ba"], ["abc", "cab", "bac"], ["xy"]])

# Strings with repeated letters
assert normalize(
    solution.groupAnagrams(["aab", "aba", "baa", "abb"])
) == normalize([["aab", "aba", "baa"], ["abb"]])

# Large number of empty strings
assert normalize(
    solution.groupAnagrams(["", "", ""])
) == normalize([["", "", ""]])

# Mixed short and long strings
assert normalize(
    solution.groupAnagrams(["listen", "silent", "enlist", "google", "gooegl"])
) == normalize([["listen", "silent", "enlist"], ["google", "gooegl"]])
Test Why
["eat","tea","tan","ate","nat","bat"] Standard mixed grouping scenario
[""] Validates handling of empty strings
["a"] Smallest non-empty input
["abc","abc","abc"] Ensures duplicates remain grouped
["abc","def","ghi"] Ensures unrelated strings stay separate
["ab","ba","abc","cab","bac","xy"] Tests multiple independent groups
["aab","aba","baa","abb"] Verifies repeated character handling
["","",""] Multiple empty strings grouped together
["listen","silent","enlist","google","gooegl"] Tests larger realistic anagram groups

Edge Cases

One important edge case is the empty string. Sorting an empty string still produces an empty string, so all empty strings naturally map to the same key. The implementation handles this without requiring any special conditional logic.

Another important case involves repeated characters, such as "aab" and "aba". A naive implementation that only checks unique character presence could incorrectly group strings together. Sorting preserves both character identity and frequency, ensuring these strings correctly map to the same canonical representation.

A third edge case occurs when there are no anagrams at all. For example, ["abc", "def", "ghi"] should produce three separate groups. The hash map naturally handles this because each sorted key is distinct, causing every string to create its own group.

A final subtle edge case is duplicate strings. If the input contains multiple copies of the same string, all copies should appear in the output. Since the algorithm appends every occurrence independently, duplicates are preserved correctly rather than overwritten or removed.