LeetCode 2157 - Groups of Strings
The problem asks us to group a list of unique lowercase strings based on a special notion of connectivity. Each string can be represented as a set of letters, and two strings are connected if one can be transformed into the other with exactly one operation: adding a letter…
Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Bit Manipulation, Union-Find
Solution
Problem Understanding
The problem asks us to group a list of unique lowercase strings based on a special notion of connectivity. Each string can be represented as a set of letters, and two strings are connected if one can be transformed into the other with exactly one operation: adding a letter, removing a letter, or replacing a letter (including replacing it with itself).
The input is a list of strings words where each string contains distinct letters, and the output should be an array [number_of_groups, largest_group_size]. The number of groups represents how many distinct connected clusters can be formed, and the largest group size is the maximum number of strings in any single group.
Key constraints include the length of the array (1 <= words.length <= 2 * 10^4) and the fact that each string has at most 26 letters. These constraints suggest that a naive pairwise comparison approach would be too slow (O(n^2)), and an optimized representation using bit manipulation or union-find is necessary.
Important edge cases are single-character strings, strings that are completely disjoint (like "a" and "xyz"), and strings that are subsets of others ("ab" and "abc"). The problem guarantees that a unique grouping exists, which ensures no ambiguous overlaps.
Approaches
The brute-force approach would compare every pair of strings and check if they are connected based on the three operations. This involves converting strings to sets and checking for differences. While correct, this has a time complexity of O(n^2 * m) where m is the average string length, which is too slow for n = 2 * 10^4.
The optimal approach leverages bit manipulation and union-find. Each string can be represented as a 26-bit integer, where each bit indicates the presence of a letter. Connectivity operations then reduce to bit-level operations:
- Add a letter: flip one 0-bit to 1.
- Remove a letter: flip one 1-bit to 0.
- Replace a letter: remove one 1-bit and add one 0-bit (can also include replacing with itself).
We use a union-find (disjoint-set) structure to merge strings that are connected efficiently. We iterate through all masks, generate all masks reachable by one operation, and union them. This drastically reduces the number of checks and leverages fast bitwise operations.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2 * m) | O(n) | Compare each pair using set operations; too slow for large n |
| Optimal | O(n * 26^2) | O(n + 2^26) | Use bit masks to represent strings and union-find for connectivity; fast for the problem constraints |
Algorithm Walkthrough
- Convert strings to bit masks: Each string is converted into a 26-bit integer where
1 << (ord(c) - ord('a'))marks the presence of characterc. - Initialize union-find: Create a union-find structure for all unique masks.
- Union connected masks: For each mask, attempt all single-letter operations:
a. Remove a letter: flip each bit that is 1, and union with the resulting mask if it exists.
b. Add a letter: flip each bit that is 0, and union with the resulting mask if it exists.
c. Replace a letter: combine remove and add in a nested loop, unioning with all resulting masks that exist. 4. Count groups and max group size: Traverse all masks, find their root in union-find, and count the number of unique roots for groups. Track the size of each root's set to find the largest group.
Why it works: The bitmask representation ensures each string is encoded uniquely. Union-find merges all connected strings under the same root, capturing the transitive connectivity defined by the problem. Counting roots then directly gives the number of groups, and the largest set size gives the largest group.
Python Solution
from typing import List
from collections import defaultdict
class UnionFind:
def __init__(self):
self.parent = {}
self.size = {}
def find(self, x):
if x not in self.parent:
self.parent[x] = x
self.size[x] = 1
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
px, py = self.find(x), self.find(y)
if px == py:
return
if self.size[px] < self.size[py]:
px, py = py, px
self.parent[py] = px
self.size[px] += self.size[py]
class Solution:
def groupStrings(self, words: List[str]) -> List[int]:
def word_to_mask(word):
mask = 0
for c in word:
mask |= 1 << (ord(c) - ord('a'))
return mask
masks = [word_to_mask(word) for word in words]
uf = UnionFind()
mask_set = set(masks)
for mask in masks:
uf.find(mask)
for mask in masks:
# Remove one letter
for i in range(26):
if mask & (1 << i):
uf.union(mask, mask ^ (1 << i))
# Add one letter
for i in range(26):
if not mask & (1 << i):
new_mask = mask | (1 << i)
if new_mask in mask_set:
uf.union(mask, new_mask)
# Replace one letter
for i in range(26):
if mask & (1 << i):
for j in range(26):
if not mask & (1 << j):
new_mask = mask ^ (1 << i) | (1 << j)
if new_mask in mask_set:
uf.union(mask, new_mask)
group_count = defaultdict(int)
for mask in masks:
root = uf.find(mask)
group_count[root] += 1
return [len(group_count), max(group_count.values())]
Explanation: We first convert each string to a bitmask and initialize union-find for each mask. For each mask, we simulate all possible one-letter operations, unioning masks that exist in the set. Finally, we count groups by unique roots and determine the largest group size.
Go Solution
package main
func groupStrings(words []string) []int {
type UnionFind struct {
parent map[int]int
size map[int]int
}
uf := &UnionFind{
parent: make(map[int]int),
size: make(map[int]int),
}
var find func(x int) int
find = func(x int) int {
if _, ok := uf.parent[x]; !ok {
uf.parent[x] = x
uf.size[x] = 1
}
if uf.parent[x] != x {
uf.parent[x] = find(uf.parent[x])
}
return uf.parent[x]
}
union := func(x, y int) {
px, py := find(x), find(y)
if px == py {
return
}
if uf.size[px] < uf.size[py] {
px, py = py, px
}
uf.parent[py] = px
uf.size[px] += uf.size[py]
}
wordToMask := func(word string) int {
mask := 0
for _, c := range word {
mask |= 1 << (c - 'a')
}
return mask
}
masks := make([]int, len(words))
maskSet := make(map[int]bool)
for i, word := range words {
masks[i] = wordToMask(word)
maskSet[masks[i]] = true
find(masks[i])
}
for _, mask := range masks {
// Remove one letter
for i := 0; i < 26; i++ {
if mask&(1<<i) != 0 {
union(mask, mask^(1<<i))
}
}
// Add one letter
for i := 0; i < 26; i++ {
if mask&(1<<i) == 0 {
newMask := mask | (1 << i)
if maskSet[newMask] {
union(mask, newMask)
}
}
}
// Replace one letter
for i := 0; i < 26; i++ {
if mask&(1<<i) != 0 {
for j := 0; j < 26; j++ {
if mask&(1<<j) == 0 {
newMask := mask^(1<<i) | (1 << j)
if maskSet[newMask] {
union(mask, newMask)
}
}
}
}
}
}
groupCount := make(map[int]int)
maxSize := 0
for _, mask := range masks {
root := find(mask)
groupCount[root]++
if groupCount[root] > maxSize {
maxSize = groupCount[root]
}
}
return []int{len(groupCount), maxSize}
}
Explanation: The Go implementation mirrors the Python approach. `map[int