LeetCode 2157 - Groups of Strings

The problem asks us to group a list of unique lowercase strings based on a special notion of connectivity. Each string can be represented as a set of letters, and two strings are connected if one can be transformed into the other with exactly one operation: adding a letter…

LeetCode Problem 2157

Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Bit Manipulation, Union-Find

Solution

Problem Understanding

The problem asks us to group a list of unique lowercase strings based on a special notion of connectivity. Each string can be represented as a set of letters, and two strings are connected if one can be transformed into the other with exactly one operation: adding a letter, removing a letter, or replacing a letter (including replacing it with itself).

The input is a list of strings words where each string contains distinct letters, and the output should be an array [number_of_groups, largest_group_size]. The number of groups represents how many distinct connected clusters can be formed, and the largest group size is the maximum number of strings in any single group.

Key constraints include the length of the array (1 <= words.length <= 2 * 10^4) and the fact that each string has at most 26 letters. These constraints suggest that a naive pairwise comparison approach would be too slow (O(n^2)), and an optimized representation using bit manipulation or union-find is necessary.

Important edge cases are single-character strings, strings that are completely disjoint (like "a" and "xyz"), and strings that are subsets of others ("ab" and "abc"). The problem guarantees that a unique grouping exists, which ensures no ambiguous overlaps.

Approaches

The brute-force approach would compare every pair of strings and check if they are connected based on the three operations. This involves converting strings to sets and checking for differences. While correct, this has a time complexity of O(n^2 * m) where m is the average string length, which is too slow for n = 2 * 10^4.

The optimal approach leverages bit manipulation and union-find. Each string can be represented as a 26-bit integer, where each bit indicates the presence of a letter. Connectivity operations then reduce to bit-level operations:

  • Add a letter: flip one 0-bit to 1.
  • Remove a letter: flip one 1-bit to 0.
  • Replace a letter: remove one 1-bit and add one 0-bit (can also include replacing with itself).

We use a union-find (disjoint-set) structure to merge strings that are connected efficiently. We iterate through all masks, generate all masks reachable by one operation, and union them. This drastically reduces the number of checks and leverages fast bitwise operations.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2 * m) O(n) Compare each pair using set operations; too slow for large n
Optimal O(n * 26^2) O(n + 2^26) Use bit masks to represent strings and union-find for connectivity; fast for the problem constraints

Algorithm Walkthrough

  1. Convert strings to bit masks: Each string is converted into a 26-bit integer where 1 << (ord(c) - ord('a')) marks the presence of character c.
  2. Initialize union-find: Create a union-find structure for all unique masks.
  3. Union connected masks: For each mask, attempt all single-letter operations:

a. Remove a letter: flip each bit that is 1, and union with the resulting mask if it exists.

b. Add a letter: flip each bit that is 0, and union with the resulting mask if it exists.

c. Replace a letter: combine remove and add in a nested loop, unioning with all resulting masks that exist. 4. Count groups and max group size: Traverse all masks, find their root in union-find, and count the number of unique roots for groups. Track the size of each root's set to find the largest group.

Why it works: The bitmask representation ensures each string is encoded uniquely. Union-find merges all connected strings under the same root, capturing the transitive connectivity defined by the problem. Counting roots then directly gives the number of groups, and the largest set size gives the largest group.

Python Solution

from typing import List
from collections import defaultdict

class UnionFind:
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return
        if self.size[px] < self.size[py]:
            px, py = py, px
        self.parent[py] = px
        self.size[px] += self.size[py]

class Solution:
    def groupStrings(self, words: List[str]) -> List[int]:
        def word_to_mask(word):
            mask = 0
            for c in word:
                mask |= 1 << (ord(c) - ord('a'))
            return mask

        masks = [word_to_mask(word) for word in words]
        uf = UnionFind()
        mask_set = set(masks)

        for mask in masks:
            uf.find(mask)

        for mask in masks:
            # Remove one letter
            for i in range(26):
                if mask & (1 << i):
                    uf.union(mask, mask ^ (1 << i))
            # Add one letter
            for i in range(26):
                if not mask & (1 << i):
                    new_mask = mask | (1 << i)
                    if new_mask in mask_set:
                        uf.union(mask, new_mask)
            # Replace one letter
            for i in range(26):
                if mask & (1 << i):
                    for j in range(26):
                        if not mask & (1 << j):
                            new_mask = mask ^ (1 << i) | (1 << j)
                            if new_mask in mask_set:
                                uf.union(mask, new_mask)

        group_count = defaultdict(int)
        for mask in masks:
            root = uf.find(mask)
            group_count[root] += 1

        return [len(group_count), max(group_count.values())]

Explanation: We first convert each string to a bitmask and initialize union-find for each mask. For each mask, we simulate all possible one-letter operations, unioning masks that exist in the set. Finally, we count groups by unique roots and determine the largest group size.

Go Solution

package main

func groupStrings(words []string) []int {
    type UnionFind struct {
        parent map[int]int
        size   map[int]int
    }

    uf := &UnionFind{
        parent: make(map[int]int),
        size:   make(map[int]int),
    }

    var find func(x int) int
    find = func(x int) int {
        if _, ok := uf.parent[x]; !ok {
            uf.parent[x] = x
            uf.size[x] = 1
        }
        if uf.parent[x] != x {
            uf.parent[x] = find(uf.parent[x])
        }
        return uf.parent[x]
    }

    union := func(x, y int) {
        px, py := find(x), find(y)
        if px == py {
            return
        }
        if uf.size[px] < uf.size[py] {
            px, py = py, px
        }
        uf.parent[py] = px
        uf.size[px] += uf.size[py]
    }

    wordToMask := func(word string) int {
        mask := 0
        for _, c := range word {
            mask |= 1 << (c - 'a')
        }
        return mask
    }

    masks := make([]int, len(words))
    maskSet := make(map[int]bool)
    for i, word := range words {
        masks[i] = wordToMask(word)
        maskSet[masks[i]] = true
        find(masks[i])
    }

    for _, mask := range masks {
        // Remove one letter
        for i := 0; i < 26; i++ {
            if mask&(1<<i) != 0 {
                union(mask, mask^(1<<i))
            }
        }
        // Add one letter
        for i := 0; i < 26; i++ {
            if mask&(1<<i) == 0 {
                newMask := mask | (1 << i)
                if maskSet[newMask] {
                    union(mask, newMask)
                }
            }
        }
        // Replace one letter
        for i := 0; i < 26; i++ {
            if mask&(1<<i) != 0 {
                for j := 0; j < 26; j++ {
                    if mask&(1<<j) == 0 {
                        newMask := mask^(1<<i) | (1 << j)
                        if maskSet[newMask] {
                            union(mask, newMask)
                        }
                    }
                }
            }
        }
    }

    groupCount := make(map[int]int)
    maxSize := 0
    for _, mask := range masks {
        root := find(mask)
        groupCount[root]++
        if groupCount[root] > maxSize {
            maxSize = groupCount[root]
        }
    }

    return []int{len(groupCount), maxSize}
}

Explanation: The Go implementation mirrors the Python approach. `map[int