LeetCode 839 - Similar String Groups

The problem asks us to determine the number of similarity groups within a list of strings. Two strings are defined as similar if they are either identical or if we can swap exactly two letters in one string to make it equal to the other.

LeetCode Problem 839

Difficulty: 🔴 Hard
Topics: Array, Hash Table, String, Depth-First Search, Breadth-First Search, Union-Find

Solution

Problem Understanding

The problem asks us to determine the number of similarity groups within a list of strings. Two strings are defined as similar if they are either identical or if we can swap exactly two letters in one string to make it equal to the other. For example, "tars" and "rats" are similar because swapping the letters at positions 0 and 2 in "tars" results in "rats". The key point is that similarity is transitive, meaning if "tars" is similar to "rats" and "rats" is similar to "arts", then "tars" and "arts" belong to the same similarity group even if they are not directly similar.

The input strs is a list of strings where all strings are anagrams of each other. This guarantees that each string has the same length and contains the same letters, just arranged differently. The output is an integer representing the number of similarity groups in the list.

Constraints provide guidance about the problem scale: strs can have up to 300 strings, each up to 300 characters. This suggests that a brute-force approach that checks every pair of strings might be too slow in the worst case, and we need an efficient method to determine connected components in a similarity graph.

Important edge cases include a single string, multiple identical strings, strings already forming one large group, or strings that do not connect at all beyond themselves.

Approaches

The brute-force approach constructs a graph where each string is a node, and edges connect similar strings. We check every pair of strings and use either Depth-First Search (DFS) or Breadth-First Search (BFS) to count connected components. This works because similarity is transitive and connected components correspond to groups. However, comparing every pair is $O(n^2 \cdot m)$, where $n$ is the number of strings and $m$ is the length of the strings, which is inefficient for large inputs.

The key observation for a more optimal approach is that since all strings are anagrams, similarity only requires comparing mismatched positions. For two strings, we only need to check positions where characters differ, and if there are exactly two differences, swapping will make them identical. This allows us to combine DFS/BFS with Union-Find to efficiently merge groups without constructing the full graph explicitly.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2 * m) O(n^2) Check all pairs for similarity and traverse graph
Optimal (Union-Find + Pairwise Check) O(n^2 * m) O(n) Efficiently merges groups using disjoint sets; avoids explicit graph storage

The optimal approach still requires pairwise comparison in the worst case but avoids additional overhead of adjacency structures and provides a clean way to count groups.

Algorithm Walkthrough

  1. Initialize a Union-Find (Disjoint Set) data structure for all strings, with each string initially in its own group.
  2. Iterate over all pairs of strings (i, j). For each pair, check if they are similar by counting positions where the characters differ. If the count is 0 or 2, the strings are similar.
  3. If two strings are similar, merge their groups in the Union-Find structure.
  4. After processing all pairs, the number of unique groups corresponds to the number of unique roots in the Union-Find data structure.
  5. Return the count of distinct roots as the final number of similarity groups.

Why it works: Similarity is transitive, so merging groups for directly similar strings guarantees that all indirectly similar strings end up in the same set. Union-Find efficiently tracks these sets and counts the number of distinct connected components.

Python Solution

from typing import List

class Solution:
    def numSimilarGroups(self, strs: List[str]) -> int:
        def are_similar(s1: str, s2: str) -> bool:
            diff = []
            for a, b in zip(s1, s2):
                if a != b:
                    diff.append((a, b))
                    if len(diff) > 2:
                        return False
            return len(diff) == 2 or len(diff) == 0

        parent = list(range(len(strs)))

        def find(x: int) -> int:
            if parent[x] != x:
                parent[x] = find(parent[x])
            return parent[x]

        def union(x: int, y: int):
            px, py = find(x), find(y)
            if px != py:
                parent[px] = py

        for i in range(len(strs)):
            for j in range(i + 1, len(strs)):
                if are_similar(strs[i], strs[j]):
                    union(i, j)

        return len(set(find(i) for i in range(len(strs))))

The Python solution defines a helper are_similar to check string similarity. We use a parent array for Union-Find, with find implementing path compression for efficiency. The double loop iterates through all pairs, merging similar strings. Finally, we count distinct roots to get the number of groups.

Go Solution

func numSimilarGroups(strs []string) int {
    n := len(strs)
    parent := make([]int, n)
    for i := 0; i < n; i++ {
        parent[i] = i
    }

    var find func(int) int
    find = func(x int) int {
        if parent[x] != x {
            parent[x] = find(parent[x])
        }
        return parent[x]
    }

    union := func(x, y int) {
        px, py := find(x), find(y)
        if px != py {
            parent[px] = py
        }
    }

    areSimilar := func(s1, s2 string) bool {
        diff := 0
        for i := 0; i < len(s1); i++ {
            if s1[i] != s2[i] {
                diff++
                if diff > 2 {
                    return false
                }
            }
        }
        return diff == 2 || diff == 0
    }

    for i := 0; i < n; i++ {
        for j := i + 1; j < n; j++ {
            if areSimilar(strs[i], strs[j]) {
                union(i, j)
            }
        }
    }

    groups := make(map[int]struct{})
    for i := 0; i < n; i++ {
        groups[find(i)] = struct{}{}
    }
    return len(groups)
}

In Go, we use slices and maps instead of Python lists and sets. The areSimilar function checks mismatch counts. union and find implement the Union-Find structure with path compression. We use a map to track unique roots for counting groups.

Worked Examples

Example 1

Input: ["tars","rats","arts","star"]

Step by step:

  1. Initialize parent = [0,1,2,3].
  2. Check pairs:
  • "tars" and "rats" → similar → union(0,1) → parent = [1,1,2,3]
  • "tars" and "arts" → not directly similar → skip
  • "rats" and "arts" → similar → union(1,2) → parent = [2,2,2,3]
  • "star" not similar to any previous → remains separate
  1. Unique roots: 2 → groups {"tars","rats","arts"}, {"star"}

Output: 2

Example 2

Input: ["omv","ovm"]

Step by step:

  1. Initialize parent = [0,1]
  2. "omv" and "ovm" → similar → union(0,1) → parent = [1,1]
  3. Unique roots: 1 → single group

Output: 1

Complexity Analysis

Measure Complexity Explanation
Time O(n^2 * m) We compare each pair of strings; each comparison takes O(m) time where m is the string length
Space O(n) Union-Find parent array stores one integer per string; additional constant space for diff array

Even though we iterate all pairs, path compression in Union-Find ensures efficient merging.

Test Cases

# provided examples
assert Solution().numSimilarGroups(["tars","rats","arts","star"]) == 2  # transitive similarity
assert Solution().numSimilarGroups(["omv","ovm"]) == 1  # direct similarity

# single string
assert Solution().numSimilarGroups(["abc"]) == 1  # single element group

# identical strings
assert Solution().numSimilarGroups(["abc","abc","abc"]) == 1  # all identical

# no similarity beyond itself
assert Solution().numSimilarGroups(["abc","def","ghi"]) == 3  # all isolated

# full connectivity
assert Solution().numSimilarGroups(["ab","ba"]) == 1  # two strings similar

# larger case
assert Solution().numSimilarGroups(["abc","acb","bac","bca","cab","cba"]) == 1  # all connected
Test Why
["tars","rats","arts","star"] Validates transitive similarity forming groups
["omv","ovm"] Validates direct