LeetCode 2506 - Count Pairs Of Similar Strings

Difficulty: 🟢 Easy
Topics: Array, Hash Table, String, Bit Manipulation, Counting

Solution

Problem Understanding

The problem gives us an array of strings called words. We must count how many pairs of indices (i, j) satisfy two conditions:

i < j
words[i] and words[j] are similar

Two strings are considered similar if they contain exactly the same set of distinct characters, regardless of order or frequency.

For example:

"aba" and "aabb" are similar because both contain only 'a' and 'b'
"abc" and "cba" are similar because both contain 'a', 'b', and 'c'
"abc" and "abd" are not similar because one contains 'c' while the other contains 'd'

The key detail is that character frequency does not matter. Only the presence or absence of characters matters.

The input size is relatively small:

Up to 100 words
Each word length up to 100
Only lowercase English letters

These constraints tell us that even a quadratic solution is acceptable, since 100 × 100 = 10,000 comparisons is small. However, we can still design a cleaner and more efficient solution using bit manipulation and hashing.

An important observation is that each word can be represented by the set of characters it contains. Since there are only 26 lowercase letters, we can encode this set as a 26-bit integer mask.

Several edge cases are important:

Multiple identical words
Words with repeated characters such as "aaaa"
Single-character words
No similar pairs at all
Every word being similar to every other word

The problem guarantees all strings contain only lowercase English letters, which makes bitmask encoding straightforward and efficient.

Approaches

Brute Force Approach

The most direct solution is to compare every pair of words.

For each pair (i, j):

Convert both words into sets of distinct characters
Compare the two sets
If they are equal, increment the answer

This works because two strings are similar exactly when their distinct character sets match.

For example:

"aabb" becomes {'a', 'b'}
"ba" becomes {'a', 'b'}

Since the sets are equal, the pair is counted.

Although this solution is simple and correct, it repeatedly rebuilds character sets for many comparisons. With n words and word length m, the complexity becomes inefficient compared to a hash-based grouping solution.

Optimal Approach

The key insight is that similarity depends only on the set of distinct characters.

Instead of comparing every pair directly, we can:

Convert each word into a compact representation
Group identical representations together
Count how many pairs can be formed inside each group

A bitmask is the perfect representation here.

We use a 26-bit integer where:

Bit 0 represents 'a'
Bit 1 represents 'b'
...
Bit 25 represents 'z'

For each character in a word, we set the corresponding bit.

Example:

"aba" produces mask:
'a' → bit 0
'b' → bit 1
mask = 000...011
"aabb" produces the exact same mask

Once we compute the mask for each word, we store frequencies in a hash map.

If a mask has appeared k times already, the current word forms k new similar pairs.

This avoids unnecessary pairwise comparisons and gives a clean linear-style solution.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(n² × m)	O(m)	Compare every pair by building character sets
Optimal	O(n × m)	O(n)	Use bitmasks and hash map grouping

Where:

n = number of words
m = average word length

Algorithm Walkthrough

Optimal Algorithm

Create an empty hash map called mask_count.

The key will be a bitmask representing a character set, and the value will be how many previous words produced that mask. 2. Initialize pairs = 0.

This variable stores the total number of similar pairs. 3. Iterate through every word in words.

For each word, compute its bitmask representation. 4. Build the bitmask.

Start with mask = 0.

For every character c in the word:

Compute its position:

ord(c) - ord('a')

Set the corresponding bit:

mask |= (1 << position)

Repeated characters do not matter because setting an already-set bit changes nothing. 5. Count previously seen matching masks.

If this mask already exists in mask_count, then every previous occurrence forms a valid pair with the current word.

Add:

mask_count[mask]

to pairs. 6. Update the frequency map.

Increment:

mask_count[mask]

After processing all words, return pairs.

Why it works

Two words are similar if and only if they contain the same distinct characters.

The bitmask uniquely represents the set of characters in a word. Therefore:

Similar words always produce identical masks
Non-similar words always produce different masks

By grouping words with the same mask, we count exactly the valid similar pairs.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def similarPairs(self, words: List[str]) -> int:
        mask_count = defaultdict(int)
        pairs = 0

        for word in words:
            mask = 0

            for char in word:
                bit_position = ord(char) - ord('a')
                mask |= (1 << bit_position)

            pairs += mask_count[mask]
            mask_count[mask] += 1

        return pairs

The solution uses a hash map to group words by their character-set bitmask.

The variable mask starts at 0 for each word. Every character sets one bit in the integer. Since repeated letters set the same bit again, duplicates naturally disappear from the representation.

The dictionary mask_count tracks how many times each mask has appeared previously. Before inserting the current mask, we add the existing frequency to pairs, because each earlier word with the same mask forms a valid pair with the current word.

This allows the solution to count pairs incrementally in a single pass through the array.

Go Solution

func similarPairs(words []string) int {
	maskCount := make(map[int]int)
	pairs := 0

	for _, word := range words {
		mask := 0

		for _, ch := range word {
			bitPosition := int(ch - 'a')
			mask |= (1 << bitPosition)
		}

		pairs += maskCount[mask]
		maskCount[mask]++
	}

	return pairs
}

The Go implementation follows the same logic as the Python version.

A map[int]int is used instead of Python's dictionary. Since the mask fits comfortably within a standard integer, no special overflow handling is needed.

Go iterates through string characters using range, which returns Unicode runes. Since the input guarantees lowercase English letters, subtracting 'a' safely computes the bit position.

Worked Examples

Example 1

words = ["aba","aabb","abcd","bac","aabc"]

Step-by-step trace

Word	Distinct Characters	Bitmask	Existing Count	New Pairs	Total Pairs
"aba"	{a,b}	3	0	0	0
"aabb"	{a,b}	3	1	1	1
"abcd"	{a,b,c,d}	15	0	0	1
"bac"	{a,b,c}	7	0	0	1
"aabc"	{a,b,c}	7	1	1	2

Final answer:

Example 2

words = ["aabb","ab","ba"]

Word	Distinct Characters	Bitmask	Existing Count	New Pairs	Total Pairs
"aabb"	{a,b}	3	0	0	0
"ab"	{a,b}	3	1	1	1
"ba"	{a,b}	3	2	2	3

Final answer:

Example 3

words = ["nba","cba","dba"]

Word	Distinct Characters	Bitmask
"nba"	{n,b,a}	unique
"cba"	{c,b,a}	unique
"dba"	{d,b,a}	unique

Final answer:

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n × m)	Each character of each word is processed once
Space	O(n)	Hash map may store one mask per word

Here, n is the number of words and m is the maximum word length.

The algorithm processes each word exactly once, and each character contributes only a constant-time bit operation. Hash map operations are also average-case constant time.

The space complexity comes from storing masks in the frequency map. In the worst case, every word has a unique character set.

Test Cases

sol = Solution()

# Provided examples
assert sol.similarPairs(["aba","aabb","abcd","bac","aabc"]) == 2  # mixed groups
assert sol.similarPairs(["aabb","ab","ba"]) == 3  # all similar
assert sol.similarPairs(["nba","cba","dba"]) == 0  # no similar pairs

# Single word
assert sol.similarPairs(["abc"]) == 0  # cannot form pairs

# Identical words
assert sol.similarPairs(["abc", "abc", "abc"]) == 3  # 3 choose 2

# Repeated letters inside words
assert sol.similarPairs(["aaaa", "aa", "a"]) == 3  # all reduce to {'a'}

# Different character sets
assert sol.similarPairs(["a", "b", "c"]) == 0  # all distinct

# Multiple groups
assert sol.similarPairs(["ab", "ba", "cd", "dc", "ef"]) == 2  # two separate groups

# Full alphabet
assert sol.similarPairs(
    ["abcdefghijklmnopqrstuvwxyz", "zyxwvutsrqponmlkjihgfedcba"]
) == 1  # same complete set

# Mixed repeated and unique
assert sol.similarPairs(["abc", "aabbcc", "ab", "aabb"]) == 2  # two matching groups

# Boundary style stress case
words = ["ab"] * 100
assert sol.similarPairs(words) == 4950  # 100 choose 2

Test Case Summary

Test	Why
Provided examples	Validates correctness against official cases
Single word	Ensures no invalid pair counting
Identical words	Tests combinational counting
Repeated letters	Verifies duplicates inside a word are ignored
Different character sets	Confirms non-similar words are excluded
Multiple groups	Tests independent grouping behavior
Full alphabet	Ensures large masks work correctly
Mixed repeated and unique	Confirms normalization behavior
100 identical words	Stress test for pair counting

Edge Cases

Words with repeated characters

A common mistake is treating character frequency as important. For example, "aaaa" and "a" should be considered similar because both contain only the character 'a'.

The bitmask approach handles this naturally. Setting the same bit multiple times does not change the mask, so repeated letters are automatically ignored.

All words are similar

If every word contains the same character set, the number of pairs grows quickly.

For example:

["ab", "ba", "aabb", "baba"]

All four words map to the same bitmask. The correct answer is:

4 choose 2 = 6

The implementation correctly accumulates pairs incrementally by adding the existing frequency each time a matching mask appears.

No words are similar

Another important edge case is when every word has a unique character set.

Example:

["a", "ab", "abc", "abcd"]

Each word generates a different mask, so no pairs should be counted.

The hash map lookup returns zero for every new mask, ensuring the final answer remains correct.

Single-word input

If the array contains only one word, no valid pair exists because pairs require two different indices.

The algorithm handles this automatically because the frequency map is initially empty, so no pairs are added during processing.