LeetCode 737 - Sentence Similarity II
This problem asks us to determine whether two sentences are considered similar based on a set of similarity relationships between words. Each sentence is represented as an array of strings, where each string is a single word.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Depth-First Search, Breadth-First Search, Union-Find
Solution
Problem Understanding
This problem asks us to determine whether two sentences are considered similar based on a set of similarity relationships between words.
Each sentence is represented as an array of strings, where each string is a single word. We are also given a list of word pairs, where each pair indicates that the two words are similar.
The important detail is that similarity is not limited to direct relationships. The relation is transitive. If "great" is similar to "good" and "good" is similar to "fine", then "great" is also similar to "fine".
This means the similarity relationships form groups of connected words. Any two words inside the same connected group are considered similar.
The problem requires us to compare the two sentences word by word:
-
The sentences must have the same number of words.
-
For every index
i,sentence1[i]must either: -
be exactly equal to
sentence2[i], or -
belong to the same similarity group.
If every corresponding pair of words satisfies this condition, we return true. Otherwise, we return false.
The constraints are important:
- Sentence lengths can be up to
1000 - There can be up to
2000similarity pairs
These values are large enough that repeatedly searching the graph from scratch for every word comparison could become inefficient. We need a solution that efficiently groups connected words together.
There are several important edge cases:
- The sentences may have different lengths, which immediately makes them not similar.
- Some words may not appear in
similarPairsat all. - A word is always similar to itself, even if it never appears in any pair.
- Similarity chains may be long, such as
a -> b -> c -> d. - The graph may contain disconnected components.
- Cycles may exist in the similarity graph.
A correct solution must handle all of these situations properly.
Approaches
Brute Force Approach
The most direct approach is to treat the similarity pairs as an undirected graph.
Each word becomes a node, and every similarity pair creates an edge between two words.
For every pair of words at the same index in the two sentences:
- If the words are equal, continue.
- Otherwise, perform a DFS or BFS starting from the first word to see whether the second word can be reached through transitive similarity relationships.
This approach is correct because graph traversal explores every possible similarity chain.
However, it is inefficient because we may repeatedly traverse the same parts of the graph for many word comparisons. In the worst case, each lookup could scan the entire graph.
Optimal Approach, Union Find
The key insight is that similarity forms connected components.
If two words belong to the same connected component, they are similar. This is exactly the type of problem that Union Find, also called Disjoint Set Union, is designed to solve efficiently.
Instead of repeatedly searching the graph, we preprocess all similarity pairs:
- Every word starts in its own set.
- For every pair
(a, b), we union their sets. - After processing all pairs, words in the same connected component share the same representative parent.
Then comparing words becomes simple:
- If the words are equal, they are similar.
- Otherwise, check whether they belong to the same Union Find set.
Using path compression and union by rank makes the operations nearly constant time.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n × (V + E)) | O(V + E) | Repeated DFS/BFS for every word comparison |
| Optimal | O((V + E) × α(V)) | O(V) | Uses Union Find to preprocess connected components |
Here:
nis the sentence lengthVis the number of unique wordsEis the number of similarity pairsα(V)is the inverse Ackermann function, which is effectively constant
Algorithm Walkthrough
Step 1: Check Sentence Lengths
The first requirement is that the sentences must contain the same number of words.
If the lengths differ, we immediately return False because there is no way to match words position by position.
Step 2: Initialize Union Find Structures
We create two hash maps:
parent[word]stores the representative parent of the wordrank[word]stores the approximate tree depth for balancing unions
We use hash maps because the nodes are strings, not integers.
Step 3: Create Sets for Every Word
As we process similarity pairs, we ensure every encountered word exists in the Union Find structure.
Initially:
- Every word is its own parent
- Every word has rank
0
This means each word starts as an independent connected component.
Step 4: Union Similar Words
For every pair (a, b):
- Find the root parent of
a - Find the root parent of
b - If the roots differ, merge the sets
We use:
- Path compression during
find - Union by rank during
union
These optimizations keep the trees shallow and operations fast.
Step 5: Compare Corresponding Sentence Words
Now we iterate through both sentences simultaneously.
For each pair (word1, word2):
- If the words are equal, continue
- If either word does not exist in the Union Find structure, they cannot be similar
- Otherwise, compare their root parents
- If the roots differ, return
False
If all positions pass the check, return True.
Why it works
Union Find guarantees that all transitively connected words end up in the same connected component.
If two words can be connected through any chain of similarity relationships, repeated union operations eventually place them under the same root parent.
Therefore:
- Equal roots mean the words are similar
- Different roots mean no similarity path exists
This invariant guarantees correctness.
Python Solution
from typing import List
class Solution:
def areSentencesSimilarTwo(
self,
sentence1: List[str],
sentence2: List[str],
similarPairs: List[List[str]]
) -> bool:
if len(sentence1) != len(sentence2):
return False
parent = {}
rank = {}
def find(word: str) -> str:
if parent[word] != word:
parent[word] = find(parent[word])
return parent[word]
def union(word1: str, word2: str) -> None:
root1 = find(word1)
root2 = find(word2)
if root1 == root2:
return
if rank[root1] < rank[root2]:
parent[root1] = root2
elif rank[root1] > rank[root2]:
parent[root2] = root1
else:
parent[root2] = root1
rank[root1] += 1
def add_word(word: str) -> None:
if word not in parent:
parent[word] = word
rank[word] = 0
for word1, word2 in similarPairs:
add_word(word1)
add_word(word2)
union(word1, word2)
for word1, word2 in zip(sentence1, sentence2):
if word1 == word2:
continue
if word1 not in parent or word2 not in parent:
return False
if find(word1) != find(word2):
return False
return True
The implementation starts by checking whether the two sentences have the same length. This satisfies the first requirement of the problem.
The parent dictionary stores the Union Find parent relationships, while the rank dictionary helps keep the trees balanced during union operations.
The find function locates the root representative of a word. Path compression is applied by updating each visited node directly to the root, which speeds up future lookups.
The union function merges two connected components. Union by rank ensures the smaller tree attaches under the larger tree, minimizing tree height.
The add_word helper ensures every encountered word exists in the data structure before union operations occur.
After processing all similarity pairs, the code compares corresponding words in the two sentences. Equal words are automatically considered similar. Otherwise, both words must exist in the Union Find structure and share the same root parent.
If any comparison fails, the function returns False. If all comparisons succeed, it returns True.
Go Solution
func areSentencesSimilarTwo(sentence1 []string, sentence2 []string, similarPairs [][]string) bool {
if len(sentence1) != len(sentence2) {
return false
}
parent := make(map[string]string)
rank := make(map[string]int)
var find func(string) string
find = func(word string) string {
if parent[word] != word {
parent[word] = find(parent[word])
}
return parent[word]
}
addWord := func(word string) {
if _, exists := parent[word]; !exists {
parent[word] = word
rank[word] = 0
}
}
union := func(word1 string, word2 string) {
root1 := find(word1)
root2 := find(word2)
if root1 == root2 {
return
}
if rank[root1] < rank[root2] {
parent[root1] = root2
} else if rank[root1] > rank[root2] {
parent[root2] = root1
} else {
parent[root2] = root1
rank[root1]++
}
}
for _, pair := range similarPairs {
word1 := pair[0]
word2 := pair[1]
addWord(word1)
addWord(word2)
union(word1, word2)
}
for i := 0; i < len(sentence1); i++ {
word1 := sentence1[i]
word2 := sentence2[i]
if word1 == word2 {
continue
}
_, exists1 := parent[word1]
_, exists2 := parent[word2]
if !exists1 || !exists2 {
return false
}
if find(word1) != find(word2) {
return false
}
}
return true
}
The Go implementation follows the same logic as the Python version but uses Go maps instead of dictionaries.
Because Go does not support nested named functions in the same way as Python, the find function is declared as a variable first so it can recursively reference itself.
Go maps naturally handle dynamic insertion of string keys, making them well suited for storing Union Find parent relationships.
There are no integer overflow concerns because the rank values remain very small.
Worked Examples
Example 1
sentence1 = ["great","acting","skills"]
sentence2 = ["fine","drama","talent"]
similarPairs = [
["great","good"],
["fine","good"],
["drama","acting"],
["skills","talent"]
]
Building the Union Find Structure
| Pair | Action | Resulting Connections |
|---|---|---|
| great, good | Union | great ↔ good |
| fine, good | Union | fine ↔ good ↔ great |
| drama, acting | Union | drama ↔ acting |
| skills, talent | Union | skills ↔ talent |
Final connected components:
| Component |
|---|
| {great, good, fine} |
| {drama, acting} |
| {skills, talent} |
Comparing Sentences
| Index | sentence1 | sentence2 | Similar? |
|---|---|---|---|
| 0 | great | fine | Yes, same component |
| 1 | acting | drama | Yes, same component |
| 2 | skills | talent | Yes, same component |
Result:
true
Example 2
sentence1 = ["I","love","leetcode"]
sentence2 = ["I","love","onepiece"]
Similarity chain:
leetcode → platform → anime → manga → onepiece
Union Operations
| Pair | Connection |
|---|---|
| manga, onepiece | manga ↔ onepiece |
| platform, anime | platform ↔ anime |
| leetcode, platform | leetcode ↔ platform |
| anime, manga | anime ↔ manga |
All become part of one component:
{leetcode, platform, anime, manga, onepiece}
Sentence Comparison
| Index | Word 1 | Word 2 | Result |
|---|---|---|---|
| 0 | I | I | Equal |
| 1 | love | love | Equal |
| 2 | leetcode | onepiece | Same component |
Result:
true
Example 3
sentence1 = ["I","love","leetcode"]
sentence2 = ["I","love","onepiece"]
Similarity pairs:
manga ↔ hunterXhunter
platform ↔ anime
leetcode ↔ platform
anime ↔ manga
Connected component:
{leetcode, platform, anime, manga, hunterXhunter}
The word "onepiece" never becomes connected.
Comparison
| Index | Word 1 | Word 2 | Result |
|---|---|---|---|
| 0 | I | I | Equal |
| 1 | love | love | Equal |
| 2 | leetcode | onepiece | Different components |
Result:
false
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O((V + E) × α(V)) | Union and find operations are nearly constant time |
| Space | O(V) | Stores parent and rank for each unique word |
Here:
Vis the number of unique wordsEis the number of similarity pairsα(V)is the inverse Ackermann function
Because path compression and union by rank are used together, Union Find operations become extremely efficient in practice.
Test Cases
from typing import List
class Solution:
def areSentencesSimilarTwo(
self,
sentence1: List[str],
sentence2: List[str],
similarPairs: List[List[str]]
) -> bool:
if len(sentence1) != len(sentence2):
return False
parent = {}
rank = {}
def find(word):
if parent[word] != word:
parent[word] = find(parent[word])
return parent[word]
def union(a, b):
root_a = find(a)
root_b = find(b)
if root_a == root_b:
return
if rank[root_a] < rank[root_b]:
parent[root_a] = root_b
elif rank[root_a] > rank[root_b]:
parent[root_b] = root_a
else:
parent[root_b] = root_a
rank[root_a] += 1
def add(word):
if word not in parent:
parent[word] = word
rank[word] = 0
for a, b in similarPairs:
add(a)
add(b)
union(a, b)
for a, b in zip(sentence1, sentence2):
if a == b:
continue
if a not in parent or b not in parent:
return False
if find(a) != find(b):
return False
return True
solution = Solution()
# Example 1
assert solution.areSentencesSimilarTwo(
["great","acting","skills"],
["fine","drama","talent"],
[["great","good"],["fine","good"],["drama","acting"],["skills","talent"]]
) == True # transitive similarity
# Example 2
assert solution.areSentencesSimilarTwo(
["I","love","leetcode"],
["I","love","onepiece"],
[["manga","onepiece"],["platform","anime"],
["leetcode","platform"],["anime","manga"]]
) == True # long similarity chain
# Example 3
assert solution.areSentencesSimilarTwo(
["I","love","leetcode"],
["I","love","onepiece"],
[["manga","hunterXhunter"],["platform","anime"],
["leetcode","platform"],["anime","manga"]]
) == False # disconnected components
# Different sentence lengths
assert solution.areSentencesSimilarTwo(
["hello"],
["hello", "world"],
[]
) == False # length mismatch
# Identical sentences without pairs
assert solution.areSentencesSimilarTwo(
["a", "b"],
["a", "b"],
[]
) == True # words are always similar to themselves
# Word missing from similarity graph
assert solution.areSentencesSimilarTwo(
["a"],
["b"],
[]
) == False # unrelated words
# Direct similarity
assert solution.areSentencesSimilarTwo(
["a"],
["b"],
[["a", "b"]]
) == True # directly connected
# Cyclic relationships
assert solution.areSentencesSimilarTwo(
["a"],
["c"],
[["a", "b"], ["b", "c"], ["c", "a"]]
) == True # cycle handling
# Multiple disconnected groups
assert solution.areSentencesSimilarTwo(
["x", "y"],
["a", "b"],
[["x", "a"]]
) == False # second pair disconnected
# Long transitive chain
assert solution.areSentencesSimilarTwo(
["a"],
["f"],
[["a","b"],["b","c"],["c","d"],["d","e"],["e","f"]]
) == True # deep transitive connection
| Test | Why |
|---|---|
| Example 1 | Validates standard transitive similarity |
| Example 2 | Tests long similarity chains |
| Example 3 | Tests disconnected components |
| Different sentence lengths | Ensures early rejection works |
| Identical sentences without pairs | Verifies self similarity |
| Word missing from graph | Ensures unrelated words fail |
| Direct similarity | Tests basic union behavior |
| Cyclic relationships | Verifies cycles do not break logic |
| Multiple disconnected groups | Tests partial connectivity |
| Long transitive chain | Confirms deep transitive closure |
Edge Cases
Different Sentence Lengths
If the sentences contain different numbers of words, they can never be similar because the comparison is position based.
A naive implementation might attempt to compare only overlapping positions, producing incorrect results.
The implementation handles this immediately with an early length check:
if len(sentence1) != len(sentence2):
return False
Words Missing from Similarity Pairs
Some words may never appear in similarPairs.
For example:
sentence1 = ["a"]
sentence2 = ["b"]
similarPairs = []
Since "a" and "b" are different and no similarity relationship exists, the answer must be False.
The implementation correctly checks whether both words exist in the Union Find structure before attempting root comparison.
Long Transitive Chains
Similarity may require many intermediate connections:
a → b → c → d → e → f
A naive direct comparison would fail because "a" and "f" are not directly connected.
Union Find correctly merges all connected words into one component, so both words eventually share the same root parent.
Cyclic Similarity Relationships
The graph may contain cycles:
a ↔ b
b ↔ c
c ↔ a
Recursive graph traversals can sometimes revisit nodes infinitely if implemented incorrectly.
Union Find naturally handles cycles safely because union operations simply detect when two nodes already belong to the same set.