LeetCode 737 - Sentence Similarity II

This problem asks us to determine whether two sentences are considered similar based on a set of similarity relationships between words. Each sentence is represented as an array of strings, where each string is a single word.

LeetCode Problem 737

Difficulty: 🟡 Medium
Topics: Array, Hash Table, String, Depth-First Search, Breadth-First Search, Union-Find

Solution

Problem Understanding

This problem asks us to determine whether two sentences are considered similar based on a set of similarity relationships between words.

Each sentence is represented as an array of strings, where each string is a single word. We are also given a list of word pairs, where each pair indicates that the two words are similar.

The important detail is that similarity is not limited to direct relationships. The relation is transitive. If "great" is similar to "good" and "good" is similar to "fine", then "great" is also similar to "fine".

This means the similarity relationships form groups of connected words. Any two words inside the same connected group are considered similar.

The problem requires us to compare the two sentences word by word:

  • The sentences must have the same number of words.

  • For every index i, sentence1[i] must either:

  • be exactly equal to sentence2[i], or

  • belong to the same similarity group.

If every corresponding pair of words satisfies this condition, we return true. Otherwise, we return false.

The constraints are important:

  • Sentence lengths can be up to 1000
  • There can be up to 2000 similarity pairs

These values are large enough that repeatedly searching the graph from scratch for every word comparison could become inefficient. We need a solution that efficiently groups connected words together.

There are several important edge cases:

  • The sentences may have different lengths, which immediately makes them not similar.
  • Some words may not appear in similarPairs at all.
  • A word is always similar to itself, even if it never appears in any pair.
  • Similarity chains may be long, such as a -> b -> c -> d.
  • The graph may contain disconnected components.
  • Cycles may exist in the similarity graph.

A correct solution must handle all of these situations properly.

Approaches

Brute Force Approach

The most direct approach is to treat the similarity pairs as an undirected graph.

Each word becomes a node, and every similarity pair creates an edge between two words.

For every pair of words at the same index in the two sentences:

  1. If the words are equal, continue.
  2. Otherwise, perform a DFS or BFS starting from the first word to see whether the second word can be reached through transitive similarity relationships.

This approach is correct because graph traversal explores every possible similarity chain.

However, it is inefficient because we may repeatedly traverse the same parts of the graph for many word comparisons. In the worst case, each lookup could scan the entire graph.

Optimal Approach, Union Find

The key insight is that similarity forms connected components.

If two words belong to the same connected component, they are similar. This is exactly the type of problem that Union Find, also called Disjoint Set Union, is designed to solve efficiently.

Instead of repeatedly searching the graph, we preprocess all similarity pairs:

  • Every word starts in its own set.
  • For every pair (a, b), we union their sets.
  • After processing all pairs, words in the same connected component share the same representative parent.

Then comparing words becomes simple:

  • If the words are equal, they are similar.
  • Otherwise, check whether they belong to the same Union Find set.

Using path compression and union by rank makes the operations nearly constant time.

Approach Time Complexity Space Complexity Notes
Brute Force O(n × (V + E)) O(V + E) Repeated DFS/BFS for every word comparison
Optimal O((V + E) × α(V)) O(V) Uses Union Find to preprocess connected components

Here:

  • n is the sentence length
  • V is the number of unique words
  • E is the number of similarity pairs
  • α(V) is the inverse Ackermann function, which is effectively constant

Algorithm Walkthrough

Step 1: Check Sentence Lengths

The first requirement is that the sentences must contain the same number of words.

If the lengths differ, we immediately return False because there is no way to match words position by position.

Step 2: Initialize Union Find Structures

We create two hash maps:

  • parent[word] stores the representative parent of the word
  • rank[word] stores the approximate tree depth for balancing unions

We use hash maps because the nodes are strings, not integers.

Step 3: Create Sets for Every Word

As we process similarity pairs, we ensure every encountered word exists in the Union Find structure.

Initially:

  • Every word is its own parent
  • Every word has rank 0

This means each word starts as an independent connected component.

Step 4: Union Similar Words

For every pair (a, b):

  1. Find the root parent of a
  2. Find the root parent of b
  3. If the roots differ, merge the sets

We use:

  • Path compression during find
  • Union by rank during union

These optimizations keep the trees shallow and operations fast.

Step 5: Compare Corresponding Sentence Words

Now we iterate through both sentences simultaneously.

For each pair (word1, word2):

  1. If the words are equal, continue
  2. If either word does not exist in the Union Find structure, they cannot be similar
  3. Otherwise, compare their root parents
  4. If the roots differ, return False

If all positions pass the check, return True.

Why it works

Union Find guarantees that all transitively connected words end up in the same connected component.

If two words can be connected through any chain of similarity relationships, repeated union operations eventually place them under the same root parent.

Therefore:

  • Equal roots mean the words are similar
  • Different roots mean no similarity path exists

This invariant guarantees correctness.

Python Solution

from typing import List

class Solution:
    def areSentencesSimilarTwo(
        self,
        sentence1: List[str],
        sentence2: List[str],
        similarPairs: List[List[str]]
    ) -> bool:

        if len(sentence1) != len(sentence2):
            return False

        parent = {}
        rank = {}

        def find(word: str) -> str:
            if parent[word] != word:
                parent[word] = find(parent[word])
            return parent[word]

        def union(word1: str, word2: str) -> None:
            root1 = find(word1)
            root2 = find(word2)

            if root1 == root2:
                return

            if rank[root1] < rank[root2]:
                parent[root1] = root2
            elif rank[root1] > rank[root2]:
                parent[root2] = root1
            else:
                parent[root2] = root1
                rank[root1] += 1

        def add_word(word: str) -> None:
            if word not in parent:
                parent[word] = word
                rank[word] = 0

        for word1, word2 in similarPairs:
            add_word(word1)
            add_word(word2)
            union(word1, word2)

        for word1, word2 in zip(sentence1, sentence2):
            if word1 == word2:
                continue

            if word1 not in parent or word2 not in parent:
                return False

            if find(word1) != find(word2):
                return False

        return True

The implementation starts by checking whether the two sentences have the same length. This satisfies the first requirement of the problem.

The parent dictionary stores the Union Find parent relationships, while the rank dictionary helps keep the trees balanced during union operations.

The find function locates the root representative of a word. Path compression is applied by updating each visited node directly to the root, which speeds up future lookups.

The union function merges two connected components. Union by rank ensures the smaller tree attaches under the larger tree, minimizing tree height.

The add_word helper ensures every encountered word exists in the data structure before union operations occur.

After processing all similarity pairs, the code compares corresponding words in the two sentences. Equal words are automatically considered similar. Otherwise, both words must exist in the Union Find structure and share the same root parent.

If any comparison fails, the function returns False. If all comparisons succeed, it returns True.

Go Solution

func areSentencesSimilarTwo(sentence1 []string, sentence2 []string, similarPairs [][]string) bool {
	if len(sentence1) != len(sentence2) {
		return false
	}

	parent := make(map[string]string)
	rank := make(map[string]int)

	var find func(string) string

	find = func(word string) string {
		if parent[word] != word {
			parent[word] = find(parent[word])
		}
		return parent[word]
	}

	addWord := func(word string) {
		if _, exists := parent[word]; !exists {
			parent[word] = word
			rank[word] = 0
		}
	}

	union := func(word1 string, word2 string) {
		root1 := find(word1)
		root2 := find(word2)

		if root1 == root2 {
			return
		}

		if rank[root1] < rank[root2] {
			parent[root1] = root2
		} else if rank[root1] > rank[root2] {
			parent[root2] = root1
		} else {
			parent[root2] = root1
			rank[root1]++
		}
	}

	for _, pair := range similarPairs {
		word1 := pair[0]
		word2 := pair[1]

		addWord(word1)
		addWord(word2)

		union(word1, word2)
	}

	for i := 0; i < len(sentence1); i++ {
		word1 := sentence1[i]
		word2 := sentence2[i]

		if word1 == word2 {
			continue
		}

		_, exists1 := parent[word1]
		_, exists2 := parent[word2]

		if !exists1 || !exists2 {
			return false
		}

		if find(word1) != find(word2) {
			return false
		}
	}

	return true
}

The Go implementation follows the same logic as the Python version but uses Go maps instead of dictionaries.

Because Go does not support nested named functions in the same way as Python, the find function is declared as a variable first so it can recursively reference itself.

Go maps naturally handle dynamic insertion of string keys, making them well suited for storing Union Find parent relationships.

There are no integer overflow concerns because the rank values remain very small.

Worked Examples

Example 1

sentence1 = ["great","acting","skills"]
sentence2 = ["fine","drama","talent"]

similarPairs = [
    ["great","good"],
    ["fine","good"],
    ["drama","acting"],
    ["skills","talent"]
]

Building the Union Find Structure

Pair Action Resulting Connections
great, good Union great ↔ good
fine, good Union fine ↔ good ↔ great
drama, acting Union drama ↔ acting
skills, talent Union skills ↔ talent

Final connected components:

Component
{great, good, fine}
{drama, acting}
{skills, talent}

Comparing Sentences

Index sentence1 sentence2 Similar?
0 great fine Yes, same component
1 acting drama Yes, same component
2 skills talent Yes, same component

Result:

true

Example 2

sentence1 = ["I","love","leetcode"]
sentence2 = ["I","love","onepiece"]

Similarity chain:

leetcode → platform → anime → manga → onepiece

Union Operations

Pair Connection
manga, onepiece manga ↔ onepiece
platform, anime platform ↔ anime
leetcode, platform leetcode ↔ platform
anime, manga anime ↔ manga

All become part of one component:

{leetcode, platform, anime, manga, onepiece}

Sentence Comparison

Index Word 1 Word 2 Result
0 I I Equal
1 love love Equal
2 leetcode onepiece Same component

Result:

true

Example 3

sentence1 = ["I","love","leetcode"]
sentence2 = ["I","love","onepiece"]

Similarity pairs:

manga ↔ hunterXhunter
platform ↔ anime
leetcode ↔ platform
anime ↔ manga

Connected component:

{leetcode, platform, anime, manga, hunterXhunter}

The word "onepiece" never becomes connected.

Comparison

Index Word 1 Word 2 Result
0 I I Equal
1 love love Equal
2 leetcode onepiece Different components

Result:

false

Complexity Analysis

Measure Complexity Explanation
Time O((V + E) × α(V)) Union and find operations are nearly constant time
Space O(V) Stores parent and rank for each unique word

Here:

  • V is the number of unique words
  • E is the number of similarity pairs
  • α(V) is the inverse Ackermann function

Because path compression and union by rank are used together, Union Find operations become extremely efficient in practice.

Test Cases

from typing import List

class Solution:
    def areSentencesSimilarTwo(
        self,
        sentence1: List[str],
        sentence2: List[str],
        similarPairs: List[List[str]]
    ) -> bool:

        if len(sentence1) != len(sentence2):
            return False

        parent = {}
        rank = {}

        def find(word):
            if parent[word] != word:
                parent[word] = find(parent[word])
            return parent[word]

        def union(a, b):
            root_a = find(a)
            root_b = find(b)

            if root_a == root_b:
                return

            if rank[root_a] < rank[root_b]:
                parent[root_a] = root_b
            elif rank[root_a] > rank[root_b]:
                parent[root_b] = root_a
            else:
                parent[root_b] = root_a
                rank[root_a] += 1

        def add(word):
            if word not in parent:
                parent[word] = word
                rank[word] = 0

        for a, b in similarPairs:
            add(a)
            add(b)
            union(a, b)

        for a, b in zip(sentence1, sentence2):
            if a == b:
                continue

            if a not in parent or b not in parent:
                return False

            if find(a) != find(b):
                return False

        return True

solution = Solution()

# Example 1
assert solution.areSentencesSimilarTwo(
    ["great","acting","skills"],
    ["fine","drama","talent"],
    [["great","good"],["fine","good"],["drama","acting"],["skills","talent"]]
) == True  # transitive similarity

# Example 2
assert solution.areSentencesSimilarTwo(
    ["I","love","leetcode"],
    ["I","love","onepiece"],
    [["manga","onepiece"],["platform","anime"],
     ["leetcode","platform"],["anime","manga"]]
) == True  # long similarity chain

# Example 3
assert solution.areSentencesSimilarTwo(
    ["I","love","leetcode"],
    ["I","love","onepiece"],
    [["manga","hunterXhunter"],["platform","anime"],
     ["leetcode","platform"],["anime","manga"]]
) == False  # disconnected components

# Different sentence lengths
assert solution.areSentencesSimilarTwo(
    ["hello"],
    ["hello", "world"],
    []
) == False  # length mismatch

# Identical sentences without pairs
assert solution.areSentencesSimilarTwo(
    ["a", "b"],
    ["a", "b"],
    []
) == True  # words are always similar to themselves

# Word missing from similarity graph
assert solution.areSentencesSimilarTwo(
    ["a"],
    ["b"],
    []
) == False  # unrelated words

# Direct similarity
assert solution.areSentencesSimilarTwo(
    ["a"],
    ["b"],
    [["a", "b"]]
) == True  # directly connected

# Cyclic relationships
assert solution.areSentencesSimilarTwo(
    ["a"],
    ["c"],
    [["a", "b"], ["b", "c"], ["c", "a"]]
) == True  # cycle handling

# Multiple disconnected groups
assert solution.areSentencesSimilarTwo(
    ["x", "y"],
    ["a", "b"],
    [["x", "a"]]
) == False  # second pair disconnected

# Long transitive chain
assert solution.areSentencesSimilarTwo(
    ["a"],
    ["f"],
    [["a","b"],["b","c"],["c","d"],["d","e"],["e","f"]]
) == True  # deep transitive connection
Test Why
Example 1 Validates standard transitive similarity
Example 2 Tests long similarity chains
Example 3 Tests disconnected components
Different sentence lengths Ensures early rejection works
Identical sentences without pairs Verifies self similarity
Word missing from graph Ensures unrelated words fail
Direct similarity Tests basic union behavior
Cyclic relationships Verifies cycles do not break logic
Multiple disconnected groups Tests partial connectivity
Long transitive chain Confirms deep transitive closure

Edge Cases

Different Sentence Lengths

If the sentences contain different numbers of words, they can never be similar because the comparison is position based.

A naive implementation might attempt to compare only overlapping positions, producing incorrect results.

The implementation handles this immediately with an early length check:

if len(sentence1) != len(sentence2):
    return False

Words Missing from Similarity Pairs

Some words may never appear in similarPairs.

For example:

sentence1 = ["a"]
sentence2 = ["b"]
similarPairs = []

Since "a" and "b" are different and no similarity relationship exists, the answer must be False.

The implementation correctly checks whether both words exist in the Union Find structure before attempting root comparison.

Long Transitive Chains

Similarity may require many intermediate connections:

a → b → c → d → e → f

A naive direct comparison would fail because "a" and "f" are not directly connected.

Union Find correctly merges all connected words into one component, so both words eventually share the same root parent.

Cyclic Similarity Relationships

The graph may contain cycles:

a ↔ b
b ↔ c
c ↔ a

Recursive graph traversals can sometimes revisit nodes infinitely if implemented incorrectly.

Union Find naturally handles cycles safely because union operations simply detect when two nodes already belong to the same set.