LeetCode 734 - Sentence Similarity

The problem gives us two sentences, where each sentence is represented as an array of words. We are also given a list of word pairs that define which words are considered similar. Our task is to determine whether the two sentences are similar according to the following rules: 1.

LeetCode Problem 734

Difficulty: 🟢 Easy
Topics: Array, Hash Table, String

Solution

Problem Understanding

The problem gives us two sentences, where each sentence is represented as an array of words. We are also given a list of word pairs that define which words are considered similar.

Our task is to determine whether the two sentences are similar according to the following rules:

  1. The sentences must contain the same number of words.
  2. At every index i, the word sentence1[i] must either:
  • be exactly equal to sentence2[i], or
  • appear as a similar pair in similarPairs.

An important detail is that similarity is not transitive. If "great" is similar to "fine" and "fine" is similar to "excellent", that does not mean "great" is similar to "excellent".

Another important detail is that similarity is symmetric. If "great" is similar to "fine", then "fine" should also be considered similar to "great".

The constraints are relatively small:

  • Up to 1000 words in each sentence
  • Up to 1000 similarity pairs

These limits are small enough that even moderately inefficient solutions may pass, but we should still aim for an efficient implementation.

Several edge cases are important:

  • The two sentences may have different lengths, in which case the answer is immediately false.
  • A word is always similar to itself, even if it does not appear in similarPairs.
  • The similarity list may be empty.
  • Similarity is not transitive, so we must only check direct pair relationships.
  • Similar pairs are distinct, so we do not need to handle duplicate entries.

Approaches

Brute Force Approach

A straightforward approach is to compare the two sentences word by word. For every pair of words, we scan through the entire similarPairs array to see whether the pair exists.

For each position i:

  • If the two words are identical, continue.

  • Otherwise, iterate through all pairs in similarPairs.

  • Check whether either:

  • (word1, word2) exists, or

  • (word2, word1) exists.

  • If no matching pair is found, return false.

This approach is correct because it explicitly checks every allowed similarity relation. However, it is inefficient because for every word position we may scan the entire similarity list.

If:

  • n is the sentence length
  • m is the number of similar pairs

then the time complexity becomes O(n * m).

With constraints up to 1000, this still passes comfortably, but we can do better.

Optimal Approach

The key observation is that repeated linear searches through similarPairs are unnecessary. We can preprocess all similarity relationships into a hash set for constant time lookup.

We store each pair in a set using tuples such as:

("great", "fine")

Since similarity is symmetric, we insert both directions:

("great", "fine")
("fine", "great")

Now, when comparing two words:

  • If the words are equal, they are automatically similar.
  • Otherwise, we simply check whether the tuple exists in the hash set.

Hash set lookup is O(1) on average, which reduces the total complexity to O(n + m).

Approach Time Complexity Space Complexity Notes
Brute Force O(n * m) O(1) Scans every similarity pair for each word comparison
Optimal O(n + m) O(m) Uses a hash set for constant time similarity lookup

Algorithm Walkthrough

  1. First, check whether the two sentences have the same length.

If the lengths differ, the sentences cannot be similar according to the problem definition, so immediately return false. 2. Create a hash set to store similarity relationships.

For every pair [a, b] in similarPairs, insert:

  • (a, b)
  • (b, a)

We insert both directions because similarity is symmetric. 3. Iterate through both sentences simultaneously.

For each index i:

  • Let word1 = sentence1[i]
  • Let word2 = sentence2[i]
  1. Check whether the words are identical.

If word1 == word2, continue to the next position because a word is always similar to itself. 5. Otherwise, check whether (word1, word2) exists in the hash set.

If it does not exist, the words are not similar, so return false. 6. If all word pairs pass the similarity check, return true.

Why it works

The algorithm works because every valid similarity relationship is explicitly stored in the hash set. For every corresponding word pair in the two sentences, we verify one of the only two allowed conditions:

  • the words are identical, or
  • the pair exists in the similarity relation.

Since the problem explicitly states that similarity is not transitive, checking only direct relationships is exactly what the problem requires.

Python Solution

from typing import List

class Solution:
    def areSentencesSimilar(
        self,
        sentence1: List[str],
        sentence2: List[str],
        similarPairs: List[List[str]]
    ) -> bool:
        
        if len(sentence1) != len(sentence2):
            return False

        similar_set = set()

        for word1, word2 in similarPairs:
            similar_set.add((word1, word2))
            similar_set.add((word2, word1))

        for word1, word2 in zip(sentence1, sentence2):
            if word1 == word2:
                continue

            if (word1, word2) not in similar_set:
                return False

        return True

The implementation begins by checking sentence lengths because differing lengths automatically invalidate similarity.

Next, we build a hash set containing all valid similarity relationships. We insert both directions of each pair to make lookups symmetric.

The main loop uses zip() to iterate through both sentences simultaneously. For each word pair:

  • If the words are identical, we continue immediately.
  • Otherwise, we check whether the pair exists in the hash set.

If any pair fails the check, the function immediately returns False. If the loop completes successfully, all word pairs are similar, so we return True.

Go Solution

func areSentencesSimilar(sentence1 []string, sentence2 []string, similarPairs [][]string) bool {
	if len(sentence1) != len(sentence2) {
		return false
	}

	similarSet := make(map[string]bool)

	for _, pair := range similarPairs {
		word1 := pair[0]
		word2 := pair[1]

		similarSet[word1+"#"+word2] = true
		similarSet[word2+"#"+word1] = true
	}

	for i := 0; i < len(sentence1); i++ {
		word1 := sentence1[i]
		word2 := sentence2[i]

		if word1 == word2 {
			continue
		}

		key := word1 + "#" + word2

		if !similarSet[key] {
			return false
		}
	}

	return true
}

In Go, tuples do not exist as conveniently as in Python, so we encode each pair into a single string key using a separator such as "#".

The Go version otherwise follows exactly the same logic:

  • Check sentence lengths
  • Store both similarity directions
  • Compare corresponding words
  • Return false immediately on failure

Go maps naturally support constant time average lookup, making the performance equivalent to the Python solution.

Worked Examples

Example 1

sentence1 = ["great","acting","skills"]
sentence2 = ["fine","drama","talent"]

similarPairs = [
    ["great","fine"],
    ["drama","acting"],
    ["skills","talent"]
]

Step 1: Build similarity set

Inserted Pair
("great", "fine")
("fine", "great")
("drama", "acting")
("acting", "drama")
("skills", "talent")
("talent", "skills")

Step 2: Compare words

Index sentence1 sentence2 Similar?
0 great fine Yes
1 acting drama Yes
2 skills talent Yes

All pairs are similar, so the answer is:

true

Example 2

sentence1 = ["great"]
sentence2 = ["great"]
similarPairs = []

Step 1: Length check

Both lengths are 1, so continue.

Step 2: Compare words

Index sentence1 sentence2 Equal?
0 great great Yes

A word is always similar to itself.

Result:

true

Example 3

sentence1 = ["great"]
sentence2 = ["doubleplus", "good"]

Step 1: Length check

sentence1 length sentence2 length
1 2

Lengths differ, so we immediately return:

false

Complexity Analysis

Measure Complexity Explanation
Time O(n + m) Building the hash set takes O(m), comparing sentence words takes O(n)
Space O(m) The hash set stores all similarity pairs

The dominant operations are building the similarity lookup structure and iterating through the sentences once. Hash set operations are constant time on average, which allows the algorithm to scale linearly with input size.

Test Cases

from typing import List

class Solution:
    def areSentencesSimilar(
        self,
        sentence1: List[str],
        sentence2: List[str],
        similarPairs: List[List[str]]
    ) -> bool:
        
        if len(sentence1) != len(sentence2):
            return False

        similar_set = set()

        for word1, word2 in similarPairs:
            similar_set.add((word1, word2))
            similar_set.add((word2, word1))

        for word1, word2 in zip(sentence1, sentence2):
            if word1 == word2:
                continue

            if (word1, word2) not in similar_set:
                return False

        return True

solution = Solution()

# Provided example 1
assert solution.areSentencesSimilar(
    ["great", "acting", "skills"],
    ["fine", "drama", "talent"],
    [["great", "fine"], ["drama", "acting"], ["skills", "talent"]]
) is True

# Provided example 2
assert solution.areSentencesSimilar(
    ["great"],
    ["great"],
    []
) is True

# Provided example 3
assert solution.areSentencesSimilar(
    ["great"],
    ["doubleplus", "good"],
    [["great", "doubleplus"]]
) is False

# Different lengths
assert solution.areSentencesSimilar(
    ["a", "b"],
    ["a"],
    []
) is False

# Direct similarity exists
assert solution.areSentencesSimilar(
    ["hello"],
    ["hi"],
    [["hello", "hi"]]
) is True

# Reverse direction should work
assert solution.areSentencesSimilar(
    ["hi"],
    ["hello"],
    [["hello", "hi"]]
) is True

# Non-transitive similarity
assert solution.areSentencesSimilar(
    ["a"],
    ["c"],
    [["a", "b"], ["b", "c"]]
) is False

# Empty similarity list with unequal words
assert solution.areSentencesSimilar(
    ["one"],
    ["two"],
    []
) is False

# Multiple identical words
assert solution.areSentencesSimilar(
    ["same", "same"],
    ["same", "same"],
    []
) is True

# Mixed identical and similar words
assert solution.areSentencesSimilar(
    ["great", "movie"],
    ["fine", "movie"],
    [["great", "fine"]]
) is True
Test Why
Example 1 Validates normal similarity matching
Example 2 Validates self similarity
Example 3 Validates length mismatch handling
Different lengths Confirms immediate rejection
Direct similarity exists Tests standard pair lookup
Reverse direction should work Verifies symmetry handling
Non-transitive similarity Ensures transitivity is not assumed
Empty similarity list with unequal words Confirms unrelated words fail
Multiple identical words Tests repeated self similarity
Mixed identical and similar words Tests combined scenarios

Edge Cases

Different Sentence Lengths

If the two sentences contain different numbers of words, they cannot possibly be similar because similarity is defined position by position.

A common mistake is attempting to compare words until one sentence ends. That would incorrectly ignore unmatched trailing words. The implementation prevents this by checking lengths before any other processing.

Words Not Present in Similar Pairs

Some words may never appear in similarPairs. That does not automatically make them invalid because a word is always similar to itself.

For example:

sentence1 = ["leetcode"]
sentence2 = ["leetcode"]

Even with an empty similarity list, the answer should still be true.

The implementation handles this correctly using the equality check:

if word1 == word2:
    continue

Non-Transitive Relationships

This is the most important logical edge case.

Suppose:

"a" ~ "b"
"b" ~ "c"

The problem explicitly states that "a" is not necessarily similar to "c".

A common mistake is using Union Find or graph connectivity, which would incorrectly treat similarity as transitive.

The implementation avoids this problem entirely by checking only direct pair membership in the hash set.