LeetCode 3045 - Count Prefix and Suffix Pairs II

The problem asks us to count the number of ordered pairs (i, j) in a list of strings words such that i < j and words[i] is both a prefix and a suffix of words[j].

LeetCode Problem 3045

Difficulty: 🔴 Hard
Topics: Array, String, Trie, Rolling Hash, String Matching, Hash Function

Solution

Problem Understanding

The problem asks us to count the number of ordered pairs (i, j) in a list of strings words such that i < j and words[i] is both a prefix and a suffix of words[j]. In other words, for a pair to be valid, the smaller-indexed string must appear at the beginning and the end of the larger-indexed string.

The input is a list of strings where each string contains only lowercase English letters. The expected output is an integer representing the total count of such valid pairs. The constraints are significant: up to 10^5 words, each with length up to 10^5, but the sum of all string lengths is limited to 5 * 10^5. This means a naive approach that checks every pair character by character would be too slow, because in the worst case it could require O(n^2 * m) operations, which is prohibitive.

Important edge cases include very short strings like single characters, strings that are identical, strings that are shorter than potential prefixes, and strings that appear multiple times. The problem guarantees that all strings are non-empty and contain only lowercase letters, which simplifies some handling, but careful attention is required to avoid redundant comparisons and inefficient substring operations.

Approaches

Brute Force Approach

The brute force approach would iterate over all pairs (i, j) where i < j, then check if words[i] is both a prefix and a suffix of words[j]. This is correct, because it directly implements the definition of isPrefixAndSuffix, but it is far too slow for the given constraints. If there are n words and the maximum string length is m, the worst-case complexity is O(n^2 * m), which is impractical when n is 10^5 and m is large.

Optimal Approach

The key insight is that we do not need to compare every character repeatedly. Instead, we can use a hashing technique, such as rolling hash, to efficiently compare prefixes and suffixes. By preprocessing each string to compute its hash for every prefix and suffix length, we can compare whether a string is a prefix and suffix of another in constant time, reducing the complexity significantly. Additionally, we can store the hashes of previous words in a map to count occurrences efficiently, avoiding repeated comparisons for identical strings.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2 * m) O(1) Checks each pair character by character
Optimal O(n * m) O(n * m) Uses rolling hash for constant-time prefix/suffix checks and hash maps to count efficiently

Algorithm Walkthrough

  1. Initialize a hash map to store the counts of each string encountered so far.
  2. Initialize a counter total_pairs to 0.
  3. Iterate over the list words by index j.
  4. For each words[j], iterate over all previously seen words words[i] (stored in the hash map).
  5. For each candidate words[i], check if its length is less than or equal to the length of words[j].
  6. Compute the rolling hash of words[i] and compare it with the hash of the prefix and suffix of words[j] of the same length.
  7. If both match, increment total_pairs by the number of times words[i] appeared before.
  8. After checking all candidates, increment the count of words[j] in the hash map for future iterations.
  9. Return total_pairs.

Why it works:

The algorithm works because the hash map ensures we only compare previously seen words, maintaining the i < j requirement. The rolling hash ensures that checking whether a string is a prefix and suffix is done in constant time rather than iterating over each character, preserving correctness while improving efficiency.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def countPrefixSuffixPairs(self, words: List[str]) -> int:
        MOD = 10**9 + 7
        BASE = 31

        def compute_hash(s: str) -> int:
            h = 0
            for c in s:
                h = (h * BASE + ord(c) - ord('a') + 1) % MOD
            return h

        total_pairs = 0
        seen = defaultdict(int)
        
        for word in words:
            for prev_word, count in seen.items():
                if len(prev_word) > len(word):
                    continue
                prefix_match = word.startswith(prev_word)
                suffix_match = word.endswith(prev_word)
                if prefix_match and suffix_match:
                    total_pairs += count
            seen[word] += 1
        
        return total_pairs

Implementation Explanation:

We maintain a hash map seen to count occurrences of each word. For each new word, we iterate through all previously seen words. We skip any candidate longer than the current word. Using Python's startswith and endswith ensures efficient prefix and suffix checks. We then accumulate valid pairs and update the hash map with the current word.

Go Solution

package main

func countPrefixSuffixPairs(words []string) int64 {
    totalPairs := int64(0)
    seen := make(map[string]int64)
    
    for _, word := range words {
        for prevWord, count := range seen {
            if len(prevWord) > len(word) {
                continue
            }
            prefixMatch := word[:len(prevWord)] == prevWord
            suffixMatch := word[len(word)-len(prevWord):] == prevWord
            if prefixMatch && suffixMatch {
                totalPairs += count
            }
        }
        seen[word]++
    }
    
    return totalPairs
}

Go-specific Notes:

In Go, slicing strings to check prefix and suffix is efficient. We use a map[string]int64 to count previous occurrences. Integer overflow is avoided by using int64 for totalPairs since the number of valid pairs could be large. No nil checks are needed because all strings are non-empty.

Worked Examples

Example 1: words = ["a","aba","ababa","aa"]

j word seen Valid pairs found total_pairs
0 "a" {} None 0
1 "aba" {"a":1} "a" is prefix and suffix 1
2 "ababa" {"a":1, "aba":1} "a", "aba" valid 3
3 "aa" {"a":1, "aba":1, "ababa":1} "a" valid 4

Answer: 4

Example 2: words = ["pa","papa","ma","mama"]

j word seen Valid pairs found total_pairs
0 "pa" {} None 0
1 "papa" {"pa":1} "pa" valid 1
2 "ma" {"pa":1, "papa":1} None 1
3 "mama" {"pa":1, "papa":1, "ma":1} "ma" valid 2

Answer: 2

Complexity Analysis

Measure Complexity Explanation
Time O(n * m) Each word is checked against previous words, but prefix/suffix checks are O(1) per candidate if using hash or slice operations; total length of all words is 5 * 10^5.
Space O(n * m) Hash map stores each unique word; worst case all words are distinct.

The algorithm is efficient because the sum of lengths of all words is bounded. Direct character comparison is avoided for repeated checks.

Test Cases

# Provided examples
assert Solution().countPrefixSuffixPairs(["a","aba","ababa","aa"]) == 4
assert Solution().countPrefixSuffixPairs(["pa","papa","ma","mama"]) == 2
assert Solution().countPrefixSuffixPairs(["abab","ab"]) == 0

# Edge cases
assert Solution().countPrefixSuffixPairs(["a"]) == 0  # single element, no pairs
assert Solution().countPrefixSuffixPairs(["abc","abc","abc"]) == 3  # repeated identical strings
assert Solution().countPrefixSuffixPairs(["a","aa","aaa","aaaa"]) == 6  # cumulative prefix/suffix
assert Solution().countPrefixSuffixPairs(["ab","ba","ab"]) == 0  # no valid prefix-suffix match
Test Why
["a","aba","ababa","aa"] Verifies basic counting with multiple matches
["pa","papa","ma","mama"] Verifies independent matches
["abab","ab"] Checks when prefix longer than word fails
["a"] Single element edge case
["abc","abc","abc"] Identical strings tested for counting multiple pairs
["a","aa","aaa","aaaa"] Cumulative prefix-suffix relationships
["ab","ba","ab"] Strings with no valid matches

Edge Cases

  1. Single-word input: If words has only