LeetCode 3805 - Count Caesar Cipher Pairs

We are given an array words containing n lowercase strings. Every string has the same length m. Two strings are considered similar if we can repeatedly apply the following operation to either string: - Shift every character forward by one position in the alphabet.

LeetCode Problem 3805

Difficulty: 🟡 Medium
Topics: Array, Hash Table, Math, String, Counting

Solution

LeetCode 3805 - Count Caesar Cipher Pairs

Problem Understanding

We are given an array words containing n lowercase strings. Every string has the same length m.

Two strings are considered similar if we can repeatedly apply the following operation to either string:

  • Shift every character forward by one position in the alphabet.
  • The alphabet wraps around, so 'z' becomes 'a'.

For example:

  • "abc""bcd""cde"
  • "za""ab"

The operation affects the entire string at once. Therefore, every character in a string is shifted by the same amount.

The task is to count how many index pairs (i, j) with i < j satisfy the similarity condition.

A key observation is that two strings are similar if one can be transformed into the other by applying the same cyclic shift to every character.

The constraints are important:

  • 1 ≤ n ≤ 10^5
  • 1 ≤ m ≤ 10^5
  • n × m ≤ 10^5

The product constraint means the total number of characters across all strings is at most 100,000. This strongly suggests that an algorithm running in linear time with respect to the total input size is expected.

Several edge cases are worth noting:

  • Strings of length 1. Every single-character string is similar to every other single-character string because any letter can be shifted into any other letter.
  • Duplicate strings. Identical strings are similar with zero operations.
  • Strings containing 'z', where wraparound behavior must be handled correctly.
  • Large numbers of identical canonical patterns, where counting pairs efficiently becomes important.

Approaches

Brute Force

A straightforward approach is to examine every pair of strings.

For each pair, determine whether there exists a shift value k such that shifting every character of the first string by k produces the second string.

This can be checked by comparing the character differences at every position. If all positions require the same cyclic shift, the strings are similar.

Since there are O(n²) pairs and each comparison takes O(m) time, the total complexity becomes O(n²m).

Given that n can be as large as 100,000, this approach is far too slow.

Key Insight

Similarity depends only on the relative differences between characters inside a string.

Consider:

  • "abc"
  • "def"
  • "xyz"

All three strings have the same pattern:

  • Difference from first character: (0, 1, 2)

Any global Caesar shift changes every character equally, so these relative differences remain unchanged.

We can therefore normalize every string into a canonical representation.

One convenient normalization is:

  • Treat the first character as the reference.
  • For every position, compute

$$(\text{char} - \text{first_char} + 26) \bmod 26$$

Examples:

  • "abc"(0,1,2)
  • "def"(0,1,2)
  • "xyz"(0,1,2)
  • "za"(0,1)
  • "ab"(0,1)

Two strings are similar if and only if their normalized representations are identical.

Once every string is converted into its canonical form, the problem becomes:

Count pairs of equal canonical representations.

A hash map can track how many times each canonical form has already appeared.

Whenever a canonical form is seen again, it forms a valid pair with every previous occurrence of that same form.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n²m) O(1) Compare every pair directly
Optimal O(nm) O(nm) Normalize strings and count equal patterns with a hash map

Algorithm Walkthrough

  1. Create a hash map frequency that stores how many times each normalized pattern has appeared.
  2. Initialize answer = 0.
  3. Process each word one by one.
  4. Let the first character of the word be the reference character.
  5. Build a normalized representation by computing, for every position:

$$(\text{word}[i]-\text{word}[0]+26)\bmod 26$$

Store these values in a tuple. 6. The tuple uniquely identifies the Caesar-equivalence class of the string. 7. Before inserting the tuple into the hash map, look up how many times it has already appeared. 8. If the tuple has appeared k times, then the current word forms exactly k new valid pairs, because it is similar to each previous word with the same normalized representation. 9. Add k to the answer. 10. Increment the frequency of the tuple in the hash map. 11. After all words have been processed, return the accumulated answer.

Why it works

The normalization removes any global Caesar shift by expressing every character relative to the first character. If two strings differ only by a uniform cyclic shift, all relative differences remain identical, producing the same normalized tuple.

Conversely, if two strings have the same normalized tuple, every position differs from the first character by exactly the same amount in both strings. Therefore a single cyclic shift converts one string into the other. Thus two strings are similar if and only if their normalized representations match.

Since every pair of equal normalized representations is counted exactly once when the later occurrence is processed, the algorithm returns the correct number of similar pairs.

Python Solution

from typing import List
from collections import defaultdict

class Solution:
    def countPairs(self, words: List[str]) -> int:
        frequency = defaultdict(int)
        answer = 0

        for word in words:
            base = ord(word[0])

            pattern = tuple(
                (ord(ch) - base + 26) % 26
                for ch in word
            )

            answer += frequency[pattern]
            frequency[pattern] += 1

        return answer

The implementation follows the algorithm directly.

For each word, the first character is used as the reference point. Every character is converted into its cyclic distance from that reference character. The resulting sequence is stored as a tuple so it can be used as a hash-map key.

The hash map tracks how many times each pattern has already been seen. Before updating the count, the current frequency is added to the answer because each previous occurrence forms a valid pair with the current word.

The tuple construction touches each character exactly once, making the overall runtime proportional to the total input size.

Go Solution

func countPairs(words []string) int64 {
	frequency := make(map[string]int64)
	var answer int64

	for _, word := range words {
		base := int(word[0])

		pattern := make([]byte, len(word))

		for i := 0; i < len(word); i++ {
			diff := (int(word[i]) - base + 26) % 26
			pattern[i] = byte(diff)
		}

		key := string(pattern)

		answer += frequency[key]
		frequency[key]++
	}

	return answer
}

The Go solution uses the same normalization strategy.

Because slices cannot be used directly as map keys, a byte slice containing the normalized differences is constructed and converted into a string. This string serves as the canonical representation.

The return type is int64 because the number of valid pairs can be as large as:

$$\frac{100000 \times 99999}{2}$$

which exceeds the range that should be assumed safe for a 32-bit integer.

Worked Examples

Example 1

words = ["fusion", "layout"]

Processing "fusion"

Position Character Difference
0 f 0
1 u 15
2 s 13
3 i 3
4 o 9
5 n 8

Pattern:

(0, 15, 13, 3, 9, 8)

State:

Pattern Count
(0,15,13,3,9,8) 1

Answer = 0

Processing "layout"

Position Character Difference
0 l 0
1 a 15
2 y 13
3 o 3
4 u 9
5 t 8

Pattern:

(0, 15, 13, 3, 9, 8)

This pattern already appeared once.

Answer becomes:

0 + 1 = 1

Final answer:

1

Example 2

words = ["ab", "aa", "za", "aa"]
Word Pattern Previous Count New Answer
ab (0,1) 0 0
aa (0,0) 0 0
za (0,1) 1 1
aa (0,0) 1 2

Final answer:

2

The valid pairs are:

("ab", "za")
("aa", "aa")

Complexity Analysis

Measure Complexity Explanation
Time O(nm) Each character is processed exactly once
Space O(nm) Hash map stores canonical representations

Because n × m ≤ 100000, the total number of processed characters is at most 100000. The normalization step scans every character once, and every hash-map operation is expected O(1).

The stored canonical patterns collectively contain at most the total number of input characters, so the space complexity is also O(nm).

Test Cases

from typing import List

s = Solution()

assert s.countPairs(["fusion", "layout"]) == 1  # provided example 1
assert s.countPairs(["ab", "aa", "za", "aa"]) == 2  # provided example 2

assert s.countPairs(["a"]) == 0  # single word
assert s.countPairs(["a", "b"]) == 1  # single-character strings always similar
assert s.countPairs(["a", "b", "c"]) == 3  # all pairs valid

assert s.countPairs(["abc", "def", "xyz"]) == 3  # same pattern
assert s.countPairs(["abc", "abd"]) == 0  # different patterns

assert s.countPairs(["aa", "bb", "cc"]) == 3  # all normalize to (0,0)

assert s.countPairs(["za", "ab"]) == 1  # wraparound similarity
assert s.countPairs(["zz", "aa"]) == 1  # full wraparound

assert s.countPairs(["abc", "abc", "abc"]) == 3  # duplicates

assert s.countPairs(["ab", "bc", "cd", "de"]) == 6  # all same pattern

assert s.countPairs(["ab", "ac", "ad"]) == 0  # all distinct patterns

assert s.countPairs(["aaa", "bbb", "ccc", "ddd"]) == 6  # repeated class

assert s.countPairs(["az", "ba"]) == 1  # negative difference modulo 26

Test Summary

Test Why
["fusion","layout"] Official example
["ab","aa","za","aa"] Official example with two groups
["a"] Minimum input size
["a","b"] Single-character equivalence
["a","b","c"] All pairs valid
["abc","def","xyz"] Same normalized pattern
["abc","abd"] Different normalized patterns
["aa","bb","cc"] Constant strings
["za","ab"] Alphabet wraparound
["zz","aa"] Full cyclic shift
["abc","abc","abc"] Duplicate strings
["ab","bc","cd","de"] Large equivalence class
["ab","ac","ad"] No valid pairs
["aaa","bbb","ccc","ddd"] Many matching patterns
["az","ba"] Modular arithmetic edge case

Edge Cases

Single-Character Strings

When a string has length one, its normalized representation is always (0). Every single-character string is therefore similar to every other single-character string. A common mistake is to compare actual characters instead of normalized forms. The normalization approach automatically groups all such strings together.

Wraparound at 'z'

Strings such as "za" and "ab" are similar because shifting "za" once produces "ab". Implementations that use ordinary subtraction without modular arithmetic may incorrectly conclude that these strings are different. The formula (difference + 26) % 26 correctly handles wraparound cases.

Large Numbers of Equivalent Strings

Suppose all strings belong to the same equivalence class. A naive algorithm would compare every pair, leading to quadratic behavior. The hash-map counting approach accumulates pair counts incrementally, allowing all such cases to be handled in linear time relative to the total input size.

Duplicate Strings

Identical strings require zero operations and are therefore similar. The normalization process maps identical strings to the same canonical representation, and the frequency map correctly counts all duplicate pairs using the standard combination-counting pattern.