LeetCode 3805 - Count Caesar Cipher Pairs
We are given an array words containing n lowercase strings. Every string has the same length m. Two strings are considered similar if we can repeatedly apply the following operation to either string: - Shift every character forward by one position in the alphabet.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, Math, String, Counting
Solution
LeetCode 3805 - Count Caesar Cipher Pairs
Problem Understanding
We are given an array words containing n lowercase strings. Every string has the same length m.
Two strings are considered similar if we can repeatedly apply the following operation to either string:
- Shift every character forward by one position in the alphabet.
- The alphabet wraps around, so
'z'becomes'a'.
For example:
"abc"→"bcd"→"cde""za"→"ab"
The operation affects the entire string at once. Therefore, every character in a string is shifted by the same amount.
The task is to count how many index pairs (i, j) with i < j satisfy the similarity condition.
A key observation is that two strings are similar if one can be transformed into the other by applying the same cyclic shift to every character.
The constraints are important:
1 ≤ n ≤ 10^51 ≤ m ≤ 10^5n × m ≤ 10^5
The product constraint means the total number of characters across all strings is at most 100,000. This strongly suggests that an algorithm running in linear time with respect to the total input size is expected.
Several edge cases are worth noting:
- Strings of length
1. Every single-character string is similar to every other single-character string because any letter can be shifted into any other letter. - Duplicate strings. Identical strings are similar with zero operations.
- Strings containing
'z', where wraparound behavior must be handled correctly. - Large numbers of identical canonical patterns, where counting pairs efficiently becomes important.
Approaches
Brute Force
A straightforward approach is to examine every pair of strings.
For each pair, determine whether there exists a shift value k such that shifting every character of the first string by k produces the second string.
This can be checked by comparing the character differences at every position. If all positions require the same cyclic shift, the strings are similar.
Since there are O(n²) pairs and each comparison takes O(m) time, the total complexity becomes O(n²m).
Given that n can be as large as 100,000, this approach is far too slow.
Key Insight
Similarity depends only on the relative differences between characters inside a string.
Consider:
"abc""def""xyz"
All three strings have the same pattern:
- Difference from first character:
(0, 1, 2)
Any global Caesar shift changes every character equally, so these relative differences remain unchanged.
We can therefore normalize every string into a canonical representation.
One convenient normalization is:
- Treat the first character as the reference.
- For every position, compute
$$(\text{char} - \text{first_char} + 26) \bmod 26$$
Examples:
"abc"→(0,1,2)"def"→(0,1,2)"xyz"→(0,1,2)"za"→(0,1)"ab"→(0,1)
Two strings are similar if and only if their normalized representations are identical.
Once every string is converted into its canonical form, the problem becomes:
Count pairs of equal canonical representations.
A hash map can track how many times each canonical form has already appeared.
Whenever a canonical form is seen again, it forms a valid pair with every previous occurrence of that same form.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n²m) | O(1) | Compare every pair directly |
| Optimal | O(nm) | O(nm) | Normalize strings and count equal patterns with a hash map |
Algorithm Walkthrough
- Create a hash map
frequencythat stores how many times each normalized pattern has appeared. - Initialize
answer = 0. - Process each word one by one.
- Let the first character of the word be the reference character.
- Build a normalized representation by computing, for every position:
$$(\text{word}[i]-\text{word}[0]+26)\bmod 26$$
Store these values in a tuple.
6. The tuple uniquely identifies the Caesar-equivalence class of the string.
7. Before inserting the tuple into the hash map, look up how many times it has already appeared.
8. If the tuple has appeared k times, then the current word forms exactly k new valid pairs, because it is similar to each previous word with the same normalized representation.
9. Add k to the answer.
10. Increment the frequency of the tuple in the hash map.
11. After all words have been processed, return the accumulated answer.
Why it works
The normalization removes any global Caesar shift by expressing every character relative to the first character. If two strings differ only by a uniform cyclic shift, all relative differences remain identical, producing the same normalized tuple.
Conversely, if two strings have the same normalized tuple, every position differs from the first character by exactly the same amount in both strings. Therefore a single cyclic shift converts one string into the other. Thus two strings are similar if and only if their normalized representations match.
Since every pair of equal normalized representations is counted exactly once when the later occurrence is processed, the algorithm returns the correct number of similar pairs.
Python Solution
from typing import List
from collections import defaultdict
class Solution:
def countPairs(self, words: List[str]) -> int:
frequency = defaultdict(int)
answer = 0
for word in words:
base = ord(word[0])
pattern = tuple(
(ord(ch) - base + 26) % 26
for ch in word
)
answer += frequency[pattern]
frequency[pattern] += 1
return answer
The implementation follows the algorithm directly.
For each word, the first character is used as the reference point. Every character is converted into its cyclic distance from that reference character. The resulting sequence is stored as a tuple so it can be used as a hash-map key.
The hash map tracks how many times each pattern has already been seen. Before updating the count, the current frequency is added to the answer because each previous occurrence forms a valid pair with the current word.
The tuple construction touches each character exactly once, making the overall runtime proportional to the total input size.
Go Solution
func countPairs(words []string) int64 {
frequency := make(map[string]int64)
var answer int64
for _, word := range words {
base := int(word[0])
pattern := make([]byte, len(word))
for i := 0; i < len(word); i++ {
diff := (int(word[i]) - base + 26) % 26
pattern[i] = byte(diff)
}
key := string(pattern)
answer += frequency[key]
frequency[key]++
}
return answer
}
The Go solution uses the same normalization strategy.
Because slices cannot be used directly as map keys, a byte slice containing the normalized differences is constructed and converted into a string. This string serves as the canonical representation.
The return type is int64 because the number of valid pairs can be as large as:
$$\frac{100000 \times 99999}{2}$$
which exceeds the range that should be assumed safe for a 32-bit integer.
Worked Examples
Example 1
words = ["fusion", "layout"]
Processing "fusion"
| Position | Character | Difference |
|---|---|---|
| 0 | f | 0 |
| 1 | u | 15 |
| 2 | s | 13 |
| 3 | i | 3 |
| 4 | o | 9 |
| 5 | n | 8 |
Pattern:
(0, 15, 13, 3, 9, 8)
State:
| Pattern | Count |
|---|---|
| (0,15,13,3,9,8) | 1 |
Answer = 0
Processing "layout"
| Position | Character | Difference |
|---|---|---|
| 0 | l | 0 |
| 1 | a | 15 |
| 2 | y | 13 |
| 3 | o | 3 |
| 4 | u | 9 |
| 5 | t | 8 |
Pattern:
(0, 15, 13, 3, 9, 8)
This pattern already appeared once.
Answer becomes:
0 + 1 = 1
Final answer:
1
Example 2
words = ["ab", "aa", "za", "aa"]
| Word | Pattern | Previous Count | New Answer |
|---|---|---|---|
| ab | (0,1) | 0 | 0 |
| aa | (0,0) | 0 | 0 |
| za | (0,1) | 1 | 1 |
| aa | (0,0) | 1 | 2 |
Final answer:
2
The valid pairs are:
("ab", "za")
("aa", "aa")
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(nm) | Each character is processed exactly once |
| Space | O(nm) | Hash map stores canonical representations |
Because n × m ≤ 100000, the total number of processed characters is at most 100000. The normalization step scans every character once, and every hash-map operation is expected O(1).
The stored canonical patterns collectively contain at most the total number of input characters, so the space complexity is also O(nm).
Test Cases
from typing import List
s = Solution()
assert s.countPairs(["fusion", "layout"]) == 1 # provided example 1
assert s.countPairs(["ab", "aa", "za", "aa"]) == 2 # provided example 2
assert s.countPairs(["a"]) == 0 # single word
assert s.countPairs(["a", "b"]) == 1 # single-character strings always similar
assert s.countPairs(["a", "b", "c"]) == 3 # all pairs valid
assert s.countPairs(["abc", "def", "xyz"]) == 3 # same pattern
assert s.countPairs(["abc", "abd"]) == 0 # different patterns
assert s.countPairs(["aa", "bb", "cc"]) == 3 # all normalize to (0,0)
assert s.countPairs(["za", "ab"]) == 1 # wraparound similarity
assert s.countPairs(["zz", "aa"]) == 1 # full wraparound
assert s.countPairs(["abc", "abc", "abc"]) == 3 # duplicates
assert s.countPairs(["ab", "bc", "cd", "de"]) == 6 # all same pattern
assert s.countPairs(["ab", "ac", "ad"]) == 0 # all distinct patterns
assert s.countPairs(["aaa", "bbb", "ccc", "ddd"]) == 6 # repeated class
assert s.countPairs(["az", "ba"]) == 1 # negative difference modulo 26
Test Summary
| Test | Why |
|---|---|
["fusion","layout"] |
Official example |
["ab","aa","za","aa"] |
Official example with two groups |
["a"] |
Minimum input size |
["a","b"] |
Single-character equivalence |
["a","b","c"] |
All pairs valid |
["abc","def","xyz"] |
Same normalized pattern |
["abc","abd"] |
Different normalized patterns |
["aa","bb","cc"] |
Constant strings |
["za","ab"] |
Alphabet wraparound |
["zz","aa"] |
Full cyclic shift |
["abc","abc","abc"] |
Duplicate strings |
["ab","bc","cd","de"] |
Large equivalence class |
["ab","ac","ad"] |
No valid pairs |
["aaa","bbb","ccc","ddd"] |
Many matching patterns |
["az","ba"] |
Modular arithmetic edge case |
Edge Cases
Single-Character Strings
When a string has length one, its normalized representation is always (0). Every single-character string is therefore similar to every other single-character string. A common mistake is to compare actual characters instead of normalized forms. The normalization approach automatically groups all such strings together.
Wraparound at 'z'
Strings such as "za" and "ab" are similar because shifting "za" once produces "ab". Implementations that use ordinary subtraction without modular arithmetic may incorrectly conclude that these strings are different. The formula (difference + 26) % 26 correctly handles wraparound cases.
Large Numbers of Equivalent Strings
Suppose all strings belong to the same equivalence class. A naive algorithm would compare every pair, leading to quadratic behavior. The hash-map counting approach accumulates pair counts incrementally, allowing all such cases to be handled in linear time relative to the total input size.
Duplicate Strings
Identical strings require zero operations and are therefore similar. The normalization process maps identical strings to the same canonical representation, and the frequency map correctly counts all duplicate pairs using the standard combination-counting pattern.