LeetCode 244 - Shortest Word Distance II
This problem asks us to design a reusable data structure that can efficiently answer repeated shortest-distance queries between words in a fixed list of strings. We are given an array of words, wordsDict, during initialization.
Difficulty: 🟡 Medium
Topics: Array, Hash Table, Two Pointers, String, Design
Solution
LeetCode 244 - Shortest Word Distance II
Problem Understanding
This problem asks us to design a reusable data structure that can efficiently answer repeated shortest-distance queries between words in a fixed list of strings.
We are given an array of words, wordsDict, during initialization. After that, we must repeatedly answer queries asking:
What is the minimum index distance between two different words in the original array?
The distance between two words is measured using the absolute difference between their positions in the array.
For example, consider:
["practice", "makes", "perfect", "coding", "makes"]
If we query:
shortest("coding", "practice")
The positions are:
"coding"→ index3"practice"→ index0
Distance:
|3 - 0| = 3
If we query:
shortest("makes", "coding")
The word "makes" appears multiple times:
index 1, index 4
The word "coding" appears at:
index 3
Possible distances:
|1 - 3| = 2
|4 - 3| = 1
The shortest distance is 1.
The important observation is that the dictionary never changes after initialization. We are allowed to preprocess the input once and answer many future queries. This strongly suggests a preprocessing-based solution.
The constraints provide additional hints:
wordsDict.length <= 3 * 10^4- At most
5000calls toshortest word1andword2are guaranteed to existword1 != word2
A naive approach that scans the entire array for every query may still work within limits, but it is inefficient and ignores the reusable nature of the data structure. Since we may receive thousands of queries, preprocessing becomes worthwhile.
There are also several important guarantees that simplify implementation. We never need to handle missing words because the problem guarantees both query words exist in wordsDict. We also never need to handle identical words because word1 != word2.
However, repeated occurrences of words are a key challenge. A naive implementation may accidentally compare only the first occurrence and miss the actual shortest distance.
Approaches
Brute Force Approach
The most straightforward solution is to scan the entire wordsDict array every time shortest() is called.
For each query, we maintain the most recent positions of word1 and word2 while iterating through the array. Whenever we encounter either word, we update its index and compute the current distance if both words have already been seen.
This approach is correct because it examines every occurrence of both words and continuously updates the minimum distance.
However, this becomes inefficient when multiple queries are made. Each query requires traversing the entire array of length n.
If we receive q queries:
Total work = O(n × q)
With up to 5000 queries and 30000 words, this repeated scanning becomes wasteful.
Optimal Approach
The key insight is that the input array never changes.
Instead of rescanning the array for every query, we can preprocess the positions of every word once.
We build a hash map:
word -> list of indices
Example:
["practice", "makes", "perfect", "coding", "makes"]
Becomes:
{
"practice": [0],
"makes": [1, 4],
"perfect": [2],
"coding": [3]
}
Now, for a query like:
shortest("makes", "coding")
We only compare:
[1, 4]
and
[3]
Since both lists are naturally sorted by index, we can use the two-pointer technique to efficiently find the minimum absolute difference.
This works similarly to merging two sorted arrays.
At every step:
- Compare the current positions
- Update the minimum distance
- Move the pointer with the smaller index
This avoids unnecessary comparisons and guarantees we eventually examine the best possible pair.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n) per query | O(1) | Scan the entire array for every query |
| Optimal | O(n) preprocessing + O(k + m) per query | O(n) | Store indices and use two pointers |
Where:
n= total number of wordsk= occurrences ofword1m= occurrences ofword2
Algorithm Walkthrough
Optimal Algorithm
- During initialization, create a hash map where each word maps to a list of all indices where it appears.
For example:
"makes" -> [1, 4]
We use a hash map because it provides constant-time lookup of the positions for any word. 2. Traverse the input array once.
For every index i and word wordsDict[i], append i to the corresponding list in the hash map.
3. When shortest(word1, word2) is called, retrieve the two sorted index lists from the hash map.
Example:
word1 = "makes"
word2 = "coding"
[1, 4]
[3]
- Initialize two pointers:
i = 0
j = 0
These pointers track our current positions in the two lists. 5. Compare the indices at both pointers.
Compute:
abs(list1[i] - list2[j])
Update the minimum distance. 6. Move the pointer pointing to the smaller index.
If:
list1[i] < list2[j]
increment i.
Otherwise, increment j.
This step is important because moving the larger value cannot reduce the difference. To potentially find a smaller distance, we must advance the smaller index. 7. Continue until one pointer reaches the end of its list. 8. Return the smallest distance found.
Why it works
The correctness relies on the fact that both index lists are sorted.
At every step, the smaller index is the limiting factor. If we keep the smaller index fixed and move the larger one forward, the difference can only stay the same or increase. Therefore, the only chance of finding a smaller distance is to move the smaller pointer forward.
This greedy property guarantees that no optimal pair is skipped, and the algorithm always finds the true minimum distance.
Python Solution
from collections import defaultdict
from typing import List
class WordDistance:
def __init__(self, wordsDict: List[str]):
self.word_positions = defaultdict(list)
for index, word in enumerate(wordsDict):
self.word_positions[word].append(index)
def shortest(self, word1: str, word2: str) -> int:
positions1 = self.word_positions[word1]
positions2 = self.word_positions[word2]
pointer1 = 0
pointer2 = 0
minimum_distance = float("inf")
while pointer1 < len(positions1) and pointer2 < len(positions2):
index1 = positions1[pointer1]
index2 = positions2[pointer2]
minimum_distance = min(
minimum_distance,
abs(index1 - index2)
)
if index1 < index2:
pointer1 += 1
else:
pointer2 += 1
return minimum_distance
# Your WordDistance object will be instantiated and called as such:
# obj = WordDistance(wordsDict)
# param_1 = obj.shortest(word1, word2)
The implementation begins by preprocessing the dictionary during initialization. A defaultdict(list) stores every occurrence index for each word. Because we scan the array from left to right, indices are automatically stored in sorted order.
Inside shortest(), we retrieve the two index lists corresponding to the queried words.
We then use two pointers to traverse both sorted lists simultaneously. At each step, we calculate the current distance and update the minimum.
The pointer movement rule is critical. We advance whichever pointer references the smaller index because doing so is the only way to potentially reduce the difference.
Once either list is exhausted, every relevant comparison has been considered, and we return the minimum distance.
Go Solution
type WordDistance struct {
wordPositions map[string][]int
}
func Constructor(wordsDict []string) WordDistance {
wordPositions := make(map[string][]int)
for index, word := range wordsDict {
wordPositions[word] = append(
wordPositions[word],
index,
)
}
return WordDistance{
wordPositions: wordPositions,
}
}
func (this *WordDistance) Shortest(word1 string, word2 string) int {
positions1 := this.wordPositions[word1]
positions2 := this.wordPositions[word2]
pointer1 := 0
pointer2 := 0
minimumDistance := int(^uint(0) >> 1)
for pointer1 < len(positions1) &&
pointer2 < len(positions2) {
index1 := positions1[pointer1]
index2 := positions2[pointer2]
distance := index1 - index2
if distance < 0 {
distance = -distance
}
if distance < minimumDistance {
minimumDistance = distance
}
if index1 < index2 {
pointer1++
} else {
pointer2++
}
}
return minimumDistance
}
/**
* Your WordDistance object will be instantiated and called as such:
* obj := Constructor(wordsDict);
* param_1 := obj.Shortest(word1, word2);
*/
The Go implementation follows the same logic as the Python version. Instead of a defaultdict, we use a map[string][]int.
Because Go does not have a built-in infinity value for integers, we initialize minimumDistance using the maximum possible integer:
int(^uint(0) >> 1)
We also manually compute absolute value since Go lacks a built-in integer abs() function.
Slices in Go naturally grow using append(), making them ideal for storing word positions.
Worked Examples
Example 1
Input:
["practice", "makes", "perfect", "coding", "makes"]
Preprocessed map:
| Word | Indices |
|---|---|
| practice | [0] |
| makes | [1, 4] |
| perfect | [2] |
| coding | [3] |
Query 1
shortest("coding", "practice")
Retrieved lists:
coding -> [3]
practice -> [0]
| Step | Pointer1 | Pointer2 | Compared | Distance | Minimum |
|---|---|---|---|---|---|
| 1 | 0 | 0 | (3, 0) | 3 | 3 |
Pointer for smaller value (0) moves.
Loop ends.
Result:
3
Query 2
shortest("makes", "coding")
Retrieved lists:
makes -> [1, 4]
coding -> [3]
| Step | Pointer1 | Pointer2 | Compared | Distance | Minimum |
|---|---|---|---|---|---|
| 1 | 0 | 0 | (1, 3) | 2 | 2 |
| 2 | 1 | 0 | (4, 3) | 1 | 1 |
The smaller pointer moves each time.
Final result:
1
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n) preprocessing + O(k + m) per query | Build map once, then traverse both occurrence lists |
| Space | O(n) | Store every word index |
The preprocessing step scans the input array exactly once, requiring O(n) time.
For each query, the two-pointer algorithm traverses each occurrence list at most once. If word1 appears k times and word2 appears m times, the runtime is:
O(k + m)
The space complexity is O(n) because every index from the original array is stored exactly once in the hash map.
Test Cases
# Example case
obj = WordDistance(
["practice", "makes", "perfect", "coding", "makes"]
)
assert obj.shortest("coding", "practice") == 3 # Provided example
assert obj.shortest("makes", "coding") == 1 # Multiple occurrences
# Minimum input size
obj = WordDistance(["a", "b"])
assert obj.shortest("a", "b") == 1 # Smallest valid array
# Words appearing multiple times
obj = WordDistance(
["a", "x", "a", "x", "a"]
)
assert obj.shortest("a", "x") == 1 # Closest adjacent pair
# Words far apart
obj = WordDistance(
["a", "b", "c", "d", "e"]
)
assert obj.shortest("a", "e") == 4 # Maximum distance
# One word appears once
obj = WordDistance(
["apple", "banana", "apple", "orange"]
)
assert obj.shortest("banana", "orange") == 2 # Single occurrences
# Many repeated words
obj = WordDistance(
["a", "b", "a", "b", "a", "b"]
)
assert obj.shortest("a", "b") == 1 # Repeated alternating words
# Adjacent words
obj = WordDistance(
["cat", "dog", "mouse"]
)
assert obj.shortest("cat", "dog") == 1 # Immediate neighbors
# Different occurrence distributions
obj = WordDistance(
["x", "x", "x", "y", "z", "y"]
)
assert obj.shortest("x", "y") == 1 # Closest occurrence near boundary
| Test | Why |
|---|---|
| Provided example | Verifies expected output |
["a", "b"] |
Tests smallest valid input |
| Multiple occurrences | Ensures closest pair is chosen |
| Far apart words | Verifies large distance handling |
| Single occurrence words | Tests simplest comparison |
| Alternating repeats | Stresses repeated occurrences |
| Adjacent words | Tests minimum possible distance |
| Uneven distributions | Ensures pointer logic works correctly |
Edge Cases
Words Appearing Multiple Times
A major source of bugs is assuming each word appears only once.
Example:
["a", "x", "a", "x", "a"]
If we only compare the first occurrences:
|0 - 1| = 1
we may accidentally miss shorter combinations in other inputs. The implementation avoids this problem by storing all positions and using a two-pointer scan over every occurrence.
Very Frequent Queries
Since the problem allows up to 5000 calls to shortest(), repeatedly scanning the entire array would be inefficient.
A naive implementation would repeatedly perform O(n) work. Our preprocessing approach pays the setup cost once during initialization, then answers queries efficiently using precomputed index lists.
One Word Appearing Only Once
Some words may occur exactly once.
Example:
["apple", "banana", "orange"]
When querying:
shortest("banana", "orange")
each list contains only one element.
The two-pointer logic still works naturally because it handles single-element lists without requiring any special cases.
Closest Match at Different Positions
The shortest distance is not always formed by the first occurrence.
Example:
["a", "b", "c", "a", "b"]
For:
shortest("a", "b")
The best answer is:
|3 - 4| = 1
not:
|0 - 1| = 1
In more complicated examples, later occurrences may be strictly better. The two-pointer traversal guarantees every promising candidate pair is considered while avoiding unnecessary comparisons.