LeetCode 527 - Word Abbreviation
The problem asks us to compute minimal unique abbreviations for an array of distinct strings. An abbreviation replaces the middle characters of a word with a count, keeping the first and last characters.
Difficulty: 🔴 Hard
Topics: Array, String, Greedy, Trie, Sorting
Solution
Problem Understanding
The problem asks us to compute minimal unique abbreviations for an array of distinct strings. An abbreviation replaces the middle characters of a word with a count, keeping the first and last characters. For example, "internal" becomes "i6l" because there are six letters between i and l.
The challenge arises when multiple words share the same abbreviation. In that case, we must increase the prefix length (the number of starting characters included before the count) until all abbreviations are unique. For example, "abcdef" and "abndef" initially abbreviate to "a4f". Incrementing the prefix gives "ab3f" and then "abc2f" / "abn2f" until uniqueness is achieved.
The input is an array of unique strings, each of length 2 to 400, with up to 400 strings. The output is an array of abbreviations such that every abbreviation is minimal, unique, and shorter than the original word (otherwise we retain the original word).
Edge cases to be aware of include very short words (like "aa"), words that differ only at the last character, and words whose abbreviation would not reduce their length.
Approaches
Brute-Force Approach
A naive approach is to start with the standard abbreviation for each word. Then, repeatedly check all pairs of words for collisions. Whenever two words share an abbreviation, increment the prefix of the conflicting words and recompute the abbreviation. Continue until all abbreviations are unique.
This method works because it directly follows the problem rules, but it is inefficient. Checking all pairs and repeatedly updating abbreviations could take O(n² * L) time, where n is the number of words and L is the maximum word length, which is too slow for n up to 400 and L up to 400.
Optimal Approach
The key observation is that words can be grouped by length and by current abbreviation, and then we can resolve conflicts within each group rather than checking all pairs globally. Using a trie-like prefix resolution or a hash map of conflicts, we can efficiently determine the minimum prefix length required for each word to ensure uniqueness.
For each group of words that initially share the same abbreviation, we increment their prefix lengths only as much as needed to disambiguate them. This avoids unnecessary repeated work and ensures that every word has the minimal abbreviation.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n² * L) | O(n) | Compare all pairs, increment prefixes iteratively |
| Optimal | O(n * L) | O(n * L) | Group words, increment prefix intelligently until conflicts are resolved |
Algorithm Walkthrough
- Initial Abbreviations: For each word, generate the first abbreviation using the formula: first letter + count of middle letters + last letter.
- Grouping: Group words by
(length, last character)and current abbreviation. This ensures we only check conflicts among words that could actually collide. - Prefix Resolution: For each group with conflicts, determine the minimum prefix length for each word that makes the abbreviation unique. Increase prefixes iteratively but only within conflicting sets.
- Abbreviation Finalization: For each word, compute the final abbreviation using its resolved prefix length. If the abbreviation is not shorter than the original word, keep the original.
- Return: Assemble the list of abbreviations in the same order as the input.
Why it works: By focusing on conflicting groups and only incrementing the prefix as needed, we guarantee both minimality and uniqueness. Since conflicts are resolved locally within groups, we avoid unnecessary comparisons. The invariant is that after processing, no two words share the same abbreviation.
Python Solution
from typing import List, Dict, Tuple
from collections import defaultdict
class Solution:
def wordsAbbreviation(self, words: List[str]) -> List[str]:
n = len(words)
prefix_lengths = [1] * n
def abbreviate(word: str, prefix_len: int) -> str:
if prefix_len >= len(word) - 2:
return word
return word[:prefix_len] + str(len(word) - prefix_len - 1) + word[-1]
# Initialize abbreviations
abbr_map: Dict[str, List[int]] = defaultdict(list)
for i, word in enumerate(words):
abbr = abbreviate(word, 1)
abbr_map[abbr].append(i)
# Resolve conflicts
while True:
new_map = defaultdict(list)
resolved = True
for abbr, indices in abbr_map.items():
if len(indices) == 1:
new_map[abbr].append(indices[0])
else:
resolved = False
for idx in indices:
prefix_lengths[idx] += 1
new_abbr = abbreviate(words[idx], prefix_lengths[idx])
new_map[new_abbr].append(idx)
abbr_map = new_map
if resolved:
break
# Build final abbreviations
result = []
for i, word in enumerate(words):
abbr = abbreviate(word, prefix_lengths[i])
result.append(abbr)
return result
Implementation Walkthrough:
We start by initializing each word's prefix length to 1. The abbreviate function generates the abbreviation based on a given prefix length. Then, we map abbreviations to their corresponding word indices. In the conflict resolution loop, we detect any abbreviations used by multiple words and increment their prefix lengths until all abbreviations are unique. Finally, we compute and return the final abbreviations.
Go Solution
package main
func wordsAbbreviation(words []string) []string {
n := len(words)
prefixLens := make([]int, n)
for i := 0; i < n; i++ {
prefixLens[i] = 1
}
abbreviate := func(word string, prefixLen int) string {
if prefixLen >= len(word)-2 {
return word
}
return word[:prefixLen] + string(rune(len(word)-prefixLen-1+'0')) + word[len(word)-1:]
}
abbrMap := map[string][]int{}
for i, word := range words {
abbr := abbreviate(word, 1)
abbrMap[abbr] = append(abbrMap[abbr], i)
}
for {
newMap := map[string][]int{}
resolved := true
for abbr, indices := range abbrMap {
if len(indices) == 1 {
newMap[abbr] = append(newMap[abbr], indices[0])
} else {
resolved = false
for _, idx := range indices {
prefixLens[idx]++
newAbbr := abbreviate(words[idx], prefixLens[idx])
newMap[newAbbr] = append(newMap[newAbbr], idx)
}
}
}
abbrMap = newMap
if resolved {
break
}
}
result := make([]string, n)
for i, word := range words {
result[i] = abbreviate(word, prefixLens[i])
}
return result
}
Go-specific Notes:
The main differences include using slices and maps, handling string slicing, and converting integer counts to strings. The logic mirrors Python but uses Go syntax and data structures. Edge cases such as very short words are handled by returning the original word if the abbreviation would not shorten it.
Worked Examples
Example 1
Input: ["like","god","internal","me","internet","interval","intension","face","intrusion"]
| Word | Initial Abbr | Conflict? | Final Abbr |
|---|---|---|---|
| like | l2e | No | l2e |
| god | g1d | No | god |
| internal | i6l | No | internal |
| me | m0e | No | me |
| internet | i6t | Yes | i6t |
| interval | i6l | No | interval |
| intension | i7n | Yes | inte4n |
| face | f2e | No | f2e |
| intrusion | i7n | Yes | intr4n |
Example 2
Input: ["aa","aaa"]
| Word | Initial Abbr | Conflict? | Final Abbr |
|---|---|---|---|
| aa | a0a | Yes | aa |
| aaa | a1a | No | aaa |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * L) | Each word's prefix increases at most L times, and we process n words, L is max word length |
| Space | O(n * L) | Store abbreviations and prefix lengths for all words |
The main work is incrementing prefixes to resolve conflicts, which is bounded by word length. Grouping by abbreviation ensures we do not repeatedly check all pairs.
Test Cases
# test cases
assert Solution().wordsAbbreviation(["like","god","internal","me","internet","interval","intension","face","intrusion"]) == ["l2e","god","internal","me","i6t","interval","inte4n","f2e