LeetCode 1858 - Longest Word With All Prefixes
The problem gives us an array of lowercase strings called words. We need to find the longest word such that every prefix of that word also exists in the array. A prefix means the string formed by taking characters from the beginning of the word.
Difficulty: 🟡 Medium
Topics: Array, String, Depth-First Search, Trie
Solution
Problem Understanding
The problem gives us an array of lowercase strings called words. We need to find the longest word such that every prefix of that word also exists in the array.
A prefix means the string formed by taking characters from the beginning of the word. For example, for "apple":
- Length 1 prefix:
"a" - Length 2 prefix:
"ap" - Length 3 prefix:
"app" - Length 4 prefix:
"appl" - Length 5 prefix:
"apple"
For "apple" to qualify, all shorter prefixes must also appear in the input array. If even one prefix is missing, the word is invalid.
The output should be:
- The longest valid word
- If multiple valid words have the same maximum length, choose the lexicographically smallest one
- If no valid word exists, return an empty string
The constraints are important:
- Up to
10^5words - Total length of all words is at most
10^5
This total length constraint tells us that solutions proportional to the total number of characters are acceptable. However, quadratic approaches over all strings or repeated expensive substring searches could become too slow.
There are several important edge cases:
- A word may exist without its smaller prefixes. For example,
"banana"alone is invalid because"b","ba", and so on are missing. - Multiple words can have the same maximum length, requiring lexicographical comparison.
- A single character word is automatically valid because it has no shorter non-empty prefixes.
- There may be no valid chain at all, in which case we return
"".
Approaches
Brute Force Approach
A straightforward solution is to examine every word independently.
For each word:
- Generate all prefixes of lengths
1throughlen(word) - 1 - Check whether each prefix exists in the input array
- If all prefixes exist, the word is valid
- Track the best answer according to:
- Longer length first
- Lexicographically smaller if lengths tie
To make prefix lookup efficient, we store all words in a hash set.
This approach is correct because it explicitly verifies the required property for every candidate word. However, it repeatedly constructs prefixes for every word, which can create unnecessary overhead.
Even though the total character count is bounded by 10^5, we can still improve the solution structure and make the logic cleaner.
Key Insight
A word is valid if and only if its immediate shorter prefix is valid.
For example:
"appl"is valid only if"app"is valid"app"is valid only if"ap"is valid"ap"is valid only if"a"is valid
This creates a natural incremental construction process.
If we process words in sorted order by length, we can build a set of already validated words. Then:
- A one-character word is automatically valid
- A longer word is valid if its prefix
word[:-1]is already valid
This avoids repeatedly checking every smaller prefix from scratch.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(total_chars) | O(total_chars) | Checks every prefix for every word individually |
| Optimal | O(total_chars log n) | O(total_chars) | Builds valid words incrementally using a hash set |
Algorithm Walkthrough
Optimal Algorithm
- Store all words in a sortable list and sort them lexicographically.
Lexicographical sorting is important because when two valid words have the same length, we want the smaller one. Sorting first ensures that earlier words naturally satisfy the tie-breaking rule. 2. Sort words by length ascending, then lexicographically.
This guarantees that before processing a word, all shorter candidate prefixes have already been processed.
3. Create a hash set called valid_words.
This set will contain only words whose prefixes are all valid.
4. Initialize an empty string best_answer.
This variable tracks the longest valid word found so far. 5. Iterate through every word.
For each word:
- If its length is
1, it is automatically valid - Otherwise, check whether
word[:-1]exists invalid_words
- If the word is valid:
-
Add it to
valid_words -
Compare it against
best_answer -
Update the answer if:
-
The word is longer, or
-
The lengths are equal and the word is lexicographically smaller
- Return
best_answerafter processing all words.
Why it works
The algorithm maintains the invariant that valid_words contains exactly the words whose prefixes all exist in the array.
When processing a word, we only need to check its immediate shorter prefix because that prefix itself could only have entered valid_words if all of its prefixes were valid. This creates an inductive chain of correctness.
Since words are processed from shorter to longer lengths, every necessary prefix has already been evaluated before the current word.
Python Solution
from typing import List
class Solution:
def longestWord(self, words: List[str]) -> str:
words.sort(key=lambda word: (len(word), word))
valid_words = set()
best_answer = ""
for word in words:
if len(word) == 1 or word[:-1] in valid_words:
valid_words.add(word)
if (
len(word) > len(best_answer)
or (
len(word) == len(best_answer)
and word < best_answer
)
):
best_answer = word
return best_answer
The implementation begins by sorting words according to two criteria:
- Shorter words first
- Lexicographically smaller words first among equal lengths
This ordering guarantees that all possible prefixes are processed before their longer extensions.
The valid_words set stores only words that satisfy the prefix condition. A word becomes valid in two situations:
- It has length
1 - Its immediate prefix already belongs to
valid_words
The expression word[:-1] removes the last character and gives the required shorter prefix.
Whenever a valid word is found, the algorithm compares it against the current best answer. The comparison logic follows the problem rules exactly:
- Prefer longer words
- If lengths tie, prefer lexicographically smaller words
Finally, the method returns the best valid word discovered during traversal.
Go Solution
package main
import (
"sort"
)
func longestWord(words []string) string {
sort.Slice(words, func(i, j int) bool {
if len(words[i]) == len(words[j]) {
return words[i] < words[j]
}
return len(words[i]) < len(words[j])
})
validWords := make(map[string]bool)
bestAnswer := ""
for _, word := range words {
if len(word) == 1 || validWords[word[:len(word)-1]] {
validWords[word] = true
if len(word) > len(bestAnswer) ||
(len(word) == len(bestAnswer) && word < bestAnswer) {
bestAnswer = word
}
}
}
return bestAnswer
}
The Go solution follows the same logic as the Python implementation.
A custom comparator is used with sort.Slice to sort by length first and lexicographical order second.
Go does not have a built-in set type, so a map[string]bool is used instead. Presence in the map indicates that a word is valid.
Substring extraction uses slicing syntax:
word[:len(word)-1]
This produces the immediate prefix needed for validation.
Unlike Python, Go distinguishes between nil maps and initialized maps, so make(map[string]bool) is required before insertion.
Worked Examples
Example 1
Input: ["k","ki","kir","kira","kiran"]
After sorting:
| Word |
|---|
| k |
| ki |
| kir |
| kira |
| kiran |
Processing steps:
| Current Word | Required Prefix | Prefix Exists? | Valid Set After Step | Best Answer |
|---|---|---|---|---|
| k | none | yes | {k} | k |
| ki | k | yes | {k, ki} | ki |
| kir | ki | yes | {k, ki, kir} | kir |
| kira | kir | yes | {k, ki, kir, kira} | kira |
| kiran | kira | yes | {k, ki, kir, kira, kiran} | kiran |
Final answer:
"kiran"
Example 2
Input: ["a", "banana", "app", "appl", "ap", "apply", "apple"]
After sorting:
| Word |
|---|
| a |
| ap |
| app |
| appl |
| apple |
| apply |
| banana |
Processing:
| Current Word | Required Prefix | Prefix Exists? | Best Answer |
|---|---|---|---|
| a | none | yes | a |
| ap | a | yes | ap |
| app | ap | yes | app |
| appl | app | yes | appl |
| apple | appl | yes | apple |
| apply | appl | yes | apple |
| banana | banan | no | apple |
Both "apple" and "apply" are valid and length 5.
Since "apple" is lexicographically smaller:
"apple"
is returned.
Example 3
Input: ["abc", "bc", "ab", "qwe"]
After sorting:
| Word |
|---|
| ab |
| bc |
| abc |
| qwe |
Processing:
| Current Word | Required Prefix | Prefix Exists? | Valid? |
|---|---|---|---|
| ab | a | no | no |
| bc | b | no | no |
| abc | ab | no | no |
| qwe | qw | no | no |
No valid words exist.
Final answer:
""
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n log n + total_chars) | Sorting dominates, prefix checks are constant time |
| Space | O(total_chars) | Hash set stores valid words |
The sorting step requires O(n log n) comparisons. Since the total character count is bounded by 10^5, this is efficient for the problem constraints.
Each word is processed once, and each prefix lookup in the hash set is approximately O(1) average time.
The extra space comes from storing valid words in the hash set.
Test Cases
solution = Solution()
# Provided examples
assert solution.longestWord(
["k", "ki", "kir", "kira", "kiran"]
) == "kiran" # complete prefix chain
assert solution.longestWord(
["a", "banana", "app", "appl", "ap", "apply", "apple"]
) == "apple" # lexicographical tie-breaking
assert solution.longestWord(
["abc", "bc", "ab", "qwe"]
) == "" # no valid words
# Single word
assert solution.longestWord(
["a"]
) == "a" # single character word is valid
# Missing intermediate prefix
assert solution.longestWord(
["a", "abc"]
) == "a" # abc invalid because ab missing
# Multiple valid chains
assert solution.longestWord(
["w", "wo", "wor", "worl", "world"]
) == "world" # full valid chain
# Lexicographical comparison
assert solution.longestWord(
["a", "ap", "app", "appl", "apple", "apply"]
) == "apple" # same length, smaller lexicographically
# No single character words
assert solution.longestWord(
["ab", "abc", "abcd"]
) == "" # no starting prefix
# Multiple one-letter words
assert solution.longestWord(
["a", "b", "ba", "ban", "bana", "banan", "banana"]
) == "banana" # longest valid chain
# Duplicate-like structure
assert solution.longestWord(
["t", "to", "top", "tops", "toy"]
) == "tops" # tops longer than toy
Test Case Summary
| Test | Why |
|---|---|
["k","ki","kir","kira","kiran"] |
Verifies standard valid prefix chain |
["a","banana","app","appl","ap","apply","apple"] |
Verifies lexicographical tie-breaking |
["abc","bc","ab","qwe"] |
Verifies empty result case |
["a"] |
Smallest valid input |
["a","abc"] |
Missing intermediate prefix |
["w","wo","wor","worl","world"] |
Complete growing chain |
["a","ap","app","appl","apple","apply"] |
Equal-length comparison |
["ab","abc","abcd"] |
No valid starting prefix |
["a","b","ba","ban","bana","banan","banana"] |
Long valid construction |
["t","to","top","tops","toy"] |
Competing valid chains |
Edge Cases
No Valid Words Exist
An important edge case occurs when no word has all required prefixes. For example:
["abc", "abcd", "xyz"]
None of these words have their shorter prefixes present. A buggy implementation might incorrectly return the longest word anyway. This solution avoids that problem because a word only enters valid_words if its immediate prefix already exists there.
Multiple Longest Words
The problem requires lexicographical tie-breaking. Consider:
["a", "ap", "app", "appl", "apple", "apply"]
Both "apple" and "apply" are valid and have the same length. A careless implementation might simply return whichever appears first in the input. This solution explicitly compares lexicographical order when lengths match.
Missing Intermediate Prefix
A word may have some prefixes present but not all of them. For example:
["a", "app"]
The word "app" is invalid because "ap" is missing. The algorithm handles this correctly because "app" requires "ap" to already exist in valid_words, which it does not.
Single Character Words
Any one-letter word is automatically valid because there are no smaller non-empty prefixes to check. The implementation explicitly handles this with:
if len(word) == 1
Without this condition, no chain could ever begin.