LeetCode 1858 - Longest Word With All Prefixes

The problem gives us an array of lowercase strings called words. We need to find the longest word such that every prefix of that word also exists in the array. A prefix means the string formed by taking characters from the beginning of the word.

LeetCode Problem 1858

Difficulty: 🟡 Medium
Topics: Array, String, Depth-First Search, Trie

Solution

Problem Understanding

The problem gives us an array of lowercase strings called words. We need to find the longest word such that every prefix of that word also exists in the array.

A prefix means the string formed by taking characters from the beginning of the word. For example, for "apple":

  • Length 1 prefix: "a"
  • Length 2 prefix: "ap"
  • Length 3 prefix: "app"
  • Length 4 prefix: "appl"
  • Length 5 prefix: "apple"

For "apple" to qualify, all shorter prefixes must also appear in the input array. If even one prefix is missing, the word is invalid.

The output should be:

  • The longest valid word
  • If multiple valid words have the same maximum length, choose the lexicographically smallest one
  • If no valid word exists, return an empty string

The constraints are important:

  • Up to 10^5 words
  • Total length of all words is at most 10^5

This total length constraint tells us that solutions proportional to the total number of characters are acceptable. However, quadratic approaches over all strings or repeated expensive substring searches could become too slow.

There are several important edge cases:

  • A word may exist without its smaller prefixes. For example, "banana" alone is invalid because "b", "ba", and so on are missing.
  • Multiple words can have the same maximum length, requiring lexicographical comparison.
  • A single character word is automatically valid because it has no shorter non-empty prefixes.
  • There may be no valid chain at all, in which case we return "".

Approaches

Brute Force Approach

A straightforward solution is to examine every word independently.

For each word:

  1. Generate all prefixes of lengths 1 through len(word) - 1
  2. Check whether each prefix exists in the input array
  3. If all prefixes exist, the word is valid
  4. Track the best answer according to:
  • Longer length first
  • Lexicographically smaller if lengths tie

To make prefix lookup efficient, we store all words in a hash set.

This approach is correct because it explicitly verifies the required property for every candidate word. However, it repeatedly constructs prefixes for every word, which can create unnecessary overhead.

Even though the total character count is bounded by 10^5, we can still improve the solution structure and make the logic cleaner.

Key Insight

A word is valid if and only if its immediate shorter prefix is valid.

For example:

  • "appl" is valid only if "app" is valid
  • "app" is valid only if "ap" is valid
  • "ap" is valid only if "a" is valid

This creates a natural incremental construction process.

If we process words in sorted order by length, we can build a set of already validated words. Then:

  • A one-character word is automatically valid
  • A longer word is valid if its prefix word[:-1] is already valid

This avoids repeatedly checking every smaller prefix from scratch.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(total_chars) O(total_chars) Checks every prefix for every word individually
Optimal O(total_chars log n) O(total_chars) Builds valid words incrementally using a hash set

Algorithm Walkthrough

Optimal Algorithm

  1. Store all words in a sortable list and sort them lexicographically.

Lexicographical sorting is important because when two valid words have the same length, we want the smaller one. Sorting first ensures that earlier words naturally satisfy the tie-breaking rule. 2. Sort words by length ascending, then lexicographically.

This guarantees that before processing a word, all shorter candidate prefixes have already been processed. 3. Create a hash set called valid_words.

This set will contain only words whose prefixes are all valid. 4. Initialize an empty string best_answer.

This variable tracks the longest valid word found so far. 5. Iterate through every word.

For each word:

  • If its length is 1, it is automatically valid
  • Otherwise, check whether word[:-1] exists in valid_words
  1. If the word is valid:
  • Add it to valid_words

  • Compare it against best_answer

  • Update the answer if:

  • The word is longer, or

  • The lengths are equal and the word is lexicographically smaller

  1. Return best_answer after processing all words.

Why it works

The algorithm maintains the invariant that valid_words contains exactly the words whose prefixes all exist in the array.

When processing a word, we only need to check its immediate shorter prefix because that prefix itself could only have entered valid_words if all of its prefixes were valid. This creates an inductive chain of correctness.

Since words are processed from shorter to longer lengths, every necessary prefix has already been evaluated before the current word.

Python Solution

from typing import List

class Solution:
    def longestWord(self, words: List[str]) -> str:
        words.sort(key=lambda word: (len(word), word))

        valid_words = set()
        best_answer = ""

        for word in words:
            if len(word) == 1 or word[:-1] in valid_words:
                valid_words.add(word)

                if (
                    len(word) > len(best_answer)
                    or (
                        len(word) == len(best_answer)
                        and word < best_answer
                    )
                ):
                    best_answer = word

        return best_answer

The implementation begins by sorting words according to two criteria:

  • Shorter words first
  • Lexicographically smaller words first among equal lengths

This ordering guarantees that all possible prefixes are processed before their longer extensions.

The valid_words set stores only words that satisfy the prefix condition. A word becomes valid in two situations:

  • It has length 1
  • Its immediate prefix already belongs to valid_words

The expression word[:-1] removes the last character and gives the required shorter prefix.

Whenever a valid word is found, the algorithm compares it against the current best answer. The comparison logic follows the problem rules exactly:

  • Prefer longer words
  • If lengths tie, prefer lexicographically smaller words

Finally, the method returns the best valid word discovered during traversal.

Go Solution

package main

import (
	"sort"
)

func longestWord(words []string) string {
	sort.Slice(words, func(i, j int) bool {
		if len(words[i]) == len(words[j]) {
			return words[i] < words[j]
		}
		return len(words[i]) < len(words[j])
	})

	validWords := make(map[string]bool)
	bestAnswer := ""

	for _, word := range words {
		if len(word) == 1 || validWords[word[:len(word)-1]] {
			validWords[word] = true

			if len(word) > len(bestAnswer) ||
				(len(word) == len(bestAnswer) && word < bestAnswer) {
				bestAnswer = word
			}
		}
	}

	return bestAnswer
}

The Go solution follows the same logic as the Python implementation.

A custom comparator is used with sort.Slice to sort by length first and lexicographical order second.

Go does not have a built-in set type, so a map[string]bool is used instead. Presence in the map indicates that a word is valid.

Substring extraction uses slicing syntax:

word[:len(word)-1]

This produces the immediate prefix needed for validation.

Unlike Python, Go distinguishes between nil maps and initialized maps, so make(map[string]bool) is required before insertion.

Worked Examples

Example 1

Input: ["k","ki","kir","kira","kiran"]

After sorting:

Word
k
ki
kir
kira
kiran

Processing steps:

Current Word Required Prefix Prefix Exists? Valid Set After Step Best Answer
k none yes {k} k
ki k yes {k, ki} ki
kir ki yes {k, ki, kir} kir
kira kir yes {k, ki, kir, kira} kira
kiran kira yes {k, ki, kir, kira, kiran} kiran

Final answer:

"kiran"

Example 2

Input: ["a", "banana", "app", "appl", "ap", "apply", "apple"]

After sorting:

Word
a
ap
app
appl
apple
apply
banana

Processing:

Current Word Required Prefix Prefix Exists? Best Answer
a none yes a
ap a yes ap
app ap yes app
appl app yes appl
apple appl yes apple
apply appl yes apple
banana banan no apple

Both "apple" and "apply" are valid and length 5.

Since "apple" is lexicographically smaller:

"apple"

is returned.

Example 3

Input: ["abc", "bc", "ab", "qwe"]

After sorting:

Word
ab
bc
abc
qwe

Processing:

Current Word Required Prefix Prefix Exists? Valid?
ab a no no
bc b no no
abc ab no no
qwe qw no no

No valid words exist.

Final answer:

""

Complexity Analysis

Measure Complexity Explanation
Time O(n log n + total_chars) Sorting dominates, prefix checks are constant time
Space O(total_chars) Hash set stores valid words

The sorting step requires O(n log n) comparisons. Since the total character count is bounded by 10^5, this is efficient for the problem constraints.

Each word is processed once, and each prefix lookup in the hash set is approximately O(1) average time.

The extra space comes from storing valid words in the hash set.

Test Cases

solution = Solution()

# Provided examples
assert solution.longestWord(
    ["k", "ki", "kir", "kira", "kiran"]
) == "kiran"  # complete prefix chain

assert solution.longestWord(
    ["a", "banana", "app", "appl", "ap", "apply", "apple"]
) == "apple"  # lexicographical tie-breaking

assert solution.longestWord(
    ["abc", "bc", "ab", "qwe"]
) == ""  # no valid words

# Single word
assert solution.longestWord(
    ["a"]
) == "a"  # single character word is valid

# Missing intermediate prefix
assert solution.longestWord(
    ["a", "abc"]
) == "a"  # abc invalid because ab missing

# Multiple valid chains
assert solution.longestWord(
    ["w", "wo", "wor", "worl", "world"]
) == "world"  # full valid chain

# Lexicographical comparison
assert solution.longestWord(
    ["a", "ap", "app", "appl", "apple", "apply"]
) == "apple"  # same length, smaller lexicographically

# No single character words
assert solution.longestWord(
    ["ab", "abc", "abcd"]
) == ""  # no starting prefix

# Multiple one-letter words
assert solution.longestWord(
    ["a", "b", "ba", "ban", "bana", "banan", "banana"]
) == "banana"  # longest valid chain

# Duplicate-like structure
assert solution.longestWord(
    ["t", "to", "top", "tops", "toy"]
) == "tops"  # tops longer than toy

Test Case Summary

Test Why
["k","ki","kir","kira","kiran"] Verifies standard valid prefix chain
["a","banana","app","appl","ap","apply","apple"] Verifies lexicographical tie-breaking
["abc","bc","ab","qwe"] Verifies empty result case
["a"] Smallest valid input
["a","abc"] Missing intermediate prefix
["w","wo","wor","worl","world"] Complete growing chain
["a","ap","app","appl","apple","apply"] Equal-length comparison
["ab","abc","abcd"] No valid starting prefix
["a","b","ba","ban","bana","banan","banana"] Long valid construction
["t","to","top","tops","toy"] Competing valid chains

Edge Cases

No Valid Words Exist

An important edge case occurs when no word has all required prefixes. For example:

["abc", "abcd", "xyz"]

None of these words have their shorter prefixes present. A buggy implementation might incorrectly return the longest word anyway. This solution avoids that problem because a word only enters valid_words if its immediate prefix already exists there.

Multiple Longest Words

The problem requires lexicographical tie-breaking. Consider:

["a", "ap", "app", "appl", "apple", "apply"]

Both "apple" and "apply" are valid and have the same length. A careless implementation might simply return whichever appears first in the input. This solution explicitly compares lexicographical order when lengths match.

Missing Intermediate Prefix

A word may have some prefixes present but not all of them. For example:

["a", "app"]

The word "app" is invalid because "ap" is missing. The algorithm handles this correctly because "app" requires "ap" to already exist in valid_words, which it does not.

Single Character Words

Any one-letter word is automatically valid because there are no smaller non-empty prefixes to check. The implementation explicitly handles this with:

if len(word) == 1

Without this condition, no chain could ever begin.