LeetCode 269 - Alien Dictionary

This problem asks us to reconstruct the ordering of characters in an unknown alien language. We are given a list of words that are already sorted according to the alien language's lexicographical order.

LeetCode Problem 269

Difficulty: 🔴 Hard
Topics: Array, String, Depth-First Search, Breadth-First Search, Graph Theory, Topological Sort

Solution

LeetCode 269, Alien Dictionary

Problem Understanding

This problem asks us to reconstruct the ordering of characters in an unknown alien language.

We are given a list of words that are already sorted according to the alien language's lexicographical order. The alphabet itself still uses lowercase English letters, but the order between letters is unknown. Our task is to infer a valid ordering of those letters.

The important observation is that lexicographical ordering works similarly to dictionary ordering in English. When comparing two words, the first position where the characters differ determines the ordering between those two characters.

For example:

"wrt"
"wrf"

The first differing characters are t and f, therefore we know:

t comes before f

These relationships form a directed graph:

t -> f

The final answer is a topological ordering of this graph.

The input is an array of strings:

words[i]

Each word contains lowercase English letters. The words are guaranteed to contain only valid characters, but the ordering itself may be invalid.

The output should be:

  • A string representing one valid character ordering, if possible
  • An empty string "" if the dictionary ordering is contradictory

The constraints are relatively small:

  • At most 100 words
  • Each word length at most 100
  • Only lowercase English letters

This means we do not need extremely advanced optimizations, but we do need a correct graph-based solution.

Several edge cases are especially important:

  1. Cycles in the ordering graph

If we derive relationships like:

a -> b
b -> c
c -> a

then no valid ordering exists.

  1. Prefix invalidation

This is one of the trickiest cases.

Example:

["abc", "ab"]

This is invalid because a longer word appears before its own prefix. In any lexicographical ordering, "ab" must come before "abc".

  1. Characters with no edges

Some letters may appear in words but never participate in ordering constraints. They still must appear in the final answer.

  1. Multiple valid orderings

The problem allows returning any valid topological ordering.

Approaches

Brute Force Approach

A brute-force approach would attempt to generate all possible permutations of the unique characters and test whether each permutation satisfies the dictionary ordering.

Suppose there are k unique characters. We could:

  1. Generate every permutation of those characters
  2. Build a ranking map for the permutation
  3. Verify whether all adjacent words are correctly ordered

This approach is theoretically correct because it eventually checks every possible alphabet ordering.

However, it is far too slow.

The number of permutations is:

k!

Even with only 10 unique characters:

10! = 3,628,800

The search space becomes enormous very quickly.

Optimal Approach, Graph + Topological Sort

The key insight is that each adjacent pair of words gives us direct ordering information.

When two adjacent words differ at some position:

word1[i] != word2[i]

then we immediately know:

word1[i] comes before word2[i]

These relationships naturally form a directed graph.

For example:

w -> e
e -> r
r -> t
t -> f

Finding a valid alphabet ordering becomes equivalent to finding a valid topological ordering of the graph.

A topological sort works because:

  • Directed edges represent ordering constraints
  • A valid topological order respects all constraints
  • Cycles indicate contradictory constraints

We can use Kahn's Algorithm, BFS-based topological sorting, to efficiently compute the ordering.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(k! * n * m) O(k) Generates every possible character ordering
Optimal O(C + E) O(C + E) Uses graph construction and topological sort

Where:

  • C = total number of unique characters
  • E = number of ordering relationships

Algorithm Walkthrough

  1. Initialize the graph and indegree map

We first collect every unique character from all words.

Even if a character has no incoming or outgoing edges, it still belongs in the final ordering, so every character must exist in the graph structures.

We maintain:

  • graph[char], adjacency list
  • indegree[char], number of incoming edges
  1. Compare adjacent words

We compare:

words[i]
words[i + 1]

because only adjacent words provide direct lexicographical constraints.

  1. Detect invalid prefix cases

Before comparing characters, we check:

if len(word1) > len(word2) and word1.startswith(word2)

Example:

["abc", "ab"]

This is impossible in lexicographical ordering, so we immediately return "".

  1. Find the first differing character

We compare characters position by position.

The first mismatch determines the ordering relationship.

Example:

"wrt"
"wrf"

The first difference is:

t != f

Therefore:

t -> f

We add a directed edge.

We only use the first differing character because later characters provide no information once lexicographical order is already determined.

  1. Build indegrees carefully

When adding an edge:

u -> v

we increment:

indegree[v]

We also avoid duplicate edges using a set check.

  1. Initialize BFS queue

All characters with indegree 0 have no prerequisites.

These can appear first in the ordering.

We place them into a queue.

  1. Perform topological sort

We repeatedly:

  • Remove a node from the queue
  • Add it to the answer
  • Reduce indegree of neighbors
  • Push neighbors whose indegree becomes 0

This ensures characters only appear after all prerequisites are satisfied.

  1. Detect cycles

If the resulting ordering length is smaller than the number of unique characters, then some nodes were never processed.

That means a cycle exists.

In that case we return "".

Why it works

The graph precisely represents all ordering constraints derived from the dictionary.

A topological ordering guarantees that every directed edge:

u -> v

appears with u before v.

If a cycle exists, no such ordering is possible because some characters would depend on themselves indirectly.

Therefore:

  • A successful topological sort gives a valid alien alphabet
  • Failure to process all nodes correctly identifies contradictions

Python Solution

from typing import List
from collections import defaultdict, deque

class Solution:
    def alienOrder(self, words: List[str]) -> str:
        # Initialize graph and indegree for all unique characters
        graph = defaultdict(set)
        indegree = {}

        for word in words:
            for char in word:
                indegree[char] = 0

        # Build graph
        for i in range(len(words) - 1):
            word1 = words[i]
            word2 = words[i + 1]

            # Invalid prefix case
            if len(word1) > len(word2) and word1.startswith(word2):
                return ""

            min_length = min(len(word1), len(word2))

            for j in range(min_length):
                if word1[j] != word2[j]:
                    parent = word1[j]
                    child = word2[j]

                    # Avoid duplicate edges
                    if child not in graph[parent]:
                        graph[parent].add(child)
                        indegree[child] += 1

                    break

        # Start with all nodes having indegree 0
        queue = deque()

        for char in indegree:
            if indegree[char] == 0:
                queue.append(char)

        result = []

        # Kahn's topological sort
        while queue:
            current = queue.popleft()
            result.append(current)

            for neighbor in graph[current]:
                indegree[neighbor] -= 1

                if indegree[neighbor] == 0:
                    queue.append(neighbor)

        # Cycle detection
        if len(result) != len(indegree):
            return ""

        return "".join(result)

The implementation begins by initializing all unique characters in the indegree map. This guarantees isolated characters are preserved in the final answer.

The graph uses a defaultdict(set) so duplicate edges are avoided naturally. Duplicate edges are important to prevent because multiple identical constraints should not artificially increase indegree counts.

The adjacent word comparison phase extracts ordering relationships. Only the first differing character matters because lexicographical ordering is decided immediately at the first mismatch.

The prefix validation is handled before character comparison. This catches invalid inputs like:

["abc", "ab"]

The BFS phase implements Kahn's Algorithm. Characters with indegree 0 are processed first because they have no prerequisites.

Finally, cycle detection compares the number of processed nodes with the number of unique characters. If they differ, a cycle prevented completion of the topological ordering.

Go Solution

package main

import (
	"strings"
)

func alienOrder(words []string) string {
	graph := make(map[byte]map[byte]bool)
	indegree := make(map[byte]int)

	// Initialize all characters
	for _, word := range words {
		for i := 0; i < len(word); i++ {
			indegree[word[i]] = 0
		}
	}

	// Build graph
	for i := 0; i < len(words)-1; i++ {
		word1 := words[i]
		word2 := words[i+1]

		// Invalid prefix case
		if len(word1) > len(word2) &&
			strings.HasPrefix(word1, word2) {
			return ""
		}

		minLen := len(word1)
		if len(word2) < minLen {
			minLen = len(word2)
		}

		for j := 0; j < minLen; j++ {
			if word1[j] != word2[j] {
				parent := word1[j]
				child := word2[j]

				if graph[parent] == nil {
					graph[parent] = make(map[byte]bool)
				}

				// Avoid duplicate edges
				if !graph[parent][child] {
					graph[parent][child] = true
					indegree[child]++
				}

				break
			}
		}
	}

	// Queue for BFS
	queue := []byte{}

	for char, degree := range indegree {
		if degree == 0 {
			queue = append(queue, char)
		}
	}

	result := []byte{}

	// Kahn's algorithm
	for len(queue) > 0 {
		current := queue[0]
		queue = queue[1:]

		result = append(result, current)

		for neighbor := range graph[current] {
			indegree[neighbor]--

			if indegree[neighbor] == 0 {
				queue = append(queue, neighbor)
			}
		}
	}

	// Cycle detection
	if len(result) != len(indegree) {
		return ""
	}

	return string(result)
}

The Go implementation follows the same algorithmic structure as the Python solution.

One difference is the graph representation. Go does not have a built-in set type, so the adjacency list uses:

map[byte]bool

to simulate a set.

The BFS queue is implemented using slices. Removing the first element:

queue = queue[1:]

works efficiently enough for the given constraints.

Go also requires explicit handling of missing maps, so adjacency maps are initialized lazily before inserting edges.

Worked Examples

Example 1

words = ["wrt","wrf","er","ett","rftt"]

Step 1, Initialize Characters

Character Indegree
w 0
r 0
t 0
f 0
e 0

Step 2, Build Graph

Compare "wrt" and "wrf"

First difference:

t != f

Add:

t -> f
Edge Result
t -> f indegree[f] = 1

Compare "wrf" and "er"

First difference:

w != e

Add:

w -> e
Edge Result
w -> e indegree[e] = 1

Compare "er" and "ett"

First difference:

r != t

Add:

r -> t
Edge Result
r -> t indegree[t] = 1

Compare "ett" and "rftt"

First difference:

e != r

Add:

e -> r
Edge Result
e -> r indegree[r] = 1

Final Graph

Node Neighbors
w e
e r
r t
t f

Indegrees

Character Indegree
w 0
e 1
r 1
t 1
f 1

BFS Process

Queue Output
[w] ""
[e] "w"
[r] "we"
[t] "wer"
[f] "wert"
[] "wertf"

Final answer:

"wertf"

Example 2

words = ["z","x"]

Graph Construction

Compare:

"z"
"x"

First difference:

z != x

Add:

z -> x

Indegrees

Character Indegree
z 0
x 1

BFS

Queue Output
[z] ""
[x] "z"
[] "zx"

Final answer:

"zx"

Example 3

words = ["z","x","z"]

Graph Construction

Compare "z" and "x":

z -> x

Compare "x" and "z":

x -> z

Final Graph

z -> x
x -> z

This creates a cycle.

BFS Initialization

Every node has indegree 1.

Queue starts empty.

Result

No nodes can be processed.

Since:

processed nodes != total nodes

we detect a cycle and return:

""

Complexity Analysis

Measure Complexity Explanation
Time O(C + E) Each character and edge is processed a constant number of times
Space O(C + E) Graph, indegree map, queue, and result storage

Where:

  • C is the number of unique characters
  • E is the number of ordering constraints

The graph construction phase examines adjacent word pairs and processes each character comparison once. The BFS topological sort processes every node and edge exactly once.

Because the alphabet is limited to lowercase English letters, the practical runtime is very small, but the asymptotic analysis still follows standard graph complexity.

Test Cases

from typing import List

class Solution:
    def alienOrder(self, words: List[str]) -> str:
        from collections import defaultdict, deque

        graph = defaultdict(set)
        indegree = {}

        for word in words:
            for char in word:
                indegree[char] = 0

        for i in range(len(words) - 1):
            word1 = words[i]
            word2 = words[i + 1]

            if len(word1) > len(word2) and word1.startswith(word2):
                return ""

            for j in range(min(len(word1), len(word2))):
                if word1[j] != word2[j]:
                    if word2[j] not in graph[word1[j]]:
                        graph[word1[j]].add(word2[j])
                        indegree[word2[j]] += 1
                    break

        queue = deque(
            [c for c in indegree if indegree[c] == 0]
        )

        result = []

        while queue:
            current = queue.popleft()
            result.append(current)

            for neighbor in graph[current]:
                indegree[neighbor] -= 1

                if indegree[neighbor] == 0:
                    queue.append(neighbor)

        if len(result) != len(indegree):
            return ""

        return "".join(result)

solution = Solution()

# Basic valid ordering
assert solution.alienOrder(
    ["wrt","wrf","er","ett","rftt"]
) == "wertf"

# Simple two-character ordering
assert solution.alienOrder(["z","x"]) == "zx"

# Cycle detection
assert solution.alienOrder(["z","x","z"]) == ""

# Invalid prefix case
assert solution.alienOrder(["abc","ab"]) == ""

# Single word
assert set(solution.alienOrder(["abc"])) == {"a", "b", "c"}

# Multiple disconnected characters
result = solution.alienOrder(["ab","ac"])
assert result.index("b") < result.index("c")

# Duplicate constraints should not overcount indegree
result = solution.alienOrder(["za","zb","ca","cb"])
assert result != ""

# All same character
assert solution.alienOrder(["a","a"]) == "a"

# Independent letters
result = solution.alienOrder(["a","b","c"])
assert result == "abc"

# Larger cycle
assert solution.alienOrder(["abc","bca","cab","aac"]) == ""

Test Summary

Test Why
["wrt","wrf","er","ett","rftt"] Standard topological ordering
["z","x"] Simple direct dependency
["z","x","z"] Detects graph cycle
["abc","ab"] Validates prefix invalidation
["abc"] Handles single-word input
["ab","ac"] Tests partial ordering constraints
["za","zb","ca","cb"] Ensures duplicate edges are ignored
["a","a"] Handles repeated identical words
["a","b","c"] Simple linear ordering
["abc","bca","cab","aac"] Detects more complex cycles

Edge Cases

Invalid Prefix Relationships

One of the most common bugs in this problem comes from failing to handle prefix violations.

Example:

["abc", "ab"]

A naive implementation might compare characters, find no mismatch, and incorrectly conclude the ordering is valid.

However, lexicographically, a shorter prefix must always come before the longer word. Since "abc" appears before "ab", the input is invalid.

The implementation explicitly checks:

if len(word1) > len(word2) and word1.startswith(word2):

and immediately returns "".

Cycles in the Dependency Graph

Another critical edge case is cyclic dependencies.

Example:

["z","x","z"]

This creates:

z -> x
x -> z

No valid alphabet can satisfy both constraints simultaneously.

The implementation detects this naturally through topological sorting. If some nodes never reach indegree 0, they cannot be processed, meaning a cycle exists.

Characters Without Constraints

Some characters may never participate in any ordering relationship.

Example:

["abc"]

There are no edges because there is only one word.

A buggy implementation might omit such characters entirely.

This solution avoids that issue by initializing every unique character before graph construction:

for word in words:
    for char in word:
        indegree[char] = 0

As a result, isolated characters still appear in the final ordering.