LeetCode 3758 - Convert Number Words to Digits

The problem asks us to parse a string s that contains lowercase English letters and may include concatenated English words representing digits from 0 to 9.

LeetCode Problem 3758

Difficulty: 🟡 Medium
Topics: String, Trie

Solution

Problem Understanding

The problem asks us to parse a string s that contains lowercase English letters and may include concatenated English words representing digits from 0 to 9. Our goal is to extract all valid number words in order and convert them to their corresponding digit, producing a string of digits.

The string must be processed from left to right, and at each position we have two possibilities: if a valid number word starts at the current index, we append its corresponding digit to the result and advance the cursor by the length of the word; otherwise, we skip exactly one character and continue parsing.

The input constraints are significant: the string length can be up to 10^5, which rules out any naive solution that might repeatedly scan substrings without indexing efficiently. Only lowercase letters are present, so we do not need to worry about capitalization, punctuation, or whitespace.

Important edge cases include incomplete fragments (like "tw" for "two") that do not form valid number words, and interspersed invalid letters (like "x" in "ninexsix"), which must be skipped individually without halting the parsing.

Approaches

A brute-force approach would iterate over every position in the string and, for each position, check every possible number word to see if it matches the substring starting there. This guarantees correctness but is inefficient because for every character, we may check up to 10 number words of varying lengths, resulting in worst-case time complexity of O(n * m) where m is the sum of lengths of all number words. For n = 10^5, this can be slow.

The optimal approach leverages a Trie (prefix tree) to efficiently check for number word matches starting at each character. A Trie allows us to traverse character by character and immediately stop if no number word can continue. This reduces redundant string comparisons and ensures we only examine valid paths.

Approach Time Complexity Space Complexity Notes
Brute Force O(n * m) O(1) Check all number words at each position
Trie-based Optimal O(n * k) O(total chars in number words) Use Trie to efficiently match prefixes, k <= max word length (4)

Algorithm Walkthrough

  1. Construct a mapping of number words to digits: "zero": "0", "one": "1", …, "nine": "9".
  2. Build a Trie where each node represents a character, and terminal nodes store the corresponding digit. This allows prefix matching efficiently.
  3. Initialize an empty result string and a pointer at the start of the input string.
  4. At each position, attempt to match a number word using the Trie.
  • Traverse the Trie character by character following the input string.
  • If a terminal node is reached, append the corresponding digit to the result, and advance the pointer by the length of the matched word.
  • If no match is found, move the pointer forward by 1.
  1. Repeat step 4 until the end of the string is reached.
  2. Return the accumulated result string.

Why it works: The algorithm ensures that for every character, we attempt the longest valid prefix match possible starting at that index. Skipping a character only occurs when no valid number word can start at that position. The Trie guarantees that we only follow valid paths for number words, maintaining correctness and efficiency.

Python Solution

class TrieNode:
    def __init__(self):
        self.children = {}
        self.digit = None

class Solution:
    def convertNumber(self, s: str) -> str:
        number_words = {
            "zero": "0", "one": "1", "two": "2", "three": "3",
            "four": "4", "five": "5", "six": "6", "seven": "7",
            "eight": "8", "nine": "9"
        }
        
        # Build Trie
        root = TrieNode()
        for word, digit in number_words.items():
            node = root
            for char in word:
                if char not in node.children:
                    node.children[char] = TrieNode()
                node = node.children[char]
            node.digit = digit
        
        res = []
        i = 0
        while i < len(s):
            node = root
            j = i
            matched = False
            while j < len(s) and s[j] in node.children:
                node = node.children[s[j]]
                if node.digit is not None:
                    res.append(node.digit)
                    i = j + 1
                    matched = True
                    break
                j += 1
            if not matched:
                i += 1
        return "".join(res)

The Python code builds a Trie for all number words and iterates through the input string. At each position, it tries to match a number word by traversing the Trie. When a word is found, it appends the digit and skips the length of the word. Otherwise, it moves forward by one character.

Go Solution

type TrieNode struct {
	children map[rune]*TrieNode
	digit    string
}

func convertNumber(s string) string {
	numberWords := map[string]string{
		"zero": "0", "one": "1", "two": "2", "three": "3",
		"four": "4", "five": "5", "six": "6", "seven": "7",
		"eight": "8", "nine": "9",
	}

	root := &TrieNode{children: make(map[rune]*TrieNode)}
	for word, digit := range numberWords {
		node := root
		for _, ch := range word {
			if _, ok := node.children[ch]; !ok {
				node.children[ch] = &TrieNode{children: make(map[rune]*TrieNode)}
			}
			node = node.children[ch]
		}
		node.digit = digit
	}

	var res []byte
	i := 0
	runes := []rune(s)
	for i < len(runes) {
		node := root
		j := i
		matched := false
		for j < len(runes) {
			ch := runes[j]
			next, ok := node.children[ch]
			if !ok {
				break
			}
			node = next
			if node.digit != "" {
				res = append(res, node.digit[0])
				i = j + 1
				matched = true
				break
			}
			j++
		}
		if !matched {
			i++
		}
	}
	return string(res)
}

The Go implementation mirrors the Python solution. Rune slices are used to handle characters efficiently, and the Trie structure is represented using maps. Digits are appended as bytes to a result slice, and the final string is returned.

Worked Examples

Example 1: s = "onefourthree"

i Current char Trie Match Result Next i
0 'o' "one" → 1 "1" 3
3 'f' "four" → 4 "14" 7
7 't' "three" → 3 "143" 12

Output: "143"

Example 2: s = "ninexsix"

i Current char Trie Match Result Next i
0 'n' "nine" → 9 "9" 4
4 'x' none "9" 5
5 's' "six" → 6 "96" 8

Output: "96"

Example 3: s = "zeero"

No valid matches, all characters skipped.

Output: ""

Complexity Analysis

Measure Complexity Explanation
Time O(n * k) n is string length, k is max number word length (≤5), each character is visited once, Trie traversal takes ≤k steps
Space O(m) m is total characters in all number words (Trie nodes), negligible compared to input

The time complexity is effectively linear relative to the input size since k is small (maximum word length is 5). Space is proportional to the Trie, which is small and constant in practice.

Test Cases

sol = Solution()

# Provided examples
assert sol.convertNumber("onefourthree") == "143"  # normal case with multiple valid words
assert sol.convertNumber("ninexsix") == "96"      # interspersed invalid character
assert sol.convertNumber("zeero") == ""          # all invalid
assert sol.convertNumber("tw") == ""             # incomplete fragment

# Edge cases
assert sol.convertNumber("zero") == "0"          # single word
assert sol.convertNumber("onetwothreefourfive") == "12345"  # consecutive words
assert sol.convertNumber("xoney") == "1"         # single valid word surrounded by invalid chars
assert sol.convertNumber("sixeightseven") == "687" # multiple consecutive words
assert sol.convertNumber("") == ""               # empty string
Test Why