LeetCode 1805 - Number of Different Integers in a String

The problem requires us to process a string word that contains both lowercase English letters and digits. We are asked to extract all the sequences of digits as integers, ignoring any non-digit characters.

LeetCode Problem 1805

Difficulty: 🟢 Easy
Topics: Hash Table, String

Solution

Problem Understanding

The problem requires us to process a string word that contains both lowercase English letters and digits. We are asked to extract all the sequences of digits as integers, ignoring any non-digit characters. After extracting these sequences, we need to count how many unique integers exist, considering that integers with leading zeros are equivalent to their non-zero counterparts. For example, "001" and "1" are considered the same integer.

The input word has a length between 1 and 1000 characters, so we do not need to optimize for extremely large strings. The key challenge is correctly handling sequences of digits separated by letters, converting them to their integer representation, and eliminating duplicates efficiently.

Important edge cases include strings with multiple leading zeros, strings that contain only letters (no integers), and strings with consecutive digits separated by letters. The problem guarantees that there is at least one character in the input, so we do not need to handle empty input strings.

Approaches

The brute-force approach would be to iterate through the string character by character, building substrings for each digit sequence. Once we find a non-digit character or reach the end of the string, we convert the substring into an integer (removing leading zeros) and store it in a list. After processing the entire string, we could use a loop to count the unique integers. This approach works correctly but is inefficient because we may repeatedly search through the list to check for duplicates.

The optimal approach leverages a hash set to automatically handle uniqueness. We iterate through the string, accumulate characters for digit sequences, and convert each complete sequence to an integer string without leading zeros. We then insert this cleaned integer string into the set. Using a set ensures that each integer is counted only once, and we avoid repeated searches for duplicates.

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(n) Builds a list of all integers and searches for duplicates
Optimal O(n) O(n) Uses a hash set to store unique integer strings efficiently

Algorithm Walkthrough

  1. Initialize an empty string current_number to accumulate digits and an empty set unique_numbers to store unique integers.
  2. Iterate through each character ch in the string word.
  3. If ch is a digit, append it to current_number.
  4. If ch is a letter and current_number is not empty, strip leading zeros from current_number (ensure at least one zero remains if the number is "0"), and add it to unique_numbers. Then reset current_number to an empty string.
  5. After finishing the loop, check if current_number contains a number. If so, strip leading zeros and add it to unique_numbers.
  6. Return the size of unique_numbers, which represents the number of different integers in the string.

Why it works: Each number sequence is processed exactly once, leading zeros are normalized, and duplicates are automatically removed by the hash set. This guarantees correctness.

Python Solution

class Solution:
    def numDifferentIntegers(self, word: str) -> int:
        unique_numbers = set()
        current_number = ""
        
        for ch in word:
            if ch.isdigit():
                current_number += ch
            else:
                if current_number:
                    # Remove leading zeros and ensure "0" remains if empty
                    cleaned_number = current_number.lstrip('0') or '0'
                    unique_numbers.add(cleaned_number)
                    current_number = ""
        
        if current_number:
            cleaned_number = current_number.lstrip('0') or '0'
            unique_numbers.add(cleaned_number)
        
        return len(unique_numbers)

The implementation iterates through each character and constructs digit sequences in current_number. When a non-digit is encountered, the accumulated number is cleaned of leading zeros and added to a set, ensuring uniqueness. The final set size is returned as the result.

Go Solution

import (
    "strings"
)

func numDifferentIntegers(word string) int {
    uniqueNumbers := make(map[string]struct{})
    currentNumber := ""
    
    for _, ch := range word {
        if ch >= '0' && ch <= '9' {
            currentNumber += string(ch)
        } else {
            if currentNumber != "" {
                cleanedNumber := strings.TrimLeft(currentNumber, "0")
                if cleanedNumber == "" {
                    cleanedNumber = "0"
                }
                uniqueNumbers[cleanedNumber] = struct{}{}
                currentNumber = ""
            }
        }
    }
    
    if currentNumber != "" {
        cleanedNumber := strings.TrimLeft(currentNumber, "0")
        if cleanedNumber == "" {
            cleanedNumber = "0"
        }
        uniqueNumbers[cleanedNumber] = struct{}{}
    }
    
    return len(uniqueNumbers)
}

The Go solution is almost identical in logic to the Python version. We use a map as a set, converting digit sequences into cleaned strings. Go requires explicit handling of empty strings after trimming leading zeros.

Worked Examples

Example "a123bc34d8ef34":

Step Character Current Number Set State
1 'a' "" {}
2 '1' "1" {}
3 '2' "12" {}
4 '3' "123" {}
5 'b' "" {"123"}
6 'c' "" {"123"}
7 '3' "3" {"123"}
8 '4' "34" {"123"}
9 'd' "" {"123","34"}
10 '8' "8" {"123","34"}
11 'e' "" {"123","34","8"}
12 'f' "" {"123","34","8"}
13 '3' "3" {"123","34","8"}
14 '4' "34" {"123","34","8"}

Final set size = 3.

Complexity Analysis

Measure Complexity Explanation
Time O(n) We scan the string once and perform O(1) operations for each character
Space O(n) In the worst case, all characters are digits and unique, stored in the set

The linear time complexity is efficient for n <= 1000, and the space complexity is dominated by the storage of unique numbers.

Test Cases

# Provided examples
assert Solution().numDifferentIntegers("a123bc34d8ef34") == 3  # three unique numbers
assert Solution().numDifferentIntegers("leet1234code234") == 2  # two unique numbers
assert Solution().numDifferentIntegers("a1b01c001") == 1  # leading zeros treated as same

# Edge and additional cases
assert Solution().numDifferentIntegers("abc") == 0  # no numbers
assert Solution().numDifferentIntegers("0000") == 1  # single zero
assert Solution().numDifferentIntegers("0a0b00") == 1  # multiple zeros
assert Solution().numDifferentIntegers("123abc0456def123") == 2  # repeated numbers
assert Solution().numDifferentIntegers("9876543210") == 1  # single large number
Test Why
"a123bc34d8ef34" Standard mixed letters and digits
"leet1234code234" No repeated digits
"a1b01c001" Leading zeros normalization
"abc" No digits present
"0000" Only zeros
"0a0b00" Multiple zeros separated by letters
"123abc0456def123" Duplicates with leading zeros
"9876543210" Large number handling

Edge Cases

  1. Leading Zeros: Numbers like "01", "001" and "1" should be treated as the same integer. This can trip naive implementations if they compare strings directly. The solution handles this by stripping leading zeros.
  2. Only Letters: If the input contains no digits, the result should be 0. The solution correctly returns the length of an empty set.
  3. Consecutive Zeros: Inputs like "0000" should count as a single unique integer "0". The solution ensures that if all digits are zeros, the normalized number is "0" and added once to the set.
  4. Numbers at the End: If the last characters of the string form a number, it must be processed after the loop ends. Both implementations handle this final check outside the main loop.
  5. Large Numbers: Even if a number is very large (e.g., 10 digits), Python handles it naturally, and in Go we store it as a string to avoid integer overflow.

This solution robustly handles all the edge cases while maintaining optimal efficiency.