LeetCode 8 - String to Integer (atoi)

The problem asks us to implement a simplified version of the C/C++ atoi function, which converts a string into a 32-bit signed integer. The conversion process is strict and follows several rules in a specific order.

LeetCode Problem 8

Difficulty: 🟡 Medium
Topics: String

Solution

Problem Understanding

The problem asks us to implement a simplified version of the C/C++ atoi function, which converts a string into a 32-bit signed integer. The conversion process is strict and follows several rules in a specific order. The function must parse the string carefully and stop as soon as the input becomes invalid for integer parsing.

The input is a string s that may contain spaces, signs, digits, letters, or punctuation. The goal is to extract a valid integer from the beginning of the string, following the parsing rules exactly.

The parsing process works like this:

  1. Ignore all leading spaces.
  2. Check whether the next character is '+' or '-' to determine the sign.
  3. Read consecutive digits and build the number.
  4. Stop reading as soon as a non-digit character appears.
  5. Clamp the result into the 32-bit signed integer range.

The valid integer range is:

  • Minimum: -2^31 = -2147483648
  • Maximum: 2^31 - 1 = 2147483647

If the parsed number exceeds these bounds, we must return the nearest boundary value instead.

The constraints are small, since the string length is at most 200 characters. This means performance is not a concern in practice, but the challenge is implementing the parsing logic correctly and handling all edge cases.

Several edge cases make this problem tricky:

  • Strings containing only spaces
  • Strings with a sign but no digits
  • Numbers with leading zeros
  • Characters appearing after digits
  • Invalid starting characters
  • Integer overflow
  • Multiple signs such as "--12" or "+-12"

The problem guarantees that the string contains only English letters, digits, spaces, '+', '-', and '.', so we do not need to handle Unicode or unusual symbols.

Approaches

Brute Force Approach

A brute force approach would try to manually extract every possible substring that could represent an integer, validate it, convert it, and then decide which one should be used according to the parsing rules.

For example, after trimming leading spaces, we could attempt to build candidate substrings character by character and repeatedly check whether they form valid integers. We might also store intermediate substrings and use built-in conversion functions with exception handling.

This works because eventually we can identify the longest valid numeric prefix and convert it into an integer. However, this approach introduces unnecessary work. It repeatedly creates substrings, performs multiple validations, and depends heavily on string manipulation.

Although the input size is small enough that this would still pass, the implementation becomes inefficient and harder to reason about.

Optimal Approach

The optimal solution processes the string exactly once using a pointer or index. Instead of generating substrings, we scan the input from left to right and maintain the current parsed number incrementally.

The key insight is that the parsing rules are sequential. We never need to revisit earlier characters. Once a non-digit character is encountered after parsing begins, the process immediately stops.

This naturally leads to a linear scan solution:

  • Skip spaces
  • Read optional sign
  • Read digits one at a time
  • Update the numeric result incrementally
  • Detect overflow before it happens

The most important optimization is overflow detection. Instead of building a huge integer and checking afterward, we check before adding the next digit whether the next multiplication and addition would exceed the allowed range.

This keeps the solution efficient, clean, and safe.

Approach Time Complexity Space Complexity Notes
Brute Force O(n²) O(n) Repeated substring creation and validation
Optimal O(n) O(1) Single pass parsing with incremental overflow handling

Algorithm Walkthrough

  1. Initialize an index pointer at the beginning of the string.

We use this pointer to scan the string from left to right exactly once. 2. Skip all leading whitespace characters.

According to the problem statement, spaces before the number should be ignored. We keep moving the pointer forward while the current character is a space. 3. Determine the sign of the number.

If the current character is '+', the number is positive.

If the current character is '-', the number is negative.

Otherwise, we assume the number is positive by default. 4. Start reading digits.

We continue scanning while the current character is a digit. Each digit updates the running result using:

$\text{result} = \text{result} \times 10 + \text{digit}$

This is the standard way to build a decimal number incrementally. 5. Detect overflow before updating the result.

Before multiplying by 10 and adding the next digit, we check whether the operation would exceed the 32-bit integer range.

For positive numbers:

$\text{result} > \frac{2147483647 - \text{digit}}{10}$

If this condition is true, we immediately return 2147483647.

For negative numbers, we similarly clamp to -2147483648. 6. Stop parsing at the first non-digit character.

The problem explicitly states that parsing ends immediately when a non-digit is encountered after digit parsing begins. 7. Return the final signed result.

Multiply the parsed magnitude by the sign and return the value.

Why it works

The algorithm maintains a simple invariant throughout parsing:

  • result always represents the integer formed by all valid digits processed so far.
  • index always points to the next unread character.
  • Overflow checks guarantee that result never leaves the valid 32-bit signed integer range.

Because the algorithm processes characters in the exact order specified by the problem rules, and because it stops immediately when parsing becomes invalid, the produced result always matches the required behavior.

Python Solution

class Solution:
    def myAtoi(self, s: str) -> int:
        INT_MAX = 2147483647
        INT_MIN = -2147483648

        index = 0
        n = len(s)

        # Skip leading whitespace
        while index < n and s[index] == ' ':
            index += 1

        # Handle empty string after spaces
        if index == n:
            return 0

        # Determine sign
        sign = 1

        if s[index] == '+':
            index += 1
        elif s[index] == '-':
            sign = -1
            index += 1

        result = 0

        # Read digits
        while index < n and s[index].isdigit():
            digit = ord(s[index]) - ord('0')

            # Overflow detection
            if result > (INT_MAX - digit) // 10:
                return INT_MAX if sign == 1 else INT_MIN

            result = result * 10 + digit
            index += 1

        return sign * result

The implementation closely follows the algorithm steps.

We first skip all leading spaces because they are irrelevant to the number parsing process. If the string contains only spaces, we immediately return 0.

Next, we inspect the current character to determine the sign. A missing sign means the number is positive by default.

The main parsing loop processes digits one at a time. Instead of building a substring and converting it later, we incrementally construct the integer value. This avoids unnecessary memory allocation and keeps the solution linear.

Overflow detection is performed before updating the result. This is critical because Python integers can grow arbitrarily large, but the problem specifically requires 32-bit signed integer behavior.

Finally, we apply the sign and return the result.

Go Solution

func myAtoi(s string) int {
	const INT_MAX = 2147483647
	const INT_MIN = -2147483648

	n := len(s)
	index := 0

	// Skip leading spaces
	for index < n && s[index] == ' ' {
		index++
	}

	// Empty string after spaces
	if index == n {
		return 0
	}

	// Determine sign
	sign := 1

	if s[index] == '+' {
		index++
	} else if s[index] == '-' {
		sign = -1
		index++
	}

	result := 0

	// Parse digits
	for index < n && s[index] >= '0' && s[index] <= '9' {
		digit := int(s[index] - '0')

		// Overflow detection
		if result > (INT_MAX-digit)/10 {
			if sign == 1 {
				return INT_MAX
			}
			return INT_MIN
		}

		result = result*10 + digit
		index++
	}

	return sign * result
}

The Go implementation is structurally identical to the Python solution, but there are a few language-specific differences.

Go does not provide built-in arbitrary precision integers for normal int operations, so careful overflow detection is even more important.

Character handling is done using byte comparisons since Go strings are byte slices internally. Digit parsing uses:

digit := int(s[index] - '0')

which converts the ASCII digit into its numeric value.

Unlike Python, Go does not have a built-in isdigit() method, so we explicitly check:

s[index] >= '0' && s[index] <= '9'

Worked Examples

Example 1

Input:

"42"
Step Character Action Result Sign
1 '4' Parse digit 4 1
2 '2' Parse digit 42 1

Final answer:

42

Example 2

Input:

"   -042"
Step Character Action Result Sign
1 ' ' Skip space 0 1
2 ' ' Skip space 0 1
3 ' ' Skip space 0 1
4 '-' Negative sign 0 -1
5 '0' Parse digit 0 -1
6 '4' Parse digit 4 -1
7 '2' Parse digit 42 -1

Final answer:

-42

Example 3

Input:

"1337c0d3"
Step Character Action Result
1 '1' Parse digit 1
2 '3' Parse digit 13
3 '3' Parse digit 133
4 '7' Parse digit 1337
5 'c' Stop parsing 1337

Final answer:

1337

Example 4

Input:

"0-1"
Step Character Action Result
1 '0' Parse digit 0
2 '-' Stop parsing 0

Final answer:

0

Example 5

Input:

"words and 987"
Step Character Action
1 'w' Invalid start

Final answer:

0

Complexity Analysis

Measure Complexity Explanation
Time O(n) Each character is processed at most once
Space O(1) Only a few integer variables are used

The algorithm performs a single left-to-right scan through the string. No additional data structures proportional to input size are allocated. Every operation inside the loop is constant time, so the total runtime grows linearly with the length of the string.

Test Cases

solution = Solution()

assert solution.myAtoi("42") == 42  # basic positive number
assert solution.myAtoi("   -42") == -42  # leading spaces and negative sign
assert solution.myAtoi("4193 with words") == 4193  # stop at non-digit
assert solution.myAtoi("words and 987") == 0  # invalid starting character
assert solution.myAtoi("-91283472332") == -2147483648  # negative overflow
assert solution.myAtoi("91283472332") == 2147483647  # positive overflow
assert solution.myAtoi("+1") == 1  # explicit positive sign
assert solution.myAtoi("00000123") == 123  # leading zeros
assert solution.myAtoi("") == 0  # empty string
assert solution.myAtoi("     ") == 0  # only spaces
assert solution.myAtoi("+-12") == 0  # invalid multiple signs
assert solution.myAtoi("-+12") == 0  # invalid sign order
assert solution.myAtoi("00000-42a1234") == 0  # stop after digits
assert solution.myAtoi("3.14159") == 3  # decimal point stops parsing
assert solution.myAtoi("-2147483648") == -2147483648  # exact INT_MIN
assert solution.myAtoi("2147483647") == 2147483647  # exact INT_MAX
assert solution.myAtoi("2147483648") == 2147483647  # overflow by one
assert solution.myAtoi("-2147483649") == -2147483648  # underflow by one
assert solution.myAtoi("0") == 0  # single zero
assert solution.myAtoi("000") == 0  # multiple zeros
Test Why
"42" Basic valid positive integer
" -42" Leading whitespace handling
"4193 with words" Stops at non-digit characters
"words and 987" Invalid starting character
"-91283472332" Negative overflow clamping
"91283472332" Positive overflow clamping
"+1" Explicit positive sign
"00000123" Leading zeros
"" Empty string
" " String containing only spaces
"+-12" Invalid consecutive signs
"-+12" Invalid sign ordering
"00000-42a1234" Parsing stops after invalid character
"3.14159" Decimal point handling
"-2147483648" Exact minimum boundary
"2147483647" Exact maximum boundary
"2147483648" Overflow beyond maximum
"-2147483649" Underflow beyond minimum
"0" Single digit zero
"000" Multiple leading zeros

Edge Cases

One important edge case is a string containing only whitespace, such as " ". A naive implementation might attempt to access characters after skipping spaces and cause an out-of-bounds error. The implementation avoids this by checking whether the index has reached the end of the string immediately after whitespace removal.

Another tricky case is invalid sign combinations such as "+-12" or "--5". According to the problem rules, parsing only allows one optional sign directly before digits. Once the parser encounters a second non-digit character, digit parsing never begins, so the correct answer is 0. The implementation naturally handles this because digit parsing only starts if the current character is actually a digit.

Overflow handling is also a major source of bugs. Inputs like "91283472332" exceed the 32-bit integer range. If we wait until after constructing the full number, some languages may overflow before we can clamp the value. The implementation prevents this by checking whether the next multiplication and addition would exceed the valid range before updating the result.

A final subtle case is inputs like "0-1" or "123abc456". Once digit parsing begins, the parser must stop permanently at the first non-digit character. It must not resume parsing later digits. The implementation enforces this naturally because the digit-reading loop terminates immediately once a non-digit is encountered.