LeetCode 3451 - Find Invalid IP Addresses

This is a SQL database problem where we must identify all invalid IPv4 addresses that appear in the logs table and count how many times each invalid address occurs. The input table contains one row per server access log.

LeetCode Problem 3451

Difficulty: 🔴 Hard
Topics: Database

Solution

Problem Understanding

This is a SQL database problem where we must identify all invalid IPv4 addresses that appear in the logs table and count how many times each invalid address occurs.

The input table contains one row per server access log. Each row includes a unique log_id, an IP address stored as a string in the ip column, and an HTTP status_code. The status code is irrelevant for this task. Our focus is entirely on validating the IP address string.

A valid IPv4 address must contain exactly four octets separated by periods (.). Each octet must satisfy two requirements:

  1. Its numeric value must be between 0 and 255, inclusive.
  2. It must not contain leading zeros unless the octet itself is exactly "0".

The problem defines three categories of invalid addresses:

  • Any octet has a value greater than 255.
  • Any octet contains leading zeros, such as "001" or "01".
  • The address does not contain exactly four octets.

After identifying invalid addresses, we must group identical invalid IP strings together and count their occurrences. The output should contain:

  • ip
  • invalid_count

The results must be sorted by:

  1. invalid_count descending
  2. ip descending

The key challenge is correctly validating the IP string using SQL string operations and regular expressions.

Important Edge Cases

An implementation can easily make mistakes on several corner cases:

  • 256.1.2.3 is invalid because one octet exceeds 255.
  • 192.168.001.1 is invalid because "001" contains leading zeros.
  • 192.168.1 is invalid because it contains only three octets.
  • 1.2.3.4.5 is invalid because it contains five octets.
  • 0.0.0.0 is valid because zero is allowed.
  • 255.255.255.255 is valid because 255 is the maximum allowed value.
  • 00.0.0.0 is invalid because "00" contains leading zeros.

The problem guarantees that IP addresses are stored as strings, so validation must be performed through string processing rather than numeric columns.

Approaches

Brute Force Approach

A brute force strategy would parse every IP address character by character, manually split the address into octets, validate each octet, and then maintain a count of invalid addresses.

Conceptually, this approach is straightforward. For every row, we would:

  1. Split the string on periods.
  2. Verify there are exactly four components.
  3. Check each component for leading zeros.
  4. Convert each component to a number and verify it does not exceed 255.
  5. If any validation fails, count the IP.

This produces correct results because it directly implements the IPv4 validation rules.

However, database systems are optimized for set-based operations. Implementing extensive procedural parsing is unnecessarily complex and inefficient compared to SQL's built-in regular expression functions and string operations.

Optimal Approach

The key observation is that an IPv4 address is valid if and only if it matches a precise pattern.

Instead of validating every rule separately through procedural logic, we can define the structure of a valid IPv4 address using a regular expression.

A valid octet is:

  • 0
  • 1 through 9
  • 10 through 99
  • 100 through 199
  • 200 through 249
  • 250 through 255

This can be expressed as:

(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])

Repeating this pattern exactly four times with dots between octets gives a complete valid IPv4 regex.

Any IP address that does not match this regex is invalid.

Once invalid rows are identified, we simply group by ip, count occurrences, and sort according to the required order.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(N × L) O(1) Manually parses every IP address
Optimal O(N × L) O(1) Uses regex validation directly in SQL

Here:

  • N is the number of rows.
  • L is the maximum IP string length.

Although the asymptotic complexity is similar, the regex-based solution is significantly cleaner and leverages database optimizations.

Algorithm Walkthrough

  1. Define a regular expression representing a valid IPv4 address.
  2. Scan every row in the logs table.
  3. For each row, check whether the IP string matches the valid IPv4 pattern.
  4. If the IP does not match the pattern, classify it as invalid.
  5. Group all invalid rows by their exact IP string.
  6. Count how many times each invalid IP appears.
  7. Sort the grouped results by:
  • invalid_count descending
  • ip descending
  1. Return the resulting table.

Why it works

The regular expression exactly encodes the definition of a valid IPv4 address:

  • It requires exactly four octets.
  • Each octet must be between 0 and 255.
  • Leading zeros are disallowed because values with multiple digits must begin with a nonzero digit.

Therefore, every valid IPv4 address matches the pattern, and every address violating any rule fails to match. By selecting only non-matching rows, we obtain precisely the invalid IP addresses required by the problem.

SQL Solution

SELECT
    ip,
    COUNT(*) AS invalid_count
FROM logs
WHERE ip NOT REGEXP '^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$'
GROUP BY ip
ORDER BY invalid_count DESC, ip DESC;

Implementation Explanation

The regular expression defines a valid IPv4 address.

The octet pattern:

25[0-5]

matches values from 250 to 255.

The pattern:

2[0-4][0-9]

matches values from 200 to 249.

The pattern:

1[0-9]{2}

matches values from 100 to 199.

The pattern:

[1-9]?[0-9]

matches values from 0 to 99 while preventing leading zeros in multi-digit numbers.

The expression:

((octet)\.){3}(octet)

requires exactly four octets separated by three periods.

Rows whose IP does not satisfy this pattern are invalid. We then group identical invalid IPs and count their occurrences before applying the required ordering.

Worked Example

Consider the sample input:

log_id ip status_code
1 192.168.1.1 200
2 256.1.2.3 404
3 192.168.001.1 200
4 192.168.1.1 200
5 192.168.1 500
6 256.1.2.3 404
7 192.168.001.1 200

Validation Phase

IP Valid? Reason
192.168.1.1 Yes Four octets, all between 0 and 255
256.1.2.3 No 256 > 255
192.168.001.1 No Leading zeros
192.168.1.1 Yes Valid
192.168.1 No Only three octets
256.1.2.3 No 256 > 255
192.168.001.1 No Leading zeros

Grouping Phase

IP Count
256.1.2.3 2
192.168.001.1 2
192.168.1 1

Sorting Phase

Sorted by count descending, then IP descending:

ip invalid_count
256.1.2.3 2
192.168.001.1 2
192.168.1 1

This matches the expected output.

Complexity Analysis

Measure Complexity Explanation
Time O(N × L) Each IP string is matched against a regex of constant size
Space O(1) Aside from grouping performed by the database engine

The regex itself has constant size, so the cost of validating one IP is proportional to the IP string length. Since IPv4 strings are very short, the practical running time is essentially linear in the number of rows.

Test Cases

The following examples illustrate the expected behavior:

# Sample example
assert is_invalid("256.1.2.3")          # octet > 255
assert is_invalid("192.168.001.1")      # leading zeros
assert is_invalid("192.168.1")          # only 3 octets

# Valid boundary values
assert not is_invalid("0.0.0.0")        # minimum valid address
assert not is_invalid("255.255.255.255")# maximum valid address

# More than four octets
assert is_invalid("1.2.3.4.5")          # 5 octets

# Leading zeros
assert is_invalid("01.2.3.4")           # leading zero
assert is_invalid("1.02.3.4")           # leading zero
assert is_invalid("1.2.003.4")          # leading zero

# Values above 255
assert is_invalid("300.1.1.1")          # first octet invalid
assert is_invalid("1.300.1.1")          # second octet invalid
assert is_invalid("1.1.300.1")          # third octet invalid
assert is_invalid("1.1.1.300")          # fourth octet invalid

# Edge valid cases
assert not is_invalid("1.2.3.4")
assert not is_invalid("10.20.30.40")
assert not is_invalid("99.99.99.99")

Test Summary

Test Why
256.1.2.3 Verifies upper bound enforcement
192.168.001.1 Verifies leading zero detection
192.168.1 Verifies insufficient octets
1.2.3.4.5 Verifies excessive octets
0.0.0.0 Verifies minimum valid value
255.255.255.255 Verifies maximum valid value
01.2.3.4 Verifies leading zeros in first octet
300.1.1.1 Verifies large invalid octet
1.2.3.4 Verifies ordinary valid address

Edge Cases

Leading Zeros

An address such as 192.168.001.1 may appear numerically valid if octets are converted to integers. A naive solution that only checks numeric ranges would incorrectly accept it. The regex prevents this because multi-digit octets must begin with a nonzero digit.

Incorrect Number of Octets

Addresses like 192.168.1 or 1.2.3.4.5 violate the IPv4 structure requirement. The regex explicitly requires exactly three dots and exactly four octets, ensuring these addresses are rejected.

Values Greater Than 255

Addresses such as 256.1.2.3 or 999.0.0.1 contain octets outside the allowed range. The octet pattern only permits values from 0 through 255, so any larger value fails validation automatically.

Boundary Values

The addresses 0.0.0.0 and 255.255.255.255 are both valid. A poorly designed regex may accidentally exclude one of these boundaries. The chosen pattern explicitly includes both endpoints of the valid range, guaranteeing correct handling.