LeetCode 3451 - Find Invalid IP Addresses
This is a SQL database problem where we must identify all invalid IPv4 addresses that appear in the logs table and count how many times each invalid address occurs. The input table contains one row per server access log.
Difficulty: 🔴 Hard
Topics: Database
Solution
Problem Understanding
This is a SQL database problem where we must identify all invalid IPv4 addresses that appear in the logs table and count how many times each invalid address occurs.
The input table contains one row per server access log. Each row includes a unique log_id, an IP address stored as a string in the ip column, and an HTTP status_code. The status code is irrelevant for this task. Our focus is entirely on validating the IP address string.
A valid IPv4 address must contain exactly four octets separated by periods (.). Each octet must satisfy two requirements:
- Its numeric value must be between
0and255, inclusive. - It must not contain leading zeros unless the octet itself is exactly
"0".
The problem defines three categories of invalid addresses:
- Any octet has a value greater than
255. - Any octet contains leading zeros, such as
"001"or"01". - The address does not contain exactly four octets.
After identifying invalid addresses, we must group identical invalid IP strings together and count their occurrences. The output should contain:
ipinvalid_count
The results must be sorted by:
invalid_countdescendingipdescending
The key challenge is correctly validating the IP string using SQL string operations and regular expressions.
Important Edge Cases
An implementation can easily make mistakes on several corner cases:
256.1.2.3is invalid because one octet exceeds255.192.168.001.1is invalid because"001"contains leading zeros.192.168.1is invalid because it contains only three octets.1.2.3.4.5is invalid because it contains five octets.0.0.0.0is valid because zero is allowed.255.255.255.255is valid because255is the maximum allowed value.00.0.0.0is invalid because"00"contains leading zeros.
The problem guarantees that IP addresses are stored as strings, so validation must be performed through string processing rather than numeric columns.
Approaches
Brute Force Approach
A brute force strategy would parse every IP address character by character, manually split the address into octets, validate each octet, and then maintain a count of invalid addresses.
Conceptually, this approach is straightforward. For every row, we would:
- Split the string on periods.
- Verify there are exactly four components.
- Check each component for leading zeros.
- Convert each component to a number and verify it does not exceed
255. - If any validation fails, count the IP.
This produces correct results because it directly implements the IPv4 validation rules.
However, database systems are optimized for set-based operations. Implementing extensive procedural parsing is unnecessarily complex and inefficient compared to SQL's built-in regular expression functions and string operations.
Optimal Approach
The key observation is that an IPv4 address is valid if and only if it matches a precise pattern.
Instead of validating every rule separately through procedural logic, we can define the structure of a valid IPv4 address using a regular expression.
A valid octet is:
01through910through99100through199200through249250through255
This can be expressed as:
(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])
Repeating this pattern exactly four times with dots between octets gives a complete valid IPv4 regex.
Any IP address that does not match this regex is invalid.
Once invalid rows are identified, we simply group by ip, count occurrences, and sort according to the required order.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(N × L) | O(1) | Manually parses every IP address |
| Optimal | O(N × L) | O(1) | Uses regex validation directly in SQL |
Here:
Nis the number of rows.Lis the maximum IP string length.
Although the asymptotic complexity is similar, the regex-based solution is significantly cleaner and leverages database optimizations.
Algorithm Walkthrough
- Define a regular expression representing a valid IPv4 address.
- Scan every row in the
logstable. - For each row, check whether the IP string matches the valid IPv4 pattern.
- If the IP does not match the pattern, classify it as invalid.
- Group all invalid rows by their exact IP string.
- Count how many times each invalid IP appears.
- Sort the grouped results by:
invalid_countdescendingipdescending
- Return the resulting table.
Why it works
The regular expression exactly encodes the definition of a valid IPv4 address:
- It requires exactly four octets.
- Each octet must be between
0and255. - Leading zeros are disallowed because values with multiple digits must begin with a nonzero digit.
Therefore, every valid IPv4 address matches the pattern, and every address violating any rule fails to match. By selecting only non-matching rows, we obtain precisely the invalid IP addresses required by the problem.
SQL Solution
SELECT
ip,
COUNT(*) AS invalid_count
FROM logs
WHERE ip NOT REGEXP '^((25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])\\.){3}(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])$'
GROUP BY ip
ORDER BY invalid_count DESC, ip DESC;
Implementation Explanation
The regular expression defines a valid IPv4 address.
The octet pattern:
25[0-5]
matches values from 250 to 255.
The pattern:
2[0-4][0-9]
matches values from 200 to 249.
The pattern:
1[0-9]{2}
matches values from 100 to 199.
The pattern:
[1-9]?[0-9]
matches values from 0 to 99 while preventing leading zeros in multi-digit numbers.
The expression:
((octet)\.){3}(octet)
requires exactly four octets separated by three periods.
Rows whose IP does not satisfy this pattern are invalid. We then group identical invalid IPs and count their occurrences before applying the required ordering.
Worked Example
Consider the sample input:
| log_id | ip | status_code |
|---|---|---|
| 1 | 192.168.1.1 | 200 |
| 2 | 256.1.2.3 | 404 |
| 3 | 192.168.001.1 | 200 |
| 4 | 192.168.1.1 | 200 |
| 5 | 192.168.1 | 500 |
| 6 | 256.1.2.3 | 404 |
| 7 | 192.168.001.1 | 200 |
Validation Phase
| IP | Valid? | Reason |
|---|---|---|
| 192.168.1.1 | Yes | Four octets, all between 0 and 255 |
| 256.1.2.3 | No | 256 > 255 |
| 192.168.001.1 | No | Leading zeros |
| 192.168.1.1 | Yes | Valid |
| 192.168.1 | No | Only three octets |
| 256.1.2.3 | No | 256 > 255 |
| 192.168.001.1 | No | Leading zeros |
Grouping Phase
| IP | Count |
|---|---|
| 256.1.2.3 | 2 |
| 192.168.001.1 | 2 |
| 192.168.1 | 1 |
Sorting Phase
Sorted by count descending, then IP descending:
| ip | invalid_count |
|---|---|
| 256.1.2.3 | 2 |
| 192.168.001.1 | 2 |
| 192.168.1 | 1 |
This matches the expected output.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(N × L) | Each IP string is matched against a regex of constant size |
| Space | O(1) | Aside from grouping performed by the database engine |
The regex itself has constant size, so the cost of validating one IP is proportional to the IP string length. Since IPv4 strings are very short, the practical running time is essentially linear in the number of rows.
Test Cases
The following examples illustrate the expected behavior:
# Sample example
assert is_invalid("256.1.2.3") # octet > 255
assert is_invalid("192.168.001.1") # leading zeros
assert is_invalid("192.168.1") # only 3 octets
# Valid boundary values
assert not is_invalid("0.0.0.0") # minimum valid address
assert not is_invalid("255.255.255.255")# maximum valid address
# More than four octets
assert is_invalid("1.2.3.4.5") # 5 octets
# Leading zeros
assert is_invalid("01.2.3.4") # leading zero
assert is_invalid("1.02.3.4") # leading zero
assert is_invalid("1.2.003.4") # leading zero
# Values above 255
assert is_invalid("300.1.1.1") # first octet invalid
assert is_invalid("1.300.1.1") # second octet invalid
assert is_invalid("1.1.300.1") # third octet invalid
assert is_invalid("1.1.1.300") # fourth octet invalid
# Edge valid cases
assert not is_invalid("1.2.3.4")
assert not is_invalid("10.20.30.40")
assert not is_invalid("99.99.99.99")
Test Summary
| Test | Why |
|---|---|
| 256.1.2.3 | Verifies upper bound enforcement |
| 192.168.001.1 | Verifies leading zero detection |
| 192.168.1 | Verifies insufficient octets |
| 1.2.3.4.5 | Verifies excessive octets |
| 0.0.0.0 | Verifies minimum valid value |
| 255.255.255.255 | Verifies maximum valid value |
| 01.2.3.4 | Verifies leading zeros in first octet |
| 300.1.1.1 | Verifies large invalid octet |
| 1.2.3.4 | Verifies ordinary valid address |
Edge Cases
Leading Zeros
An address such as 192.168.001.1 may appear numerically valid if octets are converted to integers. A naive solution that only checks numeric ranges would incorrectly accept it. The regex prevents this because multi-digit octets must begin with a nonzero digit.
Incorrect Number of Octets
Addresses like 192.168.1 or 1.2.3.4.5 violate the IPv4 structure requirement. The regex explicitly requires exactly three dots and exactly four octets, ensuring these addresses are rejected.
Values Greater Than 255
Addresses such as 256.1.2.3 or 999.0.0.1 contain octets outside the allowed range. The octet pattern only permits values from 0 through 255, so any larger value fails validation automatically.
Boundary Values
The addresses 0.0.0.0 and 255.255.255.255 are both valid. A poorly designed regex may accidentally exclude one of these boundaries. The chosen pattern explicitly includes both endpoints of the valid range, guaranteeing correct handling.