LeetCode 960 - Delete Columns to Make Sorted III
This problem asks us to find the minimum number of columns to delete from a set of equal-length strings so that each individual string becomes lexicographically sorted.
Difficulty: 🔴 Hard
Topics: Array, String, Dynamic Programming
Solution
Problem Understanding
This problem asks us to find the minimum number of columns to delete from a set of equal-length strings so that each individual string becomes lexicographically sorted. In other words, we are allowed to remove some positions (columns) across all strings, and the goal is to ensure that each string, independently, is in non-decreasing alphabetical order from left to right.
The input consists of an array strs with n strings, each of length m. The output is an integer representing the smallest number of column deletions needed to achieve the goal. Constraints indicate that both n and m can go up to 100, which suggests that an O(n * m²) algorithm could be acceptable but O(2^m) brute force is infeasible.
Important edge cases include strings that are already sorted (requiring zero deletions), strings in reverse order (requiring deletion of all but one column), and cases where deleting one column enables multiple strings to become sorted.
Approaches
The brute-force approach would attempt every possible combination of column deletions and check whether the resulting strings are sorted. For m columns, there are 2^m possible deletion sets, and for each set, we would check all n strings of length up to m. This is clearly infeasible for m up to 100 because 2^100 is astronomically large.
The key observation for a better solution is to treat the problem as a variant of the Longest Increasing Subsequence (LIS) problem, but in columns. Instead of thinking about which columns to delete, we think about which columns to keep. We can iterate over the columns and track the length of the longest sequence of columns that keeps all rows lexicographically sorted. For each column j, we check all previous columns i < j. If for every string strs[k], strs[k][i] <= strs[k][j], then column j can follow column i in the increasing sequence. This results in a dynamic programming solution.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^m * n * m) | O(m) | Checks all subsets of columns, infeasible for m = 100 |
| Optimal | O(m² * n) | O(m) | Uses dynamic programming on columns to find LIS of columns to keep |
Algorithm Walkthrough
- Initialize a DP array
dpof lengthm(number of columns), wheredp[j]represents the length of the longest sequence of columns ending at columnjthat keeps all strings sorted. - Set all entries of
dpto 1, since each column alone can always be kept. - Iterate over columns
jfrom left to right. - For each
j, iterate over all previous columnsi < j. - For each pair
(i, j), check if for every stringk,strs[k][i] <= strs[k][j]. If true, updatedp[j] = max(dp[j], dp[i] + 1). - After filling the DP array, the length of the longest sequence of columns to keep is
max(dp). - The minimum deletions required is
m - max(dp).
Why it works: The DP array guarantees that for any column sequence represented by dp[j], the sequence of characters in all rows is non-decreasing. By maximizing the columns kept, we minimize the columns deleted.
Python Solution
from typing import List
class Solution:
def minDeletionSize(self, strs: List[str]) -> int:
n = len(strs)
m = len(strs[0])
dp = [1] * m # Each column can be a sequence by itself
for j in range(m):
for i in range(j):
if all(strs[k][i] <= strs[k][j] for k in range(n)):
dp[j] = max(dp[j], dp[i] + 1)
return m - max(dp)
The code initializes a DP array where each entry represents the longest sorted sequence ending at that column. It iterates over each pair of columns and checks if keeping column j after i maintains row-wise sorting. The final answer is the number of columns not part of the longest sequence.
Go Solution
func minDeletionSize(strs []string) int {
n := len(strs)
m := len(strs[0])
dp := make([]int, m)
for j := range dp {
dp[j] = 1
}
for j := 0; j < m; j++ {
for i := 0; i < j; i++ {
valid := true
for k := 0; k < n; k++ {
if strs[k][i] > strs[k][j] {
valid = false
break
}
}
if valid {
if dp[i]+1 > dp[j] {
dp[j] = dp[i] + 1
}
}
}
}
maxLen := 0
for _, val := range dp {
if val > maxLen {
maxLen = val
}
}
return m - maxLen
}
The Go implementation mirrors the Python logic. The main differences are explicit slice initialization, explicit variable for valid, and range-based loops for computing the max value.
Worked Examples
Example 1: strs = ["babca","bbazb"]
DP initialization: [1,1,1,1,1]
Iteration details:
| i | j | Condition (all rows i<=j) | dp[j] update | dp array |
|---|---|---|---|---|
| 0 | 1 | false | - | [1,1,1,1,1] |
| 0 | 2 | true | dp[2]=dp[0]+1=2 | [1,1,2,1,1] |
| 0 | 3 | true | dp[3]=2 | [1,1,2,2,1] |
| 0 | 4 | false | - | [1,1,2,2,1] |
| 1 | 2 | false | - | [1,1,2,2,1] |
| 1 | 3 | false | - | [1,1,2,2,1] |
| 1 | 4 | false | - | [1,1,2,2,1] |
| 2 | 3 | false | - | [1,1,2,2,1] |
| 2 | 4 | false | - | [1,1,2,2,1] |
| 3 | 4 | false | - | [1,1,2,2,1] |
Longest sequence to keep: 2 → Minimum deletions = 5 - 2 = 3.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * m²) | For each of the m² column pairs, we check all n rows |
| Space | O(m) | DP array of length m |
The time complexity is feasible because n and m are at most 100, giving at most 1,000,000 iterations.
Test Cases
# Provided examples
assert Solution().minDeletionSize(["babca","bbazb"]) == 3 # Example 1
assert Solution().minDeletionSize(["edcba"]) == 4 # Example 2
assert Solution().minDeletionSize(["ghi","def","abc"]) == 0 # Example 3
# Edge and additional cases
assert Solution().minDeletionSize(["a"]) == 0 # Single column, already sorted
assert Solution().minDeletionSize(["zyx"]) == 2 # Reverse order single row
assert Solution().minDeletionSize(["abc","abc","abc"]) == 0 # All rows already sorted
assert Solution().minDeletionSize(["bac","abc","cab"]) == 2 # Mixed rows
assert Solution().minDeletionSize(["a"*100]*100) == 0 # Large input, all identical letters
| Test | Why |
|---|---|
["babca","bbazb"] |
Checks typical scenario with mixed deletions |
["edcba"] |
Single row in reverse order |
["ghi","def","abc"] |
Multiple rows already sorted |
["a"] |
Minimal input size |
["zyx"] |
Single row, multiple columns descending |
["abc","abc","abc"] |
All rows identical and sorted |
["bac","abc","cab"] |
Rows partially sorted, needs selective deletions |
["a"*100]*100 |
Max constraints stress test |
Edge Cases
One important edge case is a single string of multiple columns that is strictly descending. Without careful checking, a naive solution might not delete enough columns. Our DP correctly identifies that only one column can remain.
Another edge case is when all rows are identical. Here, no deletions are needed regardless of row count or column length. The DP approach handles this naturally because all comparisons pass.
Finally, consider large inputs where each row has identical characters repeated many times. The algorithm must handle n = 100 and `m = 100