LeetCode 960 - Delete Columns to Make Sorted III

This problem asks us to find the minimum number of columns to delete from a set of equal-length strings so that each individual string becomes lexicographically sorted.

LeetCode Problem 960

Difficulty: 🔴 Hard
Topics: Array, String, Dynamic Programming

Solution

Problem Understanding

This problem asks us to find the minimum number of columns to delete from a set of equal-length strings so that each individual string becomes lexicographically sorted. In other words, we are allowed to remove some positions (columns) across all strings, and the goal is to ensure that each string, independently, is in non-decreasing alphabetical order from left to right.

The input consists of an array strs with n strings, each of length m. The output is an integer representing the smallest number of column deletions needed to achieve the goal. Constraints indicate that both n and m can go up to 100, which suggests that an O(n * m²) algorithm could be acceptable but O(2^m) brute force is infeasible.

Important edge cases include strings that are already sorted (requiring zero deletions), strings in reverse order (requiring deletion of all but one column), and cases where deleting one column enables multiple strings to become sorted.

Approaches

The brute-force approach would attempt every possible combination of column deletions and check whether the resulting strings are sorted. For m columns, there are 2^m possible deletion sets, and for each set, we would check all n strings of length up to m. This is clearly infeasible for m up to 100 because 2^100 is astronomically large.

The key observation for a better solution is to treat the problem as a variant of the Longest Increasing Subsequence (LIS) problem, but in columns. Instead of thinking about which columns to delete, we think about which columns to keep. We can iterate over the columns and track the length of the longest sequence of columns that keeps all rows lexicographically sorted. For each column j, we check all previous columns i < j. If for every string strs[k], strs[k][i] <= strs[k][j], then column j can follow column i in the increasing sequence. This results in a dynamic programming solution.

Approach Time Complexity Space Complexity Notes
Brute Force O(2^m * n * m) O(m) Checks all subsets of columns, infeasible for m = 100
Optimal O(m² * n) O(m) Uses dynamic programming on columns to find LIS of columns to keep

Algorithm Walkthrough

  1. Initialize a DP array dp of length m (number of columns), where dp[j] represents the length of the longest sequence of columns ending at column j that keeps all strings sorted.
  2. Set all entries of dp to 1, since each column alone can always be kept.
  3. Iterate over columns j from left to right.
  4. For each j, iterate over all previous columns i < j.
  5. For each pair (i, j), check if for every string k, strs[k][i] <= strs[k][j]. If true, update dp[j] = max(dp[j], dp[i] + 1).
  6. After filling the DP array, the length of the longest sequence of columns to keep is max(dp).
  7. The minimum deletions required is m - max(dp).

Why it works: The DP array guarantees that for any column sequence represented by dp[j], the sequence of characters in all rows is non-decreasing. By maximizing the columns kept, we minimize the columns deleted.

Python Solution

from typing import List

class Solution:
    def minDeletionSize(self, strs: List[str]) -> int:
        n = len(strs)
        m = len(strs[0])
        dp = [1] * m  # Each column can be a sequence by itself
        
        for j in range(m):
            for i in range(j):
                if all(strs[k][i] <= strs[k][j] for k in range(n)):
                    dp[j] = max(dp[j], dp[i] + 1)
        
        return m - max(dp)

The code initializes a DP array where each entry represents the longest sorted sequence ending at that column. It iterates over each pair of columns and checks if keeping column j after i maintains row-wise sorting. The final answer is the number of columns not part of the longest sequence.

Go Solution

func minDeletionSize(strs []string) int {
    n := len(strs)
    m := len(strs[0])
    dp := make([]int, m)
    for j := range dp {
        dp[j] = 1
    }
    
    for j := 0; j < m; j++ {
        for i := 0; i < j; i++ {
            valid := true
            for k := 0; k < n; k++ {
                if strs[k][i] > strs[k][j] {
                    valid = false
                    break
                }
            }
            if valid {
                if dp[i]+1 > dp[j] {
                    dp[j] = dp[i] + 1
                }
            }
        }
    }
    
    maxLen := 0
    for _, val := range dp {
        if val > maxLen {
            maxLen = val
        }
    }
    
    return m - maxLen
}

The Go implementation mirrors the Python logic. The main differences are explicit slice initialization, explicit variable for valid, and range-based loops for computing the max value.

Worked Examples

Example 1: strs = ["babca","bbazb"]

DP initialization: [1,1,1,1,1]

Iteration details:

i j Condition (all rows i<=j) dp[j] update dp array
0 1 false - [1,1,1,1,1]
0 2 true dp[2]=dp[0]+1=2 [1,1,2,1,1]
0 3 true dp[3]=2 [1,1,2,2,1]
0 4 false - [1,1,2,2,1]
1 2 false - [1,1,2,2,1]
1 3 false - [1,1,2,2,1]
1 4 false - [1,1,2,2,1]
2 3 false - [1,1,2,2,1]
2 4 false - [1,1,2,2,1]
3 4 false - [1,1,2,2,1]

Longest sequence to keep: 2 → Minimum deletions = 5 - 2 = 3.

Complexity Analysis

Measure Complexity Explanation
Time O(n * m²) For each of the m² column pairs, we check all n rows
Space O(m) DP array of length m

The time complexity is feasible because n and m are at most 100, giving at most 1,000,000 iterations.

Test Cases

# Provided examples
assert Solution().minDeletionSize(["babca","bbazb"]) == 3  # Example 1
assert Solution().minDeletionSize(["edcba"]) == 4         # Example 2
assert Solution().minDeletionSize(["ghi","def","abc"]) == 0  # Example 3

# Edge and additional cases
assert Solution().minDeletionSize(["a"]) == 0             # Single column, already sorted
assert Solution().minDeletionSize(["zyx"]) == 2          # Reverse order single row
assert Solution().minDeletionSize(["abc","abc","abc"]) == 0  # All rows already sorted
assert Solution().minDeletionSize(["bac","abc","cab"]) == 2  # Mixed rows
assert Solution().minDeletionSize(["a"*100]*100) == 0    # Large input, all identical letters
Test Why
["babca","bbazb"] Checks typical scenario with mixed deletions
["edcba"] Single row in reverse order
["ghi","def","abc"] Multiple rows already sorted
["a"] Minimal input size
["zyx"] Single row, multiple columns descending
["abc","abc","abc"] All rows identical and sorted
["bac","abc","cab"] Rows partially sorted, needs selective deletions
["a"*100]*100 Max constraints stress test

Edge Cases

One important edge case is a single string of multiple columns that is strictly descending. Without careful checking, a naive solution might not delete enough columns. Our DP correctly identifies that only one column can remain.

Another edge case is when all rows are identical. Here, no deletions are needed regardless of row count or column length. The DP approach handles this naturally because all comparisons pass.

Finally, consider large inputs where each row has identical characters repeated many times. The algorithm must handle n = 100 and `m = 100