LeetCode 955 - Delete Columns to Make Sorted II

Difficulty: 🟡 Medium
Topics: Array, String, Greedy

Solution

Problem Understanding

This problem asks us to remove the minimum number of columns from a list of equal length strings so that the resulting array of strings becomes lexicographically sorted.

The important detail is that when we delete a column, we delete that character position from every string simultaneously. We cannot selectively delete characters from individual strings.

Lexicographic order is the same ordering used in dictionaries. Given two strings a and b:

If the first differing character in a is smaller than the corresponding character in b, then a < b
If all compared characters are equal and one string ends first, the shorter string is smaller
Equal strings are also allowed because the requirement is <=, not strictly <

The input consists of:

n strings
Every string has the same length m
1 <= n <= 100
1 <= m <= 100

The goal is to return the minimum number of deleted columns required so that:

strs[0] <= strs[1] <= strs[2] <= ... <= strs[n-1]

A key observation is that lexicographic comparison happens from left to right. Once two adjacent strings are already determined to be in correct order because of an earlier column, later columns no longer matter for that pair.

For example:

["abx", "acy"]

At column 1:

'b' < 'c'

The ordering between these two strings is already fixed. Whatever appears later cannot invalidate it.

This property is the foundation of the greedy solution.

Some important edge cases include:

The strings may already be sorted, requiring 0 deletions
Every column may need deletion
Many rows may share equal prefixes
A column may be valid for some adjacent pairs but invalid for others
Duplicate strings are allowed
A single bad comparison between adjacent rows forces deletion of that column

Because n and m are both at most 100, an O(n * m) or O(n * m^2) solution is completely acceptable.

Approaches

Brute Force Approach

A brute force solution would try every possible subset of columns to delete.

There are m columns, and each column can either be kept or deleted. That creates:

2^m

possible subsets.

For each subset, we would:

Construct the transformed strings
Check whether the resulting array is lexicographically sorted
Track the minimum deletions among all valid subsets

This approach is guaranteed to find the optimal answer because it examines every possible deletion configuration.

However, the complexity becomes exponential.

If m = 100, then:

2^100

is astronomically large and completely infeasible.

We need a more intelligent strategy.

Optimal Greedy Approach

The key insight is that lexicographic ordering is determined progressively from left to right.

When comparing adjacent rows:

strs[i]
strs[i+1]

there are three possibilities at a column:

strs[i][col] < strs[i+1][col]
Their order becomes permanently resolved
strs[i][col] == strs[i+1][col]
We still need future columns to determine ordering
strs[i][col] > strs[i+1][col]
This column makes the ordering invalid and must be deleted

We process columns greedily from left to right.

We maintain which adjacent row pairs are already confirmed to be correctly ordered. Once a pair is confirmed, we never need to compare that pair again.

For each column:

If it violates ordering for any unresolved pair, we delete it
Otherwise, we keep it and update newly resolved pairs

This works because deleting a column later cannot repair an ordering violation caused by an earlier kept column.

Approach Comparison

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(2^m × n × m)	O(n × m)	Tries every subset of deleted columns
Optimal Greedy	O(n × m)	O(n)	Processes columns left to right while tracking resolved pairs

Algorithm Walkthrough

Step 1: Initialize tracking structures

We create:

deletions to count removed columns
A boolean array sorted_pairs

sorted_pairs[i] means:

strs[i] < strs[i+1]

has already been permanently established by an earlier kept column.

Initially, every pair is unresolved.

Step 2: Process columns from left to right

For each column:

Check whether keeping this column would violate ordering for any unresolved adjacent pair
If even one unresolved pair has:

strs[i][col] > strs[i+1][col]

then this column must be deleted

Why?

Because lexicographic comparison uses the first differing character. If we keep a bad column, the final ordering becomes impossible to fix later.

Step 3: Delete invalid columns

If the column is invalid:

Increment deletions
Skip updating any pair states
Move to the next column

We pretend this column never existed.

Step 4: Update resolved pairs for valid columns

If the column is safe to keep:

For every unresolved adjacent pair:

strs[i][col] < strs[i+1][col]

we mark that pair as resolved.

Future columns no longer matter for that pair.

Step 5: Continue until all columns are processed

After processing all columns, return the total deletions.

Why it works

The greedy strategy is correct because lexicographic order depends on the first differing character.

If a kept column creates:

strs[i][col] > strs[i+1][col]

for an unresolved pair, no future column can repair that violation. Therefore, the column must be deleted immediately.

Conversely, if a column safely maintains ordering, keeping it can only help because it may permanently resolve additional row pairs.

By greedily deleting only necessary columns, we achieve the minimum possible deletions.

Python Solution

from typing import List

class Solution:
    def minDeletionSize(self, strs: List[str]) -> int:
        rows = len(strs)
        cols = len(strs[0])

        deletions = 0

        # sorted_pairs[i] means:
        # strs[i] < strs[i + 1] is already guaranteed
        sorted_pairs = [False] * (rows - 1)

        for col in range(cols):

            # Check whether this column causes an invalid ordering
            should_delete = False

            for row in range(rows - 1):
                if not sorted_pairs[row]:
                    if strs[row][col] > strs[row + 1][col]:
                        should_delete = True
                        break

            # Delete the column if invalid
            if should_delete:
                deletions += 1
                continue

            # Otherwise, update resolved pairs
            for row in range(rows - 1):
                if not sorted_pairs[row]:
                    if strs[row][col] < strs[row + 1][col]:
                        sorted_pairs[row] = True

        return deletions

The implementation begins by determining the number of rows and columns.

The sorted_pairs array is the core optimization. Instead of repeatedly comparing fully constructed strings, we track which adjacent pairs already have confirmed ordering.

For every column, we first determine whether the column is safe to keep. We only compare unresolved pairs because resolved pairs no longer matter.

If we detect an ordering violation, we increment the deletion count and skip the column entirely.

If the column is valid, we use it to resolve additional pairs. Whenever:

strs[row][col] < strs[row + 1][col]

the relative order between those two rows becomes permanently fixed.

At the end, the deletion count contains the minimum number of removed columns.

Go Solution

func minDeletionSize(strs []string) int {
	rows := len(strs)
	cols := len(strs[0])

	deletions := 0

	// sortedPairs[i] means:
	// strs[i] < strs[i+1] is already guaranteed
	sortedPairs := make([]bool, rows-1)

	for col := 0; col < cols; col++ {

		shouldDelete := false

		// Check whether this column breaks ordering
		for row := 0; row < rows-1; row++ {
			if !sortedPairs[row] {
				if strs[row][col] > strs[row+1][col] {
					shouldDelete = true
					break
				}
			}
		}

		// Delete invalid column
		if shouldDelete {
			deletions++
			continue
		}

		// Update resolved pairs
		for row := 0; row < rows-1; row++ {
			if !sortedPairs[row] {
				if strs[row][col] < strs[row+1][col] {
					sortedPairs[row] = true
				}
			}
		}
	}

	return deletions
}

The Go implementation follows the same logic as the Python version.

Go strings support byte indexing directly because the input only contains lowercase English letters. That allows character comparisons like:

strs[row][col]

without additional rune conversion.

The sortedPairs slice stores the resolved adjacent row relationships exactly as in Python.

No special handling for integer overflow is necessary because all values are very small relative to Go's integer limits.

Worked Examples

Example 1

Input: ["ca","bb","ac"]

Initial state:

Variable	Value
deletions	0
sorted_pairs	[False, False]

We process column 0.

Characters:

Pair	Comparison
"c" vs "b"	c > b
"b" vs "a"	b > a

The first comparison already violates ordering.

Delete column 0.

Variable	Value
deletions	1
sorted_pairs	[False, False]

Now process column 1.

Characters:

Pair	Comparison
"a" vs "b"	a < b
"b" vs "c"	b < c

No violations occur.

Mark both pairs resolved.

Variable	Value
deletions	1
sorted_pairs	[True, True]

Final answer:

Example 2

Input: ["xc","yb","za"]

Initial state:

Variable	Value
deletions	0
sorted_pairs	[False, False]

Process column 0.

Pair	Comparison
"x" vs "y"	x < y
"y" vs "z"	y < z

Both pairs become resolved.

Variable	Value
deletions	0
sorted_pairs	[True, True]

Process column 1.

All pairs are already resolved, so this column cannot create problems.

Final answer:

Example 3

Input: ["zyx","wvu","tsr"]

Initial state:

Variable	Value
deletions	0
sorted_pairs	[False, False]

Process column 0.

Pair	Comparison
z vs w	z > w

Delete column.

deletions = 1

Process column 1.

Pair	Comparison
y vs v	y > v

Delete column.

deletions = 2

Process column 2.

Pair	Comparison
x vs u	x > u

Delete column.

deletions = 3

Final answer:

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n × m)	Each column scans adjacent row pairs
Space	O(n)	The `sorted_pairs` array stores pair states

The algorithm processes every column once. For each column, it may scan all adjacent row pairs twice:

Once to detect violations
Once to update resolved pairs

Since there are m columns and at most n - 1 adjacent pairs, the total complexity is:

O(n × m)

The extra memory usage comes from the boolean tracking array of size n - 1.

Test Cases

from typing import List

class Solution:
    def minDeletionSize(self, strs: List[str]) -> int:
        rows = len(strs)
        cols = len(strs[0])

        deletions = 0
        sorted_pairs = [False] * (rows - 1)

        for col in range(cols):

            should_delete = False

            for row in range(rows - 1):
                if not sorted_pairs[row]:
                    if strs[row][col] > strs[row + 1][col]:
                        should_delete = True
                        break

            if should_delete:
                deletions += 1
                continue

            for row in range(rows - 1):
                if not sorted_pairs[row]:
                    if strs[row][col] < strs[row + 1][col]:
                        sorted_pairs[row] = True

        return deletions

solution = Solution()

assert solution.minDeletionSize(["ca", "bb", "ac"]) == 1  # provided example
assert solution.minDeletionSize(["xc", "yb", "za"]) == 0  # already sorted
assert solution.minDeletionSize(["zyx", "wvu", "tsr"]) == 3  # delete all columns

assert solution.minDeletionSize(["a"]) == 0  # single string
assert solution.minDeletionSize(["abc", "abc"]) == 0  # duplicate strings
assert solution.minDeletionSize(["ba", "aa"]) == 1  # first column invalid
assert solution.minDeletionSize(["abx", "aby"]) == 0  # resolved by last column
assert solution.minDeletionSize(["xga", "xfb", "yfa"]) == 1  # mixed valid/invalid
assert solution.minDeletionSize(["aaa", "aaa", "aaa"]) == 0  # all identical
assert solution.minDeletionSize(["edcba"]) == 0  # single row always sorted
assert solution.minDeletionSize(["babca", "bbazb"]) == 1  # classic tricky case

Test Summary

Test	Why
`["ca","bb","ac"]`	Basic deletion example
`["xc","yb","za"]`	Already lexicographically sorted
`["zyx","wvu","tsr"]`	Every column must be deleted
`["a"]`	Smallest possible input
`["abc","abc"]`	Equal strings are allowed
`["ba","aa"]`	Immediate ordering violation
`["abx","aby"]`	Late character determines order
`["xga","xfb","yfa"]`	Combination of resolved and unresolved pairs
`["aaa","aaa","aaa"]`	Entire array identical
`["edcba"]`	Single row edge case
`["babca","bbazb"]`	Nontrivial greedy behavior

Edge Cases

One important edge case is when there is only one string in the input. A single string is always lexicographically sorted because there are no adjacent pairs to compare. A buggy implementation might incorrectly attempt pair comparisons and produce index errors. This implementation handles the case naturally because the adjacent pair loops execute zero times.

Another important case occurs when many strings share long equal prefixes. In these situations, the algorithm must avoid prematurely marking pairs as resolved. A pair only becomes resolved when a strictly smaller character is found. Equal characters mean future columns still matter. The sorted_pairs array correctly preserves unresolved pairs until a decisive comparison appears.

A third tricky case is when a column is valid for most pairs but invalid for one unresolved pair. The entire column must still be deleted because deletions apply globally across all strings. The implementation correctly checks every unresolved adjacent pair before deciding to keep a column. Even a single violation forces deletion.

A final subtle case occurs when some adjacent pairs are already resolved. Future columns may contain descending characters for those pairs, but those violations no longer matter because lexicographic order was already determined earlier. The implementation avoids false deletions by skipping comparisons for resolved pairs.