LeetCode 955 - Delete Columns to Make Sorted II
This problem asks us to remove the minimum number of columns from a list of equal length strings so that the resulting array of strings becomes lexicographically sorted.
Difficulty: 🟡 Medium
Topics: Array, String, Greedy
Solution
Problem Understanding
This problem asks us to remove the minimum number of columns from a list of equal length strings so that the resulting array of strings becomes lexicographically sorted.
The important detail is that when we delete a column, we delete that character position from every string simultaneously. We cannot selectively delete characters from individual strings.
Lexicographic order is the same ordering used in dictionaries. Given two strings a and b:
- If the first differing character in
ais smaller than the corresponding character inb, thena < b - If all compared characters are equal and one string ends first, the shorter string is smaller
- Equal strings are also allowed because the requirement is
<=, not strictly<
The input consists of:
nstrings- Every string has the same length
m 1 <= n <= 1001 <= m <= 100
The goal is to return the minimum number of deleted columns required so that:
strs[0] <= strs[1] <= strs[2] <= ... <= strs[n-1]
A key observation is that lexicographic comparison happens from left to right. Once two adjacent strings are already determined to be in correct order because of an earlier column, later columns no longer matter for that pair.
For example:
["abx", "acy"]
At column 1:
'b' < 'c'
The ordering between these two strings is already fixed. Whatever appears later cannot invalidate it.
This property is the foundation of the greedy solution.
Some important edge cases include:
- The strings may already be sorted, requiring
0deletions - Every column may need deletion
- Many rows may share equal prefixes
- A column may be valid for some adjacent pairs but invalid for others
- Duplicate strings are allowed
- A single bad comparison between adjacent rows forces deletion of that column
Because n and m are both at most 100, an O(n * m) or O(n * m^2) solution is completely acceptable.
Approaches
Brute Force Approach
A brute force solution would try every possible subset of columns to delete.
There are m columns, and each column can either be kept or deleted. That creates:
2^m
possible subsets.
For each subset, we would:
- Construct the transformed strings
- Check whether the resulting array is lexicographically sorted
- Track the minimum deletions among all valid subsets
This approach is guaranteed to find the optimal answer because it examines every possible deletion configuration.
However, the complexity becomes exponential.
If m = 100, then:
2^100
is astronomically large and completely infeasible.
We need a more intelligent strategy.
Optimal Greedy Approach
The key insight is that lexicographic ordering is determined progressively from left to right.
When comparing adjacent rows:
strs[i]
strs[i+1]
there are three possibilities at a column:
-
strs[i][col] < strs[i+1][col] -
Their order becomes permanently resolved
-
strs[i][col] == strs[i+1][col] -
We still need future columns to determine ordering
-
strs[i][col] > strs[i+1][col] -
This column makes the ordering invalid and must be deleted
We process columns greedily from left to right.
We maintain which adjacent row pairs are already confirmed to be correctly ordered. Once a pair is confirmed, we never need to compare that pair again.
For each column:
- If it violates ordering for any unresolved pair, we delete it
- Otherwise, we keep it and update newly resolved pairs
This works because deleting a column later cannot repair an ordering violation caused by an earlier kept column.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(2^m × n × m) | O(n × m) | Tries every subset of deleted columns |
| Optimal Greedy | O(n × m) | O(n) | Processes columns left to right while tracking resolved pairs |
Algorithm Walkthrough
Step 1: Initialize tracking structures
We create:
deletionsto count removed columns- A boolean array
sorted_pairs
sorted_pairs[i] means:
strs[i] < strs[i+1]
has already been permanently established by an earlier kept column.
Initially, every pair is unresolved.
Step 2: Process columns from left to right
For each column:
- Check whether keeping this column would violate ordering for any unresolved adjacent pair
- If even one unresolved pair has:
strs[i][col] > strs[i+1][col]
then this column must be deleted
Why?
Because lexicographic comparison uses the first differing character. If we keep a bad column, the final ordering becomes impossible to fix later.
Step 3: Delete invalid columns
If the column is invalid:
- Increment
deletions - Skip updating any pair states
- Move to the next column
We pretend this column never existed.
Step 4: Update resolved pairs for valid columns
If the column is safe to keep:
For every unresolved adjacent pair:
strs[i][col] < strs[i+1][col]
we mark that pair as resolved.
Future columns no longer matter for that pair.
Step 5: Continue until all columns are processed
After processing all columns, return the total deletions.
Why it works
The greedy strategy is correct because lexicographic order depends on the first differing character.
If a kept column creates:
strs[i][col] > strs[i+1][col]
for an unresolved pair, no future column can repair that violation. Therefore, the column must be deleted immediately.
Conversely, if a column safely maintains ordering, keeping it can only help because it may permanently resolve additional row pairs.
By greedily deleting only necessary columns, we achieve the minimum possible deletions.
Python Solution
from typing import List
class Solution:
def minDeletionSize(self, strs: List[str]) -> int:
rows = len(strs)
cols = len(strs[0])
deletions = 0
# sorted_pairs[i] means:
# strs[i] < strs[i + 1] is already guaranteed
sorted_pairs = [False] * (rows - 1)
for col in range(cols):
# Check whether this column causes an invalid ordering
should_delete = False
for row in range(rows - 1):
if not sorted_pairs[row]:
if strs[row][col] > strs[row + 1][col]:
should_delete = True
break
# Delete the column if invalid
if should_delete:
deletions += 1
continue
# Otherwise, update resolved pairs
for row in range(rows - 1):
if not sorted_pairs[row]:
if strs[row][col] < strs[row + 1][col]:
sorted_pairs[row] = True
return deletions
The implementation begins by determining the number of rows and columns.
The sorted_pairs array is the core optimization. Instead of repeatedly comparing fully constructed strings, we track which adjacent pairs already have confirmed ordering.
For every column, we first determine whether the column is safe to keep. We only compare unresolved pairs because resolved pairs no longer matter.
If we detect an ordering violation, we increment the deletion count and skip the column entirely.
If the column is valid, we use it to resolve additional pairs. Whenever:
strs[row][col] < strs[row + 1][col]
the relative order between those two rows becomes permanently fixed.
At the end, the deletion count contains the minimum number of removed columns.
Go Solution
func minDeletionSize(strs []string) int {
rows := len(strs)
cols := len(strs[0])
deletions := 0
// sortedPairs[i] means:
// strs[i] < strs[i+1] is already guaranteed
sortedPairs := make([]bool, rows-1)
for col := 0; col < cols; col++ {
shouldDelete := false
// Check whether this column breaks ordering
for row := 0; row < rows-1; row++ {
if !sortedPairs[row] {
if strs[row][col] > strs[row+1][col] {
shouldDelete = true
break
}
}
}
// Delete invalid column
if shouldDelete {
deletions++
continue
}
// Update resolved pairs
for row := 0; row < rows-1; row++ {
if !sortedPairs[row] {
if strs[row][col] < strs[row+1][col] {
sortedPairs[row] = true
}
}
}
}
return deletions
}
The Go implementation follows the same logic as the Python version.
Go strings support byte indexing directly because the input only contains lowercase English letters. That allows character comparisons like:
strs[row][col]
without additional rune conversion.
The sortedPairs slice stores the resolved adjacent row relationships exactly as in Python.
No special handling for integer overflow is necessary because all values are very small relative to Go's integer limits.
Worked Examples
Example 1
Input: ["ca","bb","ac"]
Initial state:
| Variable | Value |
|---|---|
| deletions | 0 |
| sorted_pairs | [False, False] |
We process column 0.
Characters:
| Pair | Comparison |
|---|---|
| "c" vs "b" | c > b |
| "b" vs "a" | b > a |
The first comparison already violates ordering.
Delete column 0.
| Variable | Value |
|---|---|
| deletions | 1 |
| sorted_pairs | [False, False] |
Now process column 1.
Characters:
| Pair | Comparison |
|---|---|
| "a" vs "b" | a < b |
| "b" vs "c" | b < c |
No violations occur.
Mark both pairs resolved.
| Variable | Value |
|---|---|
| deletions | 1 |
| sorted_pairs | [True, True] |
Final answer:
1
Example 2
Input: ["xc","yb","za"]
Initial state:
| Variable | Value |
|---|---|
| deletions | 0 |
| sorted_pairs | [False, False] |
Process column 0.
| Pair | Comparison |
|---|---|
| "x" vs "y" | x < y |
| "y" vs "z" | y < z |
Both pairs become resolved.
| Variable | Value |
|---|---|
| deletions | 0 |
| sorted_pairs | [True, True] |
Process column 1.
All pairs are already resolved, so this column cannot create problems.
Final answer:
0
Example 3
Input: ["zyx","wvu","tsr"]
Initial state:
| Variable | Value |
|---|---|
| deletions | 0 |
| sorted_pairs | [False, False] |
Process column 0.
| Pair | Comparison |
|---|---|
| z vs w | z > w |
Delete column.
deletions = 1
Process column 1.
| Pair | Comparison |
|---|---|
| y vs v | y > v |
Delete column.
deletions = 2
Process column 2.
| Pair | Comparison |
|---|---|
| x vs u | x > u |
Delete column.
deletions = 3
Final answer:
3
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n × m) | Each column scans adjacent row pairs |
| Space | O(n) | The sorted_pairs array stores pair states |
The algorithm processes every column once. For each column, it may scan all adjacent row pairs twice:
- Once to detect violations
- Once to update resolved pairs
Since there are m columns and at most n - 1 adjacent pairs, the total complexity is:
O(n × m)
The extra memory usage comes from the boolean tracking array of size n - 1.
Test Cases
from typing import List
class Solution:
def minDeletionSize(self, strs: List[str]) -> int:
rows = len(strs)
cols = len(strs[0])
deletions = 0
sorted_pairs = [False] * (rows - 1)
for col in range(cols):
should_delete = False
for row in range(rows - 1):
if not sorted_pairs[row]:
if strs[row][col] > strs[row + 1][col]:
should_delete = True
break
if should_delete:
deletions += 1
continue
for row in range(rows - 1):
if not sorted_pairs[row]:
if strs[row][col] < strs[row + 1][col]:
sorted_pairs[row] = True
return deletions
solution = Solution()
assert solution.minDeletionSize(["ca", "bb", "ac"]) == 1 # provided example
assert solution.minDeletionSize(["xc", "yb", "za"]) == 0 # already sorted
assert solution.minDeletionSize(["zyx", "wvu", "tsr"]) == 3 # delete all columns
assert solution.minDeletionSize(["a"]) == 0 # single string
assert solution.minDeletionSize(["abc", "abc"]) == 0 # duplicate strings
assert solution.minDeletionSize(["ba", "aa"]) == 1 # first column invalid
assert solution.minDeletionSize(["abx", "aby"]) == 0 # resolved by last column
assert solution.minDeletionSize(["xga", "xfb", "yfa"]) == 1 # mixed valid/invalid
assert solution.minDeletionSize(["aaa", "aaa", "aaa"]) == 0 # all identical
assert solution.minDeletionSize(["edcba"]) == 0 # single row always sorted
assert solution.minDeletionSize(["babca", "bbazb"]) == 1 # classic tricky case
Test Summary
| Test | Why |
|---|---|
["ca","bb","ac"] |
Basic deletion example |
["xc","yb","za"] |
Already lexicographically sorted |
["zyx","wvu","tsr"] |
Every column must be deleted |
["a"] |
Smallest possible input |
["abc","abc"] |
Equal strings are allowed |
["ba","aa"] |
Immediate ordering violation |
["abx","aby"] |
Late character determines order |
["xga","xfb","yfa"] |
Combination of resolved and unresolved pairs |
["aaa","aaa","aaa"] |
Entire array identical |
["edcba"] |
Single row edge case |
["babca","bbazb"] |
Nontrivial greedy behavior |
Edge Cases
One important edge case is when there is only one string in the input. A single string is always lexicographically sorted because there are no adjacent pairs to compare. A buggy implementation might incorrectly attempt pair comparisons and produce index errors. This implementation handles the case naturally because the adjacent pair loops execute zero times.
Another important case occurs when many strings share long equal prefixes. In these situations, the algorithm must avoid prematurely marking pairs as resolved. A pair only becomes resolved when a strictly smaller character is found. Equal characters mean future columns still matter. The sorted_pairs array correctly preserves unresolved pairs until a decisive comparison appears.
A third tricky case is when a column is valid for most pairs but invalid for one unresolved pair. The entire column must still be deleted because deletions apply globally across all strings. The implementation correctly checks every unresolved adjacent pair before deciding to keep a column. Even a single violation forces deletion.
A final subtle case occurs when some adjacent pairs are already resolved. Future columns may contain descending characters for those pairs, but those violations no longer matter because lexicographic order was already determined earlier. The implementation avoids false deletions by skipping comparisons for resolved pairs.