LeetCode 2186 - Minimum Number of Steps to Make Two Strings Anagram II

Difficulty: 🟡 Medium
Topics: Hash Table, String, Counting

Solution

Problem Understanding

The problem asks us to find the minimum number of steps required to make two strings, s and t, anagrams of each other. An anagram means that both strings must contain the exact same characters with the same frequency, though their order can differ. In this problem, a "step" allows appending any character to either string. Effectively, we can only add characters, not remove them, so the challenge is to balance the character counts between the two strings efficiently.

The input consists of two non-empty lowercase English strings, each up to 200,000 characters. The output is a single integer representing the minimal number of steps required. The constraints imply that a naive algorithm that examines every permutation or combination of characters is infeasible, and we need an approach that scales linearly with string length.

Important edge cases include strings that are already anagrams, completely disjoint strings with no characters in common, and very short strings (length 1), which could stress counting logic. The problem guarantees non-empty strings, so we do not need to handle empty inputs.

Approaches

A brute-force approach would be to try all possible ways of adding characters to match the two strings. For each character, we could attempt to balance counts by repeatedly appending to the string with fewer occurrences. This approach is conceptually correct, but it is far too slow because it would require simulating all possible additions, leading to exponential time complexity.

The key observation for an optimal solution is that we do not need to simulate appending characters. Instead, we can count the frequency of each character in both strings and compute the difference. For any character that appears more times in one string than the other, the difference must be added to the other string to balance it. Summing these differences across all characters gives the minimum number of steps.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(?), exponential	O(1)	Attempting all append combinations, infeasible for large strings
Optimal	O(n + m)	O(26)	Count character frequencies and compute differences

Algorithm Walkthrough

Initialize two frequency arrays of size 26, one for each string. Each index corresponds to a lowercase letter ('a' = 0, 'z' = 25).
Iterate over string s and increment the count for each character in s's frequency array.
Iterate over string t and increment the count for each character in t's frequency array.
Initialize a variable steps to 0. This will track the total number of steps required.
Loop through all 26 letters. For each letter, compute the absolute difference between s and t frequencies. Add this difference to steps.
Return steps as the minimum number of steps needed.

Why it works: The frequency arrays represent the exact counts of each character. The absolute difference directly gives the number of characters that need to be appended to balance that character between the two strings. Summing over all characters guarantees the minimal steps because each extra character must be added individually and no step is wasted.

Python Solution

class Solution:
    def minSteps(self, s: str, t: str) -> int:
        freq_s = [0] * 26
        freq_t = [0] * 26

        for char in s:
            freq_s[ord(char) - ord('a')] += 1
        for char in t:
            freq_t[ord(char) - ord('a')] += 1

        steps = 0
        for i in range(26):
            steps += abs(freq_s[i] - freq_t[i])

        return steps

In this implementation, freq_s and freq_t count the occurrences of each letter in s and t. The loop over 26 letters calculates the total difference in counts, which corresponds to the minimal number of append operations needed. Using ord(char) - ord('a') maps each character to the correct index efficiently.

Go Solution

func minSteps(s string, t string) int {
    freqS := [26]int{}
    freqT := [26]int{}

    for i := 0; i < len(s); i++ {
        freqS[s[i]-'a']++
    }
    for i := 0; i < len(t); i++ {
        freqT[t[i]-'a']++
    }

    steps := 0
    for i := 0; i < 26; i++ {
        if freqS[i] > freqT[i] {
            steps += freqS[i] - freqT[i]
        } else {
            steps += freqT[i] - freqS[i]
        }
    }

    return steps
}

The Go version uses fixed-size arrays instead of slices for efficiency. We iterate over string bytes instead of runes since the input is guaranteed to be lowercase English letters. Calculating absolute differences is done manually because Go does not have a built-in abs function for integers.

Worked Examples

Example 1: s = "leetcode", t = "coats"

Character	Count in s	Count in t	Difference
a	0	1	1
c	1	1	0
d	1	0	1
e	3	0	3
l	1	0	1
o	1	1	0
s	0	1	1
t	0	1	1

Sum of differences = 7 → minimum steps = 7.

Example 2: s = "night", t = "thing"

Both strings have exactly the same counts for each character. Sum of differences = 0 → minimum steps = 0.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(n + m)	Counting characters in `s` and `t` requires O(n + m), and iterating 26 letters is constant.
Space	O(1)	Frequency arrays of size 26 are constant space regardless of input size.

The complexity is efficient and scales linearly with the length of the input strings.

Test Cases

# Provided examples
assert Solution().minSteps("leetcode", "coats") == 7  # Example 1
assert Solution().minSteps("night", "thing") == 0     # Example 2

# Edge cases
assert Solution().minSteps("a", "a") == 0              # identical single characters
assert Solution().minSteps("a", "b") == 2             # completely different single characters
assert Solution().minSteps("abc", "def") == 6         # no overlap
assert Solution().minSteps("aaaa", "aa") == 2         # different counts
assert Solution().minSteps("abcde"*40000, "edcba"*40000) == 0  # large inputs, already anagrams

Test	Why
"leetcode", "coats"	Standard example with partial overlap
"night", "thing"	Already anagrams, steps = 0
"a", "a"	Single-character edge case, identical
"a", "b"	Single-character edge case, completely different
"abc", "def"	No overlapping characters, maximum difference
"aaaa", "aa"	Different counts of the same character
Large repeated strings	Tests performance on maximum input size

Edge Cases

One edge case is when both strings are already anagrams. The algorithm handles this naturally because the absolute differences of counts will all be zero, resulting in zero steps.

A second edge case is when the strings share no common characters. The sum of differences will equal the total length of both strings, ensuring the algorithm correctly calculates the maximum number of steps.

A third edge case is strings with very high repetition of certain letters, such as "aaaaaa" vs "a". The difference is captured correctly by the frequency arrays, and only the minimal number of append operations are counted to balance the counts. This avoids any off-by-one errors in counting repeated characters.