LeetCode 3138 - Minimum Length of Anagram Concatenation

The problem gives us a string s that was formed by concatenating several strings together, where every piece is an anagram of the same unknown string t. Our task is to determine the minimum possible length of t.

LeetCode Problem 3138

Difficulty: 🟡 Medium
Topics: Hash Table, String, Counting

Solution

Problem Understanding

The problem gives us a string s that was formed by concatenating several strings together, where every piece is an anagram of the same unknown string t.

Our task is to determine the minimum possible length of t.

An anagram means the characters are the same, only the order may differ. For example:

  • "abc"
  • "bca"
  • "cab"

are all anagrams of each other because they contain the same character frequencies.

Suppose:

s = "abba"

One valid decomposition is:

"ab" + "ba"

Both "ab" and "ba" are anagrams of "ab", so t could have length 2.

The key detail is that every chunk must contain exactly the same character counts. The order inside each chunk does not matter.

The constraints are important:

  • 1 <= s.length <= 10^5
  • Only lowercase English letters are used

A string length of 10^5 means we cannot afford very expensive nested operations over all substrings. We need something close to linear or mildly superlinear time.

Several edge cases are especially important:

  • A string where no smaller partition works, such as "cdef", where the answer is the entire length.
  • A string where every character is identical, such as "aaaaaa", where the answer is 1.
  • Cases where many divisors exist, but only some produce matching frequency distributions.
  • Very long strings, where recomputing character frequencies repeatedly could become too slow.

The problem guarantees that the entire string is a concatenation of anagrams of some string, at minimum the string itself. Therefore, a valid answer always exists.

Approaches

Brute Force Approach

A straightforward solution is to try every possible length k from 1 to n, where n is the length of s.

For each candidate length:

  1. Check whether k divides n.
  2. Split the string into blocks of size k.
  3. Compute the frequency count of each block.
  4. Verify whether all blocks have identical character frequencies.

If all blocks match, then k is a valid answer. Since we want the minimum length, we return the first valid one.

This works because every valid decomposition must partition the string into equal sized pieces whose character counts are identical.

However, this method becomes inefficient if implemented naively. Recomputing frequency counts from scratch for every substring can lead to large overhead.

Key Insight

If two substrings are anagrams, then their character frequency vectors must be identical.

Since there are only 26 lowercase English letters, we can represent each substring using a fixed size frequency array of length 26.

Instead of sorting substrings or using expensive comparisons, we simply compare frequency arrays.

The optimal strategy is:

  • Enumerate only divisors of n
  • For each divisor k, verify whether every block of size k has the same frequency distribution

Because frequency arrays are fixed size, comparisons are efficient.

Approach Comparison

Approach Time Complexity Space Complexity Notes
Brute Force O(n^2) O(1) Recomputes frequencies repeatedly for many candidates
Optimal O(n * d) O(1) Checks only divisors, frequency arrays are fixed size

Here, d represents the number of divisors of n, which is usually small.

Algorithm Walkthrough

Optimal Algorithm

  1. Let n be the length of the string.

We are looking for the smallest valid substring length that can partition the entire string evenly. 2. Iterate through every possible length k from 1 to n.

We only consider values where n % k == 0, because the string must split evenly into blocks of size k. 3. For the first block s[0:k], compute its character frequency array.

We use an array of size 26 because the input contains only lowercase English letters. 4. Process every remaining block of size k.

For each block:

  • Build its frequency array
  • Compare it with the first block's frequency array
  1. If every block matches, then all blocks are anagrams of each other.

Since we iterate lengths in increasing order, the first valid k is the minimum answer. 6. Return k.

Why it works

The algorithm relies on the defining property of anagrams:

Two strings are anagrams if and only if they contain the same frequency of every character.

By verifying that every block has the same character frequency vector, we guarantee that every block is an anagram of the others. Since we test lengths in ascending order, the first valid length is the minimum possible answer.

Python Solution

class Solution:
    def minAnagramLength(self, s: str) -> int:
        n = len(s)

        def get_frequency(substring: str) -> list[int]:
            freq = [0] * 26

            for ch in substring:
                freq[ord(ch) - ord('a')] += 1

            return freq

        for length in range(1, n + 1):
            if n % length != 0:
                continue

            target_freq = get_frequency(s[:length])

            valid = True

            for start in range(length, n, length):
                current_freq = get_frequency(s[start:start + length])

                if current_freq != target_freq:
                    valid = False
                    break

            if valid:
                return length

        return n

Implementation Explanation

The implementation begins by storing the string length in n.

The helper function get_frequency() constructs a frequency array for a substring. Each index corresponds to one lowercase letter:

  • index 0 represents 'a'
  • index 1 represents 'b'
  • and so on

The outer loop checks every possible candidate length.

We immediately skip lengths that cannot evenly divide the string because such partitions are impossible.

For each valid candidate length:

  • We compute the frequency array of the first block.
  • Then we compare every remaining block against it.

If any block differs, the candidate length fails.

If all blocks match, we immediately return the current length because it is the smallest valid one encountered so far.

The final return n is technically guaranteed to work because the entire string is always a valid decomposition.

Go Solution

func minAnagramLength(s string) int {
	n := len(s)

	getFrequency := func(sub string) [26]int {
		var freq [26]int

		for _, ch := range sub {
			freq[ch-'a']++
		}

		return freq
	}

	for length := 1; length <= n; length++ {
		if n%length != 0 {
			continue
		}

		targetFreq := getFrequency(s[:length])

		valid := true

		for start := length; start < n; start += length {
			currentFreq := getFrequency(s[start : start+length])

			if currentFreq != targetFreq {
				valid = false
				break
			}
		}

		if valid {
			return length
		}
	}

	return n
}

Go Specific Notes

The Go version uses a fixed size array:

[26]int

instead of slices. Arrays in Go support direct equality comparison, which makes frequency checks concise and efficient.

Unlike Python lists, Go arrays are value types, so comparisons check every element automatically.

No special overflow handling is needed because character counts never exceed 10^5, which easily fits inside Go's int type.

Worked Examples

Example 1

s = "abba"

Length of string:

n = 4

Possible divisors:

1, 2, 4

Try length = 1

Blocks:

Block Frequency
"a" a:1
"b" b:1

Frequencies differ, invalid.

Try length = 2

Blocks:

Block Frequency
"ab" a:1, b:1
"ba" a:1, b:1

All frequencies match.

Answer:

2

Example 2

s = "cdef"

Possible divisors:

1, 2, 4

Try length = 1

Blocks differ immediately.

Try length = 2

Block Frequency
"cd" c:1, d:1
"ef" e:1, f:1

Not equal.

Try length = 4

Only one block exists.

A single block is always valid.

Answer:

4

Example 3

s = "abcbcacabbaccba"

Length:

n = 15

Possible divisors:

1, 3, 5, 15

Try length = 1

Fails immediately.

Try length = 3

Blocks:

Block Sorted Form
"abc" abc
"bca" abc
"cab" abc
"bac" abc
"cba" abc

All are anagrams.

Answer:

3

Complexity Analysis

Measure Complexity Explanation
Time O(n * d) We test each divisor and scan the string
Space O(1) Frequency arrays are fixed size

The space complexity is constant because the frequency arrays always contain exactly 26 integers, regardless of input size.

The number of divisors of an integer is relatively small, so the algorithm performs efficiently even for strings of length 10^5.

Test Cases

solution = Solution()

assert solution.minAnagramLength("abba") == 2  # basic valid split
assert solution.minAnagramLength("cdef") == 4  # no smaller decomposition
assert solution.minAnagramLength("abcbcacabbaccba") == 3  # multiple anagram groups

assert solution.minAnagramLength("a") == 1  # single character
assert solution.minAnagramLength("aaaaaa") == 1  # all identical characters
assert solution.minAnagramLength("abab") == 2  # repeated anagram pair
assert solution.minAnagramLength("abcabcabc") == 3  # repeated exact substring
assert solution.minAnagramLength("bcaacbbac") == 3  # reordered anagram groups
assert solution.minAnagramLength("zzzzzzzz") == 1  # repeated same character
assert solution.minAnagramLength("abcd") == 4  # prime length, no valid split
assert solution.minAnagramLength("aabb") == 4  # equal counts but invalid partitions
assert solution.minAnagramLength("abcabc") == 3  # two matching groups
assert solution.minAnagramLength("cabbacabc") == 3  # shuffled valid groups

Test Summary

Test Why
"abba" Standard valid decomposition
"cdef" No smaller valid answer
"abcbcacabbaccba" Multiple anagram partitions
"a" Smallest possible input
"aaaaaa" Uniform characters
"abab" Simple repeated structure
"abcabcabc" Multiple identical blocks
"bcaacbbac" Blocks reordered internally
"zzzzzzzz" Same character repeated
"abcd" Prime length with no split
"aabb" Prevents incorrect frequency assumptions
"abcabc" Exact repeated substring
"cabbacabc" Multiple valid anagram chunks

Edge Cases

Single Character String

A string of length 1 is the smallest possible input. The only possible answer is 1.

This case can expose bugs in implementations that assume multiple partitions exist. Our algorithm handles it naturally because the first candidate length checked is 1, which immediately succeeds.

All Characters Identical

Consider:

"aaaaaa"

Every single character is already an anagram of every other single character.

A buggy implementation might overcomplicate partitioning logic or fail to recognize that length 1 works. Our solution correctly verifies that every block has identical frequency arrays.

No Smaller Valid Partition

Consider:

"abcd"

No partition smaller than the full string produces matching character frequencies.

This case is important because some incorrect solutions only compare global character counts and mistakenly assume smaller partitions must exist. Our implementation explicitly validates every block independently, so it correctly returns 4.

Equal Global Frequencies but Invalid Local Partitions

Consider:

"aabb"

The overall counts are balanced, but no partition of size 1 or 2 produces matching anagram groups.

This is a subtle case that breaks incorrect greedy solutions. Our algorithm compares each partition directly, ensuring correctness.