LeetCode 3138 - Minimum Length of Anagram Concatenation
The problem gives us a string s that was formed by concatenating several strings together, where every piece is an anagram of the same unknown string t. Our task is to determine the minimum possible length of t.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Counting
Solution
Problem Understanding
The problem gives us a string s that was formed by concatenating several strings together, where every piece is an anagram of the same unknown string t.
Our task is to determine the minimum possible length of t.
An anagram means the characters are the same, only the order may differ. For example:
"abc""bca""cab"
are all anagrams of each other because they contain the same character frequencies.
Suppose:
s = "abba"
One valid decomposition is:
"ab" + "ba"
Both "ab" and "ba" are anagrams of "ab", so t could have length 2.
The key detail is that every chunk must contain exactly the same character counts. The order inside each chunk does not matter.
The constraints are important:
1 <= s.length <= 10^5- Only lowercase English letters are used
A string length of 10^5 means we cannot afford very expensive nested operations over all substrings. We need something close to linear or mildly superlinear time.
Several edge cases are especially important:
- A string where no smaller partition works, such as
"cdef", where the answer is the entire length. - A string where every character is identical, such as
"aaaaaa", where the answer is1. - Cases where many divisors exist, but only some produce matching frequency distributions.
- Very long strings, where recomputing character frequencies repeatedly could become too slow.
The problem guarantees that the entire string is a concatenation of anagrams of some string, at minimum the string itself. Therefore, a valid answer always exists.
Approaches
Brute Force Approach
A straightforward solution is to try every possible length k from 1 to n, where n is the length of s.
For each candidate length:
- Check whether
kdividesn. - Split the string into blocks of size
k. - Compute the frequency count of each block.
- Verify whether all blocks have identical character frequencies.
If all blocks match, then k is a valid answer. Since we want the minimum length, we return the first valid one.
This works because every valid decomposition must partition the string into equal sized pieces whose character counts are identical.
However, this method becomes inefficient if implemented naively. Recomputing frequency counts from scratch for every substring can lead to large overhead.
Key Insight
If two substrings are anagrams, then their character frequency vectors must be identical.
Since there are only 26 lowercase English letters, we can represent each substring using a fixed size frequency array of length 26.
Instead of sorting substrings or using expensive comparisons, we simply compare frequency arrays.
The optimal strategy is:
- Enumerate only divisors of
n - For each divisor
k, verify whether every block of sizekhas the same frequency distribution
Because frequency arrays are fixed size, comparisons are efficient.
Approach Comparison
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n^2) | O(1) | Recomputes frequencies repeatedly for many candidates |
| Optimal | O(n * d) | O(1) | Checks only divisors, frequency arrays are fixed size |
Here, d represents the number of divisors of n, which is usually small.
Algorithm Walkthrough
Optimal Algorithm
- Let
nbe the length of the string.
We are looking for the smallest valid substring length that can partition the entire string evenly.
2. Iterate through every possible length k from 1 to n.
We only consider values where n % k == 0, because the string must split evenly into blocks of size k.
3. For the first block s[0:k], compute its character frequency array.
We use an array of size 26 because the input contains only lowercase English letters.
4. Process every remaining block of size k.
For each block:
- Build its frequency array
- Compare it with the first block's frequency array
- If every block matches, then all blocks are anagrams of each other.
Since we iterate lengths in increasing order, the first valid k is the minimum answer.
6. Return k.
Why it works
The algorithm relies on the defining property of anagrams:
Two strings are anagrams if and only if they contain the same frequency of every character.
By verifying that every block has the same character frequency vector, we guarantee that every block is an anagram of the others. Since we test lengths in ascending order, the first valid length is the minimum possible answer.
Python Solution
class Solution:
def minAnagramLength(self, s: str) -> int:
n = len(s)
def get_frequency(substring: str) -> list[int]:
freq = [0] * 26
for ch in substring:
freq[ord(ch) - ord('a')] += 1
return freq
for length in range(1, n + 1):
if n % length != 0:
continue
target_freq = get_frequency(s[:length])
valid = True
for start in range(length, n, length):
current_freq = get_frequency(s[start:start + length])
if current_freq != target_freq:
valid = False
break
if valid:
return length
return n
Implementation Explanation
The implementation begins by storing the string length in n.
The helper function get_frequency() constructs a frequency array for a substring. Each index corresponds to one lowercase letter:
- index
0represents'a' - index
1represents'b' - and so on
The outer loop checks every possible candidate length.
We immediately skip lengths that cannot evenly divide the string because such partitions are impossible.
For each valid candidate length:
- We compute the frequency array of the first block.
- Then we compare every remaining block against it.
If any block differs, the candidate length fails.
If all blocks match, we immediately return the current length because it is the smallest valid one encountered so far.
The final return n is technically guaranteed to work because the entire string is always a valid decomposition.
Go Solution
func minAnagramLength(s string) int {
n := len(s)
getFrequency := func(sub string) [26]int {
var freq [26]int
for _, ch := range sub {
freq[ch-'a']++
}
return freq
}
for length := 1; length <= n; length++ {
if n%length != 0 {
continue
}
targetFreq := getFrequency(s[:length])
valid := true
for start := length; start < n; start += length {
currentFreq := getFrequency(s[start : start+length])
if currentFreq != targetFreq {
valid = false
break
}
}
if valid {
return length
}
}
return n
}
Go Specific Notes
The Go version uses a fixed size array:
[26]int
instead of slices. Arrays in Go support direct equality comparison, which makes frequency checks concise and efficient.
Unlike Python lists, Go arrays are value types, so comparisons check every element automatically.
No special overflow handling is needed because character counts never exceed 10^5, which easily fits inside Go's int type.
Worked Examples
Example 1
s = "abba"
Length of string:
n = 4
Possible divisors:
1, 2, 4
Try length = 1
Blocks:
| Block | Frequency |
|---|---|
| "a" | a:1 |
| "b" | b:1 |
Frequencies differ, invalid.
Try length = 2
Blocks:
| Block | Frequency |
|---|---|
| "ab" | a:1, b:1 |
| "ba" | a:1, b:1 |
All frequencies match.
Answer:
2
Example 2
s = "cdef"
Possible divisors:
1, 2, 4
Try length = 1
Blocks differ immediately.
Try length = 2
| Block | Frequency |
|---|---|
| "cd" | c:1, d:1 |
| "ef" | e:1, f:1 |
Not equal.
Try length = 4
Only one block exists.
A single block is always valid.
Answer:
4
Example 3
s = "abcbcacabbaccba"
Length:
n = 15
Possible divisors:
1, 3, 5, 15
Try length = 1
Fails immediately.
Try length = 3
Blocks:
| Block | Sorted Form |
|---|---|
| "abc" | abc |
| "bca" | abc |
| "cab" | abc |
| "bac" | abc |
| "cba" | abc |
All are anagrams.
Answer:
3
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * d) | We test each divisor and scan the string |
| Space | O(1) | Frequency arrays are fixed size |
The space complexity is constant because the frequency arrays always contain exactly 26 integers, regardless of input size.
The number of divisors of an integer is relatively small, so the algorithm performs efficiently even for strings of length 10^5.
Test Cases
solution = Solution()
assert solution.minAnagramLength("abba") == 2 # basic valid split
assert solution.minAnagramLength("cdef") == 4 # no smaller decomposition
assert solution.minAnagramLength("abcbcacabbaccba") == 3 # multiple anagram groups
assert solution.minAnagramLength("a") == 1 # single character
assert solution.minAnagramLength("aaaaaa") == 1 # all identical characters
assert solution.minAnagramLength("abab") == 2 # repeated anagram pair
assert solution.minAnagramLength("abcabcabc") == 3 # repeated exact substring
assert solution.minAnagramLength("bcaacbbac") == 3 # reordered anagram groups
assert solution.minAnagramLength("zzzzzzzz") == 1 # repeated same character
assert solution.minAnagramLength("abcd") == 4 # prime length, no valid split
assert solution.minAnagramLength("aabb") == 4 # equal counts but invalid partitions
assert solution.minAnagramLength("abcabc") == 3 # two matching groups
assert solution.minAnagramLength("cabbacabc") == 3 # shuffled valid groups
Test Summary
| Test | Why |
|---|---|
"abba" |
Standard valid decomposition |
"cdef" |
No smaller valid answer |
"abcbcacabbaccba" |
Multiple anagram partitions |
"a" |
Smallest possible input |
"aaaaaa" |
Uniform characters |
"abab" |
Simple repeated structure |
"abcabcabc" |
Multiple identical blocks |
"bcaacbbac" |
Blocks reordered internally |
"zzzzzzzz" |
Same character repeated |
"abcd" |
Prime length with no split |
"aabb" |
Prevents incorrect frequency assumptions |
"abcabc" |
Exact repeated substring |
"cabbacabc" |
Multiple valid anagram chunks |
Edge Cases
Single Character String
A string of length 1 is the smallest possible input. The only possible answer is 1.
This case can expose bugs in implementations that assume multiple partitions exist. Our algorithm handles it naturally because the first candidate length checked is 1, which immediately succeeds.
All Characters Identical
Consider:
"aaaaaa"
Every single character is already an anagram of every other single character.
A buggy implementation might overcomplicate partitioning logic or fail to recognize that length 1 works. Our solution correctly verifies that every block has identical frequency arrays.
No Smaller Valid Partition
Consider:
"abcd"
No partition smaller than the full string produces matching character frequencies.
This case is important because some incorrect solutions only compare global character counts and mistakenly assume smaller partitions must exist. Our implementation explicitly validates every block independently, so it correctly returns 4.
Equal Global Frequencies but Invalid Local Partitions
Consider:
"aabb"
The overall counts are balanced, but no partition of size 1 or 2 produces matching anagram groups.
This is a subtle case that breaks incorrect greedy solutions. Our algorithm compares each partition directly, ensuring correctness.