LeetCode 2547 - Minimum Cost to Split an Array

The problem asks us to split the array nums into one or more contiguous non-empty subarrays such that the total cost is minimized. For every subarray, we define a special quantity called its importance value.

LeetCode Problem 2547

Difficulty: 🔴 Hard
Topics: Array, Hash Table, Dynamic Programming, Counting

Solution

Problem Understanding

The problem asks us to split the array nums into one or more contiguous non-empty subarrays such that the total cost is minimized.

For every subarray, we define a special quantity called its importance value. To compute it, we first construct the trimmed version of the subarray by removing every element that appears exactly once. Only duplicated values remain.

For example:

  • [1,2,3] becomes [] because every value appears once.
  • [1,2,1,3,3] becomes [1,1,3,3] because 1 and 3 appear multiple times.
  • [4,4,4] remains [4,4,4].

The importance value of a subarray is:

$$k + \text{length of trimmed(subarray)}$$

The final answer is the minimum possible sum of importance values across all chosen subarrays.

The constraints are important:

  • nums.length <= 1000
  • nums[i] < nums.length

An array size of 1000 is too large for exponential partition enumeration, because there are $2^{n-1}$ possible ways to split an array. However, it is small enough for quadratic dynamic programming solutions.

The value range of nums[i] is also useful. Since values are bounded by n, we can use arrays instead of hash maps for frequency counting if desired.

Several edge cases are especially important:

  • Arrays where all values are unique. In this case every trimmed length is zero, so the best strategy may involve many small subarrays or one large subarray depending on k.
  • Arrays where every value repeats many times. Then trimmed lengths become large, and splitting strategically can reduce duplicate penalties.
  • Very large k. Since every split adds another fixed k, fewer subarrays become preferable.
  • Very small k. Splitting aggressively may become optimal because reducing duplicate counts matters more than the fixed partition cost.
  • Single-element arrays. These always have trimmed length zero.

Approaches

Brute Force Approach

The brute force solution tries every possible way to partition the array.

For an array of length n, there are n - 1 possible cut positions. Every position can either contain a split or not contain a split, so the total number of partitions is:

$$2^{n-1}$$

For each partition, we compute the importance value of every subarray by counting frequencies and determining how many elements belong to duplicated values.

This approach is correct because it exhaustively evaluates all valid partitions and selects the minimum cost.

Unfortunately, it is far too slow. Even for n = 30, the number of partitions becomes enormous. With n = 1000, exhaustive search is completely infeasible.

Optimal Dynamic Programming Approach

The key observation is that the problem has optimal substructure.

Suppose we know the minimum cost for splitting the prefix nums[0:i]. Then, for every possible last subarray nums[j:i], we can compute:

$$dp[i] = \min(dp[j] + \text{cost}(j,i))$$

where:

  • dp[i] represents the minimum cost to split the first i elements.
  • cost(j,i) is the importance value of subarray nums[j:i].

The challenge is computing cost(j,i) efficiently.

For a subarray, the trimmed length counts all elements belonging to values that appear at least twice.

We can maintain frequencies incrementally while expanding the subarray backward from position i.

When a number appears:

  • For the first time, it contributes nothing.
  • For the second time, both copies now belong in the trimmed array, so contribution increases by 2.
  • For the third or later time, each additional occurrence contributes 1.

This lets us compute trimmed lengths in constant time while extending the subarray.

The final complexity becomes quadratic, which is acceptable for n <= 1000.

Approach Time Complexity Space Complexity Notes
Brute Force $O(2^n \cdot n)$ $O(n)$ Enumerates all partitions
Optimal Dynamic Programming $O(n^2)$ $O(n)$ Builds minimum costs incrementally

Algorithm Walkthrough

  1. Create a DP array dp of length n + 1.

Here, dp[i] stores the minimum cost to split the first i elements of nums.

Initialize:

dp[0] = 0

because splitting an empty prefix costs nothing. 2. Initialize every other value in dp to infinity.

This allows us to minimize over candidate transitions. 3. Iterate through every ending position i from 1 to n.

We will compute the best partition ending at index i - 1. 4. For each i, expand the last subarray backward.

We consider every possible starting index j for the final subarray nums[j:i]. 5. Maintain a frequency counter while moving backward.

This lets us update the trimmed length incrementally instead of recomputing frequencies from scratch. 6. Track the current duplicate contribution.

Let trimmed_length represent the size of the trimmed array.

When processing value x:

  • If frequency becomes 1, contribution stays unchanged.
  • If frequency becomes 2, contribution increases by 2.
  • If frequency becomes 3 or more, contribution increases by 1.
  1. Compute the subarray importance value.

The current subarray cost is:

$$k + \text{trimmed_length}$$ 8. Update the DP transition.

For every possible j:

$$dp[i] = \min(dp[i], dp[j] + k + \text{trimmed_length})$$ 9. After processing all positions, return dp[n].

Why it works

The algorithm works because every optimal partition has a well-defined final subarray. If the last subarray starts at position j, then everything before j must itself be optimally partitioned. Otherwise, replacing it with a cheaper partition would improve the total solution.

Therefore:

$$dp[i] = \min_{0 \le j < i}(dp[j] + \text{cost}(j,i))$$

correctly captures the optimal answer for every prefix.

The frequency counting logic also correctly computes trimmed lengths because it exactly tracks how many elements belong to values appearing at least twice.

Python Solution

from typing import List

class Solution:
    def minCost(self, nums: List[int], k: int) -> int:
        n = len(nums)

        dp = [float('inf')] * (n + 1)
        dp[0] = 0

        for i in range(1, n + 1):
            freq = [0] * n
            trimmed_length = 0

            for j in range(i - 1, -1, -1):
                value = nums[j]
                freq[value] += 1

                if freq[value] == 2:
                    trimmed_length += 2
                elif freq[value] > 2:
                    trimmed_length += 1

                cost = k + trimmed_length

                dp[i] = min(dp[i], dp[j] + cost)

        return dp[n]

The implementation directly follows the dynamic programming formulation.

The dp array stores the minimum cost for each prefix length. For every ending position i, we examine all possible starting positions j for the last subarray.

The inner loop moves backward from i - 1 to 0. During this traversal, we maintain frequency counts for the current subarray.

The trimmed_length variable is updated incrementally:

  • The second occurrence of a value adds 2 because both copies now belong in the trimmed array.
  • Every occurrence after that adds 1.

This avoids recomputing trimmed arrays repeatedly.

At every step, we compute the cost of the current subarray and update the DP state.

Go Solution

package main

import "math"

func minCost(nums []int, k int) int {
	n := len(nums)

	dp := make([]int, n+1)

	for i := 1; i <= n; i++ {
		dp[i] = math.MaxInt32
	}

	for i := 1; i <= n; i++ {
		freq := make([]int, n)
		trimmedLength := 0

		for j := i - 1; j >= 0; j-- {
			value := nums[j]
			freq[value]++

			if freq[value] == 2 {
				trimmedLength += 2
			} else if freq[value] > 2 {
				trimmedLength++
			}

			cost := k + trimmedLength

			if dp[j]+cost < dp[i] {
				dp[i] = dp[j] + cost
			}
		}
	}

	return dp[n]
}

The Go implementation mirrors the Python logic closely.

Instead of Python lists initialized with infinity, Go uses math.MaxInt32 as a large sentinel value.

Frequency tracking uses a slice of integers because the constraints guarantee:

$$0 \le nums[i] < n$$

so direct indexing is safe and efficient.

Go slices are zero-initialized automatically, which simplifies setup.

Worked Examples

Example 1

nums = [1,2,1,2,1,3,3]
k = 2

We compute DP incrementally.

Initial state:

i dp[i]
0 0

Processing i = 1

Subarray considered:

Subarray Trimmed Length Cost dp[1]
[1] 0 2 2

So:

dp[1] = 2

Processing i = 2

Subarray Trimmed Length Cost Candidate
[2] 0 2 dp[1] + 2 = 4
[1,2] 0 2 dp[0] + 2 = 2

Result:

dp[2] = 2

Processing i = 3

Subarray Trimmed Length Cost Candidate
[1] 0 2 4
[2,1] 0 2 4
[1,2,1] 2 4 4

Result:

dp[3] = 4

Processing i = 7

Eventually the optimal partition becomes:

[1,2] + [1,2,1,3,3]

Costs:

Subarray Trimmed Importance
[1,2] [] 2
[1,2,1,3,3] [1,1,3,3] 6

Total:

2 + 6 = 8

Final answer:

8

Complexity Analysis

Measure Complexity Explanation
Time $O(n^2)$ Every pair (j, i) is processed once
Space $O(n)$ DP array plus frequency array

The outer loop runs n times, and the inner loop also runs at most n times. Each iteration performs only constant-time work because frequency updates and trimmed-length updates are incremental.

The DP array requires O(n) space. The frequency array also requires O(n) space because values are bounded by n.

Test Cases

from typing import List

class Solution:
    def minCost(self, nums: List[int], k: int) -> int:
        n = len(nums)

        dp = [float('inf')] * (n + 1)
        dp[0] = 0

        for i in range(1, n + 1):
            freq = [0] * n
            trimmed_length = 0

            for j in range(i - 1, -1, -1):
                value = nums[j]
                freq[value] += 1

                if freq[value] == 2:
                    trimmed_length += 2
                elif freq[value] > 2:
                    trimmed_length += 1

                dp[i] = min(dp[i], dp[j] + k + trimmed_length)

        return dp[n]

sol = Solution()

assert sol.minCost([1,2,1,2,1,3,3], 2) == 8   # provided example 1
assert sol.minCost([1,2,1,2,1], 2) == 6       # provided example 2
assert sol.minCost([1,2,1,2,1], 5) == 10      # provided example 3

assert sol.minCost([1], 10) == 10             # single element
assert sol.minCost([1,2,3,4], 1) == 1         # all unique, one segment best
assert sol.minCost([1,1,1,1], 2) == 6         # repeated values
assert sol.minCost([0,0,0], 1) == 4           # duplicates only
assert sol.minCost([1,2,3,1,2,3], 2) == 8     # multiple repeated groups
assert sol.minCost([1,2,3,4,5], 100) == 100   # huge k discourages splitting
assert sol.minCost([1,1,2,2,3,3], 1) == 7     # several duplicate pairs
Test Why
[1,2,1,2,1,3,3], k=2 Validates provided example
[1,2,1,2,1], k=2 Checks optimal splitting
[1,2,1,2,1], k=5 Large fixed partition cost
[1], k=10 Single-element edge case
[1,2,3,4], k=1 All unique elements
[1,1,1,1], k=2 Heavy duplication
[0,0,0], k=1 Small repeated array
[1,2,3,1,2,3], k=2 Multiple duplicate groups
[1,2,3,4,5], k=100 Very large k
[1,1,2,2,3,3], k=1 Multiple duplicate pairs

Edge Cases

One important edge case occurs when all elements are unique. In this situation, every subarray has trimmed length zero because no value appears more than once. A buggy implementation might incorrectly count frequencies or accidentally include unique elements in the trimmed length. The current implementation handles this correctly because values only contribute when their frequency reaches at least two.

Another important case is when every element is identical. For example, [1,1,1,1] causes the trimmed length to grow rapidly. Incorrect implementations often mishandle the transition from frequency 1 to frequency 2. The algorithm explicitly adds 2 when a value reaches frequency 2, ensuring both copies are counted correctly.

Large k values are also critical. When k is huge, splitting becomes very expensive, so the optimal answer often uses a single subarray even if duplicates exist. Greedy strategies that always try to reduce duplicates would fail here. The DP solution naturally evaluates all partition possibilities and correctly balances split cost against duplicate penalties.

A final subtle edge case involves arrays of length one. Since the only subarray contains a single unique value, the trimmed length is zero and the answer is exactly k. The implementation handles this automatically because the frequency never exceeds one.