LeetCode 1233 - Remove Sub-Folders from the Filesystem
This problem asks us to process a list of folder paths and remove all sub-folders, returning only the top-level folders. A folder "/a/b" is considered a sub-folder of "/a" because it is nested inside it.
Difficulty: 🟡 Medium
Topics: Array, String, Depth-First Search, Trie
Solution
Problem Understanding
This problem asks us to process a list of folder paths and remove all sub-folders, returning only the top-level folders. A folder "/a/b" is considered a sub-folder of "/a" because it is nested inside it. The key detail is that a sub-folder must start with the parent folder path, followed by a '/'. This ensures we avoid false positives where folder names might partially match, for example "/ab" is not a sub-folder of "/a".
The input is a list of unique folder strings where each string represents a valid Unix-style path starting with '/' and composed of lowercase letters. The expected output is a list of folders where no folder is a sub-folder of another, and the order of folders does not matter.
Constraints tell us the input can be large, up to 40,000 folders, each up to 100 characters long. This suggests that naive pairwise comparisons (O(n²)) may be too slow, and we need a more structured approach. Edge cases include folders with common prefixes but no nesting, deeply nested folders, and folders with the same parent but different children.
Approaches
The brute-force approach would involve comparing each folder with every other folder to check if it is a sub-folder. This is straightforward but inefficient. For every folder, we would scan the remaining folders to see if the folder path starts with the current folder plus a '/'. This approach has a time complexity of O(n² * k) where n is the number of folders and k is the average length of a folder string, which is not suitable for the upper bounds of the input.
The optimal approach relies on sorting the folders and then using a prefix check to eliminate sub-folders. If we sort the folders lexicographically, all sub-folders will appear directly after their parent folder. We can then iterate through the sorted list and maintain the last added top-level folder. For each new folder, if it starts with the previous folder plus '/', it is a sub-folder and skipped; otherwise, it is added to the result list. This approach is efficient because sorting groups potential sub-folders together and reduces the need for repeated comparisons.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(n² * k) | O(n) | Compare each folder with every other folder to check prefix |
| Optimal | O(n * k log n) | O(n * k) | Sort folders and remove sub-folders by prefix comparison |
Algorithm Walkthrough
- Sort the folders lexicographically. This ensures that any sub-folder comes immediately after its parent folder.
- Initialize a result list to hold the top-level folders and a variable
prevto keep track of the last added folder. - Iterate through the sorted folders:
- For each folder, check if it starts with
prev + '/'. - If it does, it is a sub-folder and skipped.
- If it does not, append the folder to the result list and update
prevto this folder.
- Return the result list containing all top-level folders.
Why it works: Sorting guarantees that all sub-folders immediately follow their parent. By maintaining the last added top-level folder (prev), we can efficiently check whether the current folder is nested without scanning the entire list, preserving correctness while reducing time complexity.
Python Solution
from typing import List
class Solution:
def removeSubfolders(self, folder: List[str]) -> List[str]:
folder.sort()
result = []
prev = ""
for f in folder:
if not prev or not f.startswith(prev + "/"):
result.append(f)
prev = f
return result
In this implementation, the folders are first sorted to group potential sub-folders with their parent. The prev variable stores the last top-level folder added to result. For each folder, we check if it starts with prev + "/". If it does not, it is a new top-level folder and added to the result. This ensures no sub-folder is added.
Go Solution
import (
"sort"
"strings"
)
func removeSubfolders(folder []string) []string {
sort.Strings(folder)
result := []string{}
prev := ""
for _, f := range folder {
if prev == "" || !strings.HasPrefix(f, prev+"/") {
result = append(result, f)
prev = f
}
}
return result
}
The Go implementation mirrors the Python version. Sorting is done using sort.Strings. The strings.HasPrefix function checks if a folder is a sub-folder of prev. The prev variable tracks the last added top-level folder, and the result slice accumulates the final folders.
Worked Examples
Example 1: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]
After sorting: ["/a", "/a/b", "/c/d", "/c/d/e", "/c/f"]
Iteration:
| Folder | prev | Action | result |
|---|---|---|---|
| /a | "" | add | ["/a"] |
| /a/b | "/a" | skip | ["/a"] |
| /c/d | "/a" | add | ["/a", "/c/d"] |
| /c/d/e | "/c/d" | skip | ["/a", "/c/d"] |
| /c/f | "/c/d" | add | ["/a", "/c/d", "/c/f"] |
Output: ["/a", "/c/d", "/c/f"]
Example 2: folder = ["/a","/a/b/c","/a/b/d"]
Sorted: ["/a", "/a/b/c", "/a/b/d"]
Iteration: only "/a" is added, others are skipped.
Example 3: folder = ["/a/b/c","/a/b/ca","/a/b/d"]
Sorted: ["/a/b/c","/a/b/ca","/a/b/d"]
Iteration: none are sub-folders of each other, all added.
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(n * k log n) | Sorting takes O(n log n), prefix checks are O(k) per folder |
| Space | O(n * k) | Result list stores up to n folders, each of length k |
Sorting dominates the time complexity. Prefix checking is linear in the number of folders and average folder length.
Test Cases
# Provided examples
assert Solution().removeSubfolders(["/a","/a/b","/c/d","/c/d/e","/c/f"]) == ["/a","/c/d","/c/f"]
assert Solution().removeSubfolders(["/a","/a/b/c","/a/b/d"]) == ["/a"]
assert Solution().removeSubfolders(["/a/b/c","/a/b/ca","/a/b/d"]) == ["/a/b/c","/a/b/ca","/a/b/d"]
# Edge cases
assert Solution().removeSubfolders(["/a"]) == ["/a"] # Single folder
assert Solution().removeSubfolders(["/a","/ab"]) == ["/a","/ab"] # Similar prefix but not sub-folder
assert Solution().removeSubfolders(["/a","/a/b","/a/b/c","/b"]) == ["/a","/b"] # Multiple top-level folders
| Test | Why |
|---|---|
| Single folder | Minimal input edge case |
| Similar prefix | Ensure no false positive sub-folder removal |
| Multiple top-level folders | Checks algorithm handles multiple parent folders correctly |
Edge Cases
One edge case is a single folder in the input. The algorithm handles it naturally because sorting a single item and iterating adds it to the result. Another edge case is folders with similar prefixes but not true sub-folders, e.g., "/a" and "/ab". The algorithm correctly distinguishes them because it checks for prev + "/". Lastly, deeply nested folders, e.g., "/a/b/c/d", are handled correctly because only the top-level folder is stored, and all nested sub-folders are skipped during iteration. The combination of sorting and prefix checking ensures correctness across these scenarios.