LeetCode 1500 - Design a File Sharing System

Difficulty: 🟡 Medium
Topics: Hash Table, Design, Sorting, Heap (Priority Queue), Data Stream

Solution

Problem Understanding

This problem asks us to design a miniature peer-to-peer file sharing system. The file is divided into m chunks, where every chunk has an ID from 1 to m. Users can join the system while already owning some chunks, leave the system entirely, and request chunks from other users.

The main challenge is not the chunk transfer itself, but efficiently maintaining ownership information while also correctly managing reusable user IDs.

When a user joins, the system must assign the smallest available positive integer as their user ID. IDs are reusable, which means if a user leaves, their ID becomes available again for future users. This immediately suggests that we need a data structure capable of efficiently retrieving the minimum available ID.

The request(userID, chunkID) operation is slightly subtle. The method must return all users currently owning the requested chunk, sorted in ascending order. Additionally, if at least one user owns the chunk, then the requesting user successfully downloads it and now owns that chunk as well.

The constraints are important:

m can be as large as 10^5
There are at most 10^4 operations
Each user initially owns at most 100 chunks

These constraints tell us that we cannot repeatedly scan every user or every chunk for each operation. We need near constant-time updates and efficient retrieval of sorted owners.

Several edge cases are important:

A user may join with no chunks at all
A chunk may currently have no owners
A requesting user may already own the requested chunk
IDs must always reuse the smallest available value
Multiple users can own the same chunk
Leaving users must be completely removed from all ownership structures

A naive implementation can easily fail by forgetting to remove chunk ownership on leave, or by mishandling reusable IDs.

Approaches

Brute Force Approach

A straightforward implementation would store each user's owned chunks and, for every request, scan all active users to determine who owns the requested chunk.

For example:

Maintain a map from user ID to owned chunks
For request(userID, chunkID), iterate through every active user and check whether they own chunkID
For leave(userID), simply delete the user

This approach is correct because ownership is always derived directly from the authoritative user data.

However, it is inefficient. In the worst case, every request operation scans all users. Since there may be up to 10^4 operations, repeated full scans become expensive.

The reusable ID requirement also becomes awkward if implemented naively. Continuously searching for the smallest unused ID can take linear time per join.

Optimal Approach

The key observation is that ownership queries happen frequently, so we should index ownership by chunk instead of repeatedly searching all users.

We maintain two main relationships:

For each user, which chunks they own
For each chunk, which users currently own it

This allows:

Fast insertion/removal during joins and leaves
Fast retrieval of all owners during requests

To efficiently reuse the smallest available user ID, we use a min-heap. Whenever a user leaves, their ID is pushed back into the heap. When a new user joins, we first reuse the smallest available ID if possible. Otherwise, we allocate a new increasing ID.

We also need ownership lists returned in sorted order. Using a sorted set structure would work, but Python does not provide one natively. Instead, we use regular sets and sort only when returning results. Since there are at most 10^4 operations, this remains efficient.

Approach	Time Complexity	Space Complexity	Notes
Brute Force	O(U) per request	O(U × C)	Scans all users for ownership
Optimal	O(log U + K log K) per request	O(U × C)	Uses chunk-to-users indexing and reusable ID heap

Here:

U is the number of users
C is the average chunks per user
K is the number of owners of a chunk

Algorithm Walkthrough

Initialize three core data structures.

We maintain:

user_chunks, a map from user ID to the set of chunks they own
chunk_owners, a map from chunk ID to the set of users owning it
available_ids, a min-heap storing reusable user IDs

We also keep next_id, which represents the next brand-new ID to assign if no reusable IDs exist. 2. Handle the join operation.

When a new user joins:

If the heap contains reusable IDs, pop the smallest one
Otherwise, assign next_id and increment it

Then:

Store the user's chunks in user_chunks
For every owned chunk, insert the user into chunk_owners[chunk]

This ensures both mappings remain synchronized. 3. Handle the leave operation.

When a user leaves:

Retrieve all chunks owned by that user
Remove the user from every corresponding chunk ownership set
Delete the user from user_chunks
Push the user's ID into the reusable ID heap

This completely removes the user from the system while making the ID available again. 4. Handle the request operation.

For a requested chunk:

Retrieve all owners from chunk_owners[chunkID]
Sort them before returning

If the owner list is non-empty:

Add the chunk to the requesting user's owned chunks
Add the requesting user to the chunk's owner set

This models successful downloading. 5. Return the sorted owner list.

The problem explicitly requires ascending order, so sorting occurs during the request operation.

Why it works

The algorithm maintains two synchronized invariants:

Every user's owned chunks are accurately stored in user_chunks
Every chunk's owners are accurately stored in chunk_owners

Whenever ownership changes, both structures are updated together. Because IDs removed during leave are inserted into a min-heap, the smallest reusable ID is always retrieved first during join. Therefore, both ownership tracking and ID assignment remain correct throughout all operations.

Python Solution

from typing import List
from collections import defaultdict
import heapq

class FileSharing:

    def __init__(self, m: int):
        self.user_chunks = {}
        self.chunk_owners = defaultdict(set)

        self.available_ids = []
        self.next_id = 1

    def join(self, ownedChunks: List[int]) -> int:
        if self.available_ids:
            user_id = heapq.heappop(self.available_ids)
        else:
            user_id = self.next_id
            self.next_id += 1

        self.user_chunks[user_id] = set(ownedChunks)

        for chunk in ownedChunks:
            self.chunk_owners[chunk].add(user_id)

        return user_id

    def leave(self, userID: int) -> None:
        owned_chunks = self.user_chunks[userID]

        for chunk in owned_chunks:
            self.chunk_owners[chunk].discard(userID)

        del self.user_chunks[userID]

        heapq.heappush(self.available_ids, userID)

    def request(self, userID: int, chunkID: int) -> List[int]:
        owners = sorted(self.chunk_owners[chunkID])

        if owners:
            if chunkID not in self.user_chunks[userID]:
                self.user_chunks[userID].add(chunkID)
                self.chunk_owners[chunkID].add(userID)

        return owners

# Your FileSharing object will be instantiated and called as such:
# obj = FileSharing(m)
# param_1 = obj.join(ownedChunks)
# obj.leave(userID)
# param_3 = obj.request(userID, chunkID)

The implementation closely follows the algorithm described earlier.

The constructor initializes all state tracking structures. user_chunks stores ownership from the user perspective, while chunk_owners stores ownership from the chunk perspective. The min-heap available_ids manages reusable IDs efficiently.

The join method first determines which ID to assign. If previously freed IDs exist, the smallest one is reused via heapq.heappop. Otherwise, a fresh ID is generated from next_id.

The leave method iterates through every chunk owned by the departing user and removes the user from each ownership set. After cleanup, the user ID is returned to the reusable pool.

The request method retrieves all owners for the requested chunk and sorts them before returning. If ownership exists, the requester successfully downloads the chunk, so ownership is added to both tracking structures.

Using sets prevents duplicate ownership entries automatically.

Go Solution

package main

import (
	"container/heap"
	"sort"
)

type MinHeap []int

func (h MinHeap) Len() int {
	return len(h)
}

func (h MinHeap) Less(i, j int) bool {
	return h[i] < h[j]
}

func (h MinHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MinHeap) Push(x interface{}) {
	*h = append(*h, x.(int))
}

func (h *MinHeap) Pop() interface{} {
	old := *h
	n := len(old)
	val := old[n-1]
	*h = old[:n-1]
	return val
}

type FileSharing struct {
	userChunks   map[int]map[int]bool
	chunkOwners  map[int]map[int]bool
	availableIDs MinHeap
	nextID       int
}

func Constructor(m int) FileSharing {
	h := MinHeap{}
	heap.Init(&h)

	return FileSharing{
		userChunks:   make(map[int]map[int]bool),
		chunkOwners:  make(map[int]map[int]bool),
		availableIDs: h,
		nextID:       1,
	}
}

func (this *FileSharing) Join(ownedChunks []int) int {
	var userID int

	if this.availableIDs.Len() > 0 {
		userID = heap.Pop(&this.availableIDs).(int)
	} else {
		userID = this.nextID
		this.nextID++
	}

	this.userChunks[userID] = make(map[int]bool)

	for _, chunk := range ownedChunks {
		this.userChunks[userID][chunk] = true

		if _, exists := this.chunkOwners[chunk]; !exists {
			this.chunkOwners[chunk] = make(map[int]bool)
		}

		this.chunkOwners[chunk][userID] = true
	}

	return userID
}

func (this *FileSharing) Leave(userID int) {
	for chunk := range this.userChunks[userID] {
		delete(this.chunkOwners[chunk], userID)
	}

	delete(this.userChunks, userID)

	heap.Push(&this.availableIDs, userID)
}

func (this *FileSharing) Request(userID int, chunkID int) []int {
	ownersMap, exists := this.chunkOwners[chunkID]

	if !exists || len(ownersMap) == 0 {
		return []int{}
	}

	owners := []int{}

	for owner := range ownersMap {
		owners = append(owners, owner)
	}

	sort.Ints(owners)

	if !this.userChunks[userID][chunkID] {
		this.userChunks[userID][chunkID] = true
		this.chunkOwners[chunkID][userID] = true
	}

	return owners
}

/**
 * Your FileSharing object will be instantiated and called as such:
 * obj := Constructor(m);
 * param_1 := obj.Join(ownedChunks);
 * obj.Leave(userID);
 * param_3 := obj.Request(userID,chunkID);
 */

The Go version mirrors the Python implementation closely, but Go requires more explicit data structure management.

Since Go does not provide a built-in heap for integers, we implement the heap.Interface manually using container/heap.

Python sets are represented using map[int]bool in Go. This is the idiomatic way to model hash sets in Go.

Unlike Python, Go distinguishes between nil maps and initialized maps, so we must explicitly allocate maps before insertion.

Worked Examples

Example 1

Input:

["FileSharing","join","join","join","request","request","leave","request","leave","join"]
[[4],[[1,2]],[[2,3]],[[4]],[1,3],[2,2],[1],[2,1],[2],[[]]]

Initial state:

Variable	Value
available_ids	[]
next_id	1
user_chunks	{}
chunk_owners	{}

Step 1: join([1,2])

Assign ID 1.

Structure	State
user_chunks	{1: {1,2}}
chunk_owners	{1: {1}, 2: {1}}

Return 1.

Step 2: join([2,3])

Assign ID 2.

Structure	State
user_chunks	{1:{1,2}, 2:{2,3}}
chunk_owners	{1:{1}, 2:{1,2}, 3:{2}}

Return 2.

Step 3: join([4])

Assign ID 3.

Structure	State
user_chunks	{1:{1,2}, 2:{2,3}, 3:{4}}
chunk_owners	{1:{1}, 2:{1,2}, 3:{2}, 4:{3}}

Return 3.

Step 4: request(1,3)

Owners of chunk 3 are [2].

User 1 downloads chunk 3.

Structure	State
user_chunks	{1:{1,2,3}, 2:{2,3}, 3:{4}}
chunk_owners	{1:{1}, 2:{1,2}, 3:{1,2}, 4:{3}}

Return [2].

Step 5: request(2,2)

Owners of chunk 2 are [1,2].

User 2 already owns chunk 2, no change needed.

Return [1,2].

Step 6: leave(1)

User 1 leaves.

Structure	State
available_ids	[1]
user_chunks	{2:{2,3}, 3:{4}}
chunk_owners	{1:{}, 2:{2}, 3:{2}, 4:{3}}

Step 7: request(2,1)

Chunk 1 has no owners.

Return [].

Step 8: leave(2)

User 2 leaves.

Structure	State
available_ids	[1,2]
user_chunks	{3:{4}}

Step 9: join([])

Reuse smallest available ID, which is 1.

Return 1.

Complexity Analysis

Measure	Complexity	Explanation
Time	O(log U + K log K) per request	Heap operations plus sorting owners
Space	O(U × C)	Stores ownership relationships

The join and leave operations are dominated by heap operations and updating ownership sets.

The request operation must return owners in sorted order, which requires sorting K owners. This sorting cost is unavoidable unless we maintain permanently sorted structures, which would make updates more expensive.

The total space complexity comes from storing ownership relationships in both directions:

User to chunks
Chunk to users

Test Cases

from typing import List
from collections import defaultdict
import heapq

class FileSharing:

    def __init__(self, m: int):
        self.user_chunks = {}
        self.chunk_owners = defaultdict(set)

        self.available_ids = []
        self.next_id = 1

    def join(self, ownedChunks: List[int]) -> int:
        if self.available_ids:
            user_id = heapq.heappop(self.available_ids)
        else:
            user_id = self.next_id
            self.next_id += 1

        self.user_chunks[user_id] = set(ownedChunks)

        for chunk in ownedChunks:
            self.chunk_owners[chunk].add(user_id)

        return user_id

    def leave(self, userID: int) -> None:
        for chunk in self.user_chunks[userID]:
            self.chunk_owners[chunk].discard(userID)

        del self.user_chunks[userID]

        heapq.heappush(self.available_ids, userID)

    def request(self, userID: int, chunkID: int) -> List[int]:
        owners = sorted(self.chunk_owners[chunkID])

        if owners:
            self.user_chunks[userID].add(chunkID)
            self.chunk_owners[chunkID].add(userID)

        return owners

# Provided example
fs = FileSharing(4)
assert fs.join([1, 2]) == 1          # first user gets ID 1
assert fs.join([2, 3]) == 2          # second user gets ID 2
assert fs.join([4]) == 3             # third user gets ID 3
assert fs.request(1, 3) == [2]       # user 1 downloads chunk 3
assert fs.request(2, 2) == [1, 2]    # multiple owners
fs.leave(1)
assert fs.request(2, 1) == []        # no owners remain
fs.leave(2)
assert fs.join([]) == 1              # smallest reusable ID

# User joins with no chunks
fs = FileSharing(5)
assert fs.join([]) == 1              # empty ownership allowed

# Reusing multiple IDs
fs = FileSharing(5)
u1 = fs.join([1])
u2 = fs.join([2])
u3 = fs.join([3])

fs.leave(u2)
fs.leave(u1)

assert fs.join([]) == 1              # smallest ID reused first
assert fs.join([]) == 2              # next smallest reused

# Requesting unavailable chunk
fs = FileSharing(3)
u1 = fs.join([1])
u2 = fs.join([])

assert fs.request(u2, 2) == []       # chunk has no owners

# Request adds ownership
fs = FileSharing(3)
u1 = fs.join([1])
u2 = fs.join([])

assert fs.request(u2, 1) == [1]
assert fs.request(u2, 1) == [1, 2]   # user 2 now owns chunk 1

# Leave removes ownership completely
fs = FileSharing(3)
u1 = fs.join([1, 2])
u2 = fs.join([])

fs.leave(u1)

assert fs.request(u2, 1) == []       # ownership removed correctly
assert fs.request(u2, 2) == []       # all chunks removed

# Large chunk IDs
fs = FileSharing(100000)
u1 = fs.join([100000])

assert fs.request(u1, 100000) == [1] # handles max chunk ID

Test	Why
Provided example	Validates all core operations together
Empty chunk join	Ensures users may join without ownership
Multiple ID reuse	Verifies min-heap reusable ID logic
Request unavailable chunk	Confirms empty results are handled
Request adds ownership	Ensures successful downloads persist
Leave removes ownership	Prevents stale ownership references
Maximum chunk ID	Confirms large chunk indexing works

Edge Cases

One important edge case occurs when a user joins with no chunks. A careless implementation might assume every user owns at least one chunk and fail to initialize ownership structures correctly. In this solution, the user's chunk set is initialized regardless of size, so empty ownership is handled naturally.

Another critical edge case involves reusable IDs. Suppose users 1 and 2 leave in that order. The next joining user must receive ID 1, not 3. A naive increment-only system would violate the problem requirements. The min-heap guarantees that the smallest available ID is always reused first.

A third subtle edge case happens during repeated requests for the same chunk. If a requesting user already owns the chunk, we must avoid duplicate ownership entries. Using sets for both user ownership and chunk ownership automatically prevents duplicates and keeps the data consistent.

Finally, leaving users must be fully removed from every chunk ownership mapping. Forgetting this cleanup would cause stale users to appear in future request results. The leave implementation explicitly iterates through all owned chunks and removes the user from every corresponding ownership set.