LeetCode 1500 - Design a File Sharing System

This problem asks us to design a miniature peer-to-peer file sharing system. The file is divided into m chunks, where every chunk has an ID from 1 to m. Users can join the system while already owning some chunks, leave the system entirely, and request chunks from other users.

LeetCode Problem 1500

Difficulty: 🟡 Medium
Topics: Hash Table, Design, Sorting, Heap (Priority Queue), Data Stream

Solution

Problem Understanding

This problem asks us to design a miniature peer-to-peer file sharing system. The file is divided into m chunks, where every chunk has an ID from 1 to m. Users can join the system while already owning some chunks, leave the system entirely, and request chunks from other users.

The main challenge is not the chunk transfer itself, but efficiently maintaining ownership information while also correctly managing reusable user IDs.

When a user joins, the system must assign the smallest available positive integer as their user ID. IDs are reusable, which means if a user leaves, their ID becomes available again for future users. This immediately suggests that we need a data structure capable of efficiently retrieving the minimum available ID.

The request(userID, chunkID) operation is slightly subtle. The method must return all users currently owning the requested chunk, sorted in ascending order. Additionally, if at least one user owns the chunk, then the requesting user successfully downloads it and now owns that chunk as well.

The constraints are important:

  • m can be as large as 10^5
  • There are at most 10^4 operations
  • Each user initially owns at most 100 chunks

These constraints tell us that we cannot repeatedly scan every user or every chunk for each operation. We need near constant-time updates and efficient retrieval of sorted owners.

Several edge cases are important:

  • A user may join with no chunks at all
  • A chunk may currently have no owners
  • A requesting user may already own the requested chunk
  • IDs must always reuse the smallest available value
  • Multiple users can own the same chunk
  • Leaving users must be completely removed from all ownership structures

A naive implementation can easily fail by forgetting to remove chunk ownership on leave, or by mishandling reusable IDs.

Approaches

Brute Force Approach

A straightforward implementation would store each user's owned chunks and, for every request, scan all active users to determine who owns the requested chunk.

For example:

  • Maintain a map from user ID to owned chunks
  • For request(userID, chunkID), iterate through every active user and check whether they own chunkID
  • For leave(userID), simply delete the user

This approach is correct because ownership is always derived directly from the authoritative user data.

However, it is inefficient. In the worst case, every request operation scans all users. Since there may be up to 10^4 operations, repeated full scans become expensive.

The reusable ID requirement also becomes awkward if implemented naively. Continuously searching for the smallest unused ID can take linear time per join.

Optimal Approach

The key observation is that ownership queries happen frequently, so we should index ownership by chunk instead of repeatedly searching all users.

We maintain two main relationships:

  • For each user, which chunks they own
  • For each chunk, which users currently own it

This allows:

  • Fast insertion/removal during joins and leaves
  • Fast retrieval of all owners during requests

To efficiently reuse the smallest available user ID, we use a min-heap. Whenever a user leaves, their ID is pushed back into the heap. When a new user joins, we first reuse the smallest available ID if possible. Otherwise, we allocate a new increasing ID.

We also need ownership lists returned in sorted order. Using a sorted set structure would work, but Python does not provide one natively. Instead, we use regular sets and sort only when returning results. Since there are at most 10^4 operations, this remains efficient.

Approach Time Complexity Space Complexity Notes
Brute Force O(U) per request O(U × C) Scans all users for ownership
Optimal O(log U + K log K) per request O(U × C) Uses chunk-to-users indexing and reusable ID heap

Here:

  • U is the number of users
  • C is the average chunks per user
  • K is the number of owners of a chunk

Algorithm Walkthrough

  1. Initialize three core data structures.

We maintain:

  • user_chunks, a map from user ID to the set of chunks they own
  • chunk_owners, a map from chunk ID to the set of users owning it
  • available_ids, a min-heap storing reusable user IDs

We also keep next_id, which represents the next brand-new ID to assign if no reusable IDs exist. 2. Handle the join operation.

When a new user joins:

  • If the heap contains reusable IDs, pop the smallest one
  • Otherwise, assign next_id and increment it

Then:

  • Store the user's chunks in user_chunks
  • For every owned chunk, insert the user into chunk_owners[chunk]

This ensures both mappings remain synchronized. 3. Handle the leave operation.

When a user leaves:

  • Retrieve all chunks owned by that user
  • Remove the user from every corresponding chunk ownership set
  • Delete the user from user_chunks
  • Push the user's ID into the reusable ID heap

This completely removes the user from the system while making the ID available again. 4. Handle the request operation.

For a requested chunk:

  • Retrieve all owners from chunk_owners[chunkID]
  • Sort them before returning

If the owner list is non-empty:

  • Add the chunk to the requesting user's owned chunks
  • Add the requesting user to the chunk's owner set

This models successful downloading. 5. Return the sorted owner list.

The problem explicitly requires ascending order, so sorting occurs during the request operation.

Why it works

The algorithm maintains two synchronized invariants:

  • Every user's owned chunks are accurately stored in user_chunks
  • Every chunk's owners are accurately stored in chunk_owners

Whenever ownership changes, both structures are updated together. Because IDs removed during leave are inserted into a min-heap, the smallest reusable ID is always retrieved first during join. Therefore, both ownership tracking and ID assignment remain correct throughout all operations.

Python Solution

from typing import List
from collections import defaultdict
import heapq

class FileSharing:

    def __init__(self, m: int):
        self.user_chunks = {}
        self.chunk_owners = defaultdict(set)

        self.available_ids = []
        self.next_id = 1

    def join(self, ownedChunks: List[int]) -> int:
        if self.available_ids:
            user_id = heapq.heappop(self.available_ids)
        else:
            user_id = self.next_id
            self.next_id += 1

        self.user_chunks[user_id] = set(ownedChunks)

        for chunk in ownedChunks:
            self.chunk_owners[chunk].add(user_id)

        return user_id

    def leave(self, userID: int) -> None:
        owned_chunks = self.user_chunks[userID]

        for chunk in owned_chunks:
            self.chunk_owners[chunk].discard(userID)

        del self.user_chunks[userID]

        heapq.heappush(self.available_ids, userID)

    def request(self, userID: int, chunkID: int) -> List[int]:
        owners = sorted(self.chunk_owners[chunkID])

        if owners:
            if chunkID not in self.user_chunks[userID]:
                self.user_chunks[userID].add(chunkID)
                self.chunk_owners[chunkID].add(userID)

        return owners

# Your FileSharing object will be instantiated and called as such:
# obj = FileSharing(m)
# param_1 = obj.join(ownedChunks)
# obj.leave(userID)
# param_3 = obj.request(userID, chunkID)

The implementation closely follows the algorithm described earlier.

The constructor initializes all state tracking structures. user_chunks stores ownership from the user perspective, while chunk_owners stores ownership from the chunk perspective. The min-heap available_ids manages reusable IDs efficiently.

The join method first determines which ID to assign. If previously freed IDs exist, the smallest one is reused via heapq.heappop. Otherwise, a fresh ID is generated from next_id.

The leave method iterates through every chunk owned by the departing user and removes the user from each ownership set. After cleanup, the user ID is returned to the reusable pool.

The request method retrieves all owners for the requested chunk and sorts them before returning. If ownership exists, the requester successfully downloads the chunk, so ownership is added to both tracking structures.

Using sets prevents duplicate ownership entries automatically.

Go Solution

package main

import (
	"container/heap"
	"sort"
)

type MinHeap []int

func (h MinHeap) Len() int {
	return len(h)
}

func (h MinHeap) Less(i, j int) bool {
	return h[i] < h[j]
}

func (h MinHeap) Swap(i, j int) {
	h[i], h[j] = h[j], h[i]
}

func (h *MinHeap) Push(x interface{}) {
	*h = append(*h, x.(int))
}

func (h *MinHeap) Pop() interface{} {
	old := *h
	n := len(old)
	val := old[n-1]
	*h = old[:n-1]
	return val
}

type FileSharing struct {
	userChunks   map[int]map[int]bool
	chunkOwners  map[int]map[int]bool
	availableIDs MinHeap
	nextID       int
}

func Constructor(m int) FileSharing {
	h := MinHeap{}
	heap.Init(&h)

	return FileSharing{
		userChunks:   make(map[int]map[int]bool),
		chunkOwners:  make(map[int]map[int]bool),
		availableIDs: h,
		nextID:       1,
	}
}

func (this *FileSharing) Join(ownedChunks []int) int {
	var userID int

	if this.availableIDs.Len() > 0 {
		userID = heap.Pop(&this.availableIDs).(int)
	} else {
		userID = this.nextID
		this.nextID++
	}

	this.userChunks[userID] = make(map[int]bool)

	for _, chunk := range ownedChunks {
		this.userChunks[userID][chunk] = true

		if _, exists := this.chunkOwners[chunk]; !exists {
			this.chunkOwners[chunk] = make(map[int]bool)
		}

		this.chunkOwners[chunk][userID] = true
	}

	return userID
}

func (this *FileSharing) Leave(userID int) {
	for chunk := range this.userChunks[userID] {
		delete(this.chunkOwners[chunk], userID)
	}

	delete(this.userChunks, userID)

	heap.Push(&this.availableIDs, userID)
}

func (this *FileSharing) Request(userID int, chunkID int) []int {
	ownersMap, exists := this.chunkOwners[chunkID]

	if !exists || len(ownersMap) == 0 {
		return []int{}
	}

	owners := []int{}

	for owner := range ownersMap {
		owners = append(owners, owner)
	}

	sort.Ints(owners)

	if !this.userChunks[userID][chunkID] {
		this.userChunks[userID][chunkID] = true
		this.chunkOwners[chunkID][userID] = true
	}

	return owners
}

/**
 * Your FileSharing object will be instantiated and called as such:
 * obj := Constructor(m);
 * param_1 := obj.Join(ownedChunks);
 * obj.Leave(userID);
 * param_3 := obj.Request(userID,chunkID);
 */

The Go version mirrors the Python implementation closely, but Go requires more explicit data structure management.

Since Go does not provide a built-in heap for integers, we implement the heap.Interface manually using container/heap.

Python sets are represented using map[int]bool in Go. This is the idiomatic way to model hash sets in Go.

Unlike Python, Go distinguishes between nil maps and initialized maps, so we must explicitly allocate maps before insertion.

Worked Examples

Example 1

Input:

["FileSharing","join","join","join","request","request","leave","request","leave","join"]
[[4],[[1,2]],[[2,3]],[[4]],[1,3],[2,2],[1],[2,1],[2],[[]]]

Initial state:

Variable Value
available_ids []
next_id 1
user_chunks {}
chunk_owners {}

Step 1: join([1,2])

Assign ID 1.

Structure State
user_chunks {1: {1,2}}
chunk_owners {1: {1}, 2: {1}}

Return 1.

Step 2: join([2,3])

Assign ID 2.

Structure State
user_chunks {1:{1,2}, 2:{2,3}}
chunk_owners {1:{1}, 2:{1,2}, 3:{2}}

Return 2.

Step 3: join([4])

Assign ID 3.

Structure State
user_chunks {1:{1,2}, 2:{2,3}, 3:{4}}
chunk_owners {1:{1}, 2:{1,2}, 3:{2}, 4:{3}}

Return 3.

Step 4: request(1,3)

Owners of chunk 3 are [2].

User 1 downloads chunk 3.

Structure State
user_chunks {1:{1,2,3}, 2:{2,3}, 3:{4}}
chunk_owners {1:{1}, 2:{1,2}, 3:{1,2}, 4:{3}}

Return [2].

Step 5: request(2,2)

Owners of chunk 2 are [1,2].

User 2 already owns chunk 2, no change needed.

Return [1,2].

Step 6: leave(1)

User 1 leaves.

Structure State
available_ids [1]
user_chunks {2:{2,3}, 3:{4}}
chunk_owners {1:{}, 2:{2}, 3:{2}, 4:{3}}

Step 7: request(2,1)

Chunk 1 has no owners.

Return [].

Step 8: leave(2)

User 2 leaves.

Structure State
available_ids [1,2]
user_chunks {3:{4}}

Step 9: join([])

Reuse smallest available ID, which is 1.

Return 1.

Complexity Analysis

Measure Complexity Explanation
Time O(log U + K log K) per request Heap operations plus sorting owners
Space O(U × C) Stores ownership relationships

The join and leave operations are dominated by heap operations and updating ownership sets.

The request operation must return owners in sorted order, which requires sorting K owners. This sorting cost is unavoidable unless we maintain permanently sorted structures, which would make updates more expensive.

The total space complexity comes from storing ownership relationships in both directions:

  • User to chunks
  • Chunk to users

Test Cases

from typing import List
from collections import defaultdict
import heapq

class FileSharing:

    def __init__(self, m: int):
        self.user_chunks = {}
        self.chunk_owners = defaultdict(set)

        self.available_ids = []
        self.next_id = 1

    def join(self, ownedChunks: List[int]) -> int:
        if self.available_ids:
            user_id = heapq.heappop(self.available_ids)
        else:
            user_id = self.next_id
            self.next_id += 1

        self.user_chunks[user_id] = set(ownedChunks)

        for chunk in ownedChunks:
            self.chunk_owners[chunk].add(user_id)

        return user_id

    def leave(self, userID: int) -> None:
        for chunk in self.user_chunks[userID]:
            self.chunk_owners[chunk].discard(userID)

        del self.user_chunks[userID]

        heapq.heappush(self.available_ids, userID)

    def request(self, userID: int, chunkID: int) -> List[int]:
        owners = sorted(self.chunk_owners[chunkID])

        if owners:
            self.user_chunks[userID].add(chunkID)
            self.chunk_owners[chunkID].add(userID)

        return owners

# Provided example
fs = FileSharing(4)
assert fs.join([1, 2]) == 1          # first user gets ID 1
assert fs.join([2, 3]) == 2          # second user gets ID 2
assert fs.join([4]) == 3             # third user gets ID 3
assert fs.request(1, 3) == [2]       # user 1 downloads chunk 3
assert fs.request(2, 2) == [1, 2]    # multiple owners
fs.leave(1)
assert fs.request(2, 1) == []        # no owners remain
fs.leave(2)
assert fs.join([]) == 1              # smallest reusable ID

# User joins with no chunks
fs = FileSharing(5)
assert fs.join([]) == 1              # empty ownership allowed

# Reusing multiple IDs
fs = FileSharing(5)
u1 = fs.join([1])
u2 = fs.join([2])
u3 = fs.join([3])

fs.leave(u2)
fs.leave(u1)

assert fs.join([]) == 1              # smallest ID reused first
assert fs.join([]) == 2              # next smallest reused

# Requesting unavailable chunk
fs = FileSharing(3)
u1 = fs.join([1])
u2 = fs.join([])

assert fs.request(u2, 2) == []       # chunk has no owners

# Request adds ownership
fs = FileSharing(3)
u1 = fs.join([1])
u2 = fs.join([])

assert fs.request(u2, 1) == [1]
assert fs.request(u2, 1) == [1, 2]   # user 2 now owns chunk 1

# Leave removes ownership completely
fs = FileSharing(3)
u1 = fs.join([1, 2])
u2 = fs.join([])

fs.leave(u1)

assert fs.request(u2, 1) == []       # ownership removed correctly
assert fs.request(u2, 2) == []       # all chunks removed

# Large chunk IDs
fs = FileSharing(100000)
u1 = fs.join([100000])

assert fs.request(u1, 100000) == [1] # handles max chunk ID
Test Why
Provided example Validates all core operations together
Empty chunk join Ensures users may join without ownership
Multiple ID reuse Verifies min-heap reusable ID logic
Request unavailable chunk Confirms empty results are handled
Request adds ownership Ensures successful downloads persist
Leave removes ownership Prevents stale ownership references
Maximum chunk ID Confirms large chunk indexing works

Edge Cases

One important edge case occurs when a user joins with no chunks. A careless implementation might assume every user owns at least one chunk and fail to initialize ownership structures correctly. In this solution, the user's chunk set is initialized regardless of size, so empty ownership is handled naturally.

Another critical edge case involves reusable IDs. Suppose users 1 and 2 leave in that order. The next joining user must receive ID 1, not 3. A naive increment-only system would violate the problem requirements. The min-heap guarantees that the smallest available ID is always reused first.

A third subtle edge case happens during repeated requests for the same chunk. If a requesting user already owns the chunk, we must avoid duplicate ownership entries. Using sets for both user ownership and chunk ownership automatically prevents duplicates and keeps the data consistent.

Finally, leaving users must be fully removed from every chunk ownership mapping. Forgetting this cleanup would cause stale users to appear in future request results. The leave implementation explicitly iterates through all owned chunks and removes the user from every corresponding ownership set.