LeetCode 535 - Encode and Decode TinyURL
The problem asks us to implement a URL shortening service, similar to TinyURL. Given a long URL, our system should generate a short, unique URL that maps back to the original URL. When the short URL is accessed, the system should return the original long URL.
Difficulty: 🟡 Medium
Topics: Hash Table, String, Design, Hash Function
Solution
Problem Understanding
The problem asks us to implement a URL shortening service, similar to TinyURL. Given a long URL, our system should generate a short, unique URL that maps back to the original URL. When the short URL is accessed, the system should return the original long URL. Essentially, we need a reversible mapping between long URLs and short URLs.
The input is always a valid URL with length between 1 and 10,000 characters. The output is a tiny URL that can be decoded back to the original URL. There is no requirement for the short URL to be human-readable, globally unique across multiple instances, or deterministic. The key is that each encoded URL can be decoded correctly, which allows us to focus on an internal mapping rather than hash collisions across systems.
Important edge cases include repeated encoding of the same URL, extremely long URLs, and handling multiple distinct URLs that might be encoded into the same short URL if using naive hashing. The problem guarantees that the decode function will only be called with a URL previously generated by encode.
Approaches
A brute-force approach would be to assign a unique identifier for each URL, such as a sequential integer or a direct hash. We could store the mapping in a hash table (dictionary in Python or map in Go). Each time we encode, we append the ID to a fixed domain, such as http://tinyurl.com/{id}. This is correct because the hash table ensures a bijection between long URLs and short URLs.
A more optimal solution avoids collisions by generating a short, fixed-length alphanumeric string for each URL. We can do this using a base62 encoding of a counter or a random string. Using base62 ensures that the short URL is compact while allowing a large number of unique URLs. The hash table still stores the mapping for decoding.
| Approach | Time Complexity | Space Complexity | Notes |
|---|---|---|---|
| Brute Force | O(1) per encode/decode | O(n) | Simple counter or hash table mapping, collision-free if counter used |
| Optimal | O(1) per encode/decode | O(n) | Uses hash map and random/base62 short keys for compact URLs, minimal collisions |
Algorithm Walkthrough
- Initialize a hash map to store the mapping from short keys to long URLs.
- Initialize a counter to generate unique IDs or use a random string generator.
- For encoding, generate a unique short key (either increment a counter or use a random string). Store the mapping of this key to the long URL in the hash map.
- Construct the short URL by appending the key to a base domain, e.g.,
http://tinyurl.com/{key}. - For decoding, extract the key from the short URL.
- Look up the key in the hash map to retrieve the original long URL.
Why it works: By storing a mapping from short keys to long URLs, we ensure that every encoded URL can be decoded exactly once. The hash map provides O(1) access for both encoding and decoding, and the generated keys are unique, so no collisions occur.
Python Solution
import random
import string
class Codec:
def __init__(self):
self.url_map = {}
self.base_url = "http://tinyurl.com/"
self.chars = string.ascii_letters + string.digits
def _generate_key(self, length: int = 6) -> str:
"""Generates a random 6-character alphanumeric string"""
return ''.join(random.choice(self.chars) for _ in range(length))
def encode(self, longUrl: str) -> str:
"""Encodes a URL to a shortened URL."""
key = self._generate_key()
# Ensure uniqueness
while key in self.url_map:
key = self._generate_key()
self.url_map[key] = longUrl
return self.base_url + key
def decode(self, shortUrl: str) -> str:
"""Decodes a shortened URL to its original URL."""
key = shortUrl.replace(self.base_url, "")
return self.url_map.get(key, "")
The Python implementation uses a hash map (url_map) to store the mapping from short keys to original URLs. _generate_key generates a random 6-character alphanumeric string. When encoding, we generate a unique key and store it in the map. Decoding simply extracts the key from the short URL and retrieves the corresponding long URL.
Go Solution
package main
import (
"math/rand"
"time"
)
type Codec struct {
urlMap map[string]string
baseURL string
chars string
randSrc *rand.Rand
}
func Constructor() Codec {
return Codec{
urlMap: make(map[string]string),
baseURL: "http://tinyurl.com/",
chars: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
randSrc: rand.New(rand.NewSource(time.Now().UnixNano())),
}
}
func (c *Codec) generateKey(length int) string {
key := make([]byte, length)
for i := 0; i < length; i++ {
key[i] = c.chars[c.randSrc.Intn(len(c.chars))]
}
return string(key)
}
func (c *Codec) encode(longUrl string) string {
key := c.generateKey(6)
for {
if _, exists := c.urlMap[key]; !exists {
break
}
key = c.generateKey(6)
}
c.urlMap[key] = longUrl
return c.baseURL + key
}
func (c *Codec) decode(shortUrl string) string {
key := shortUrl[len(c.baseURL):]
return c.urlMap[key]
}
In Go, we explicitly manage a random source (randSrc) to avoid reseeding issues. The logic mirrors Python: generate a random key, ensure uniqueness, store in a map, and retrieve for decoding.
Worked Examples
Example: Encoding "https://leetcode.com/problems/design-tinyurl"
| Step | longUrl | key | url_map | shortUrl |
|---|---|---|---|---|
| 1 | "https://leetcode.com/problems/design-tinyurl" | "a1B2c3" | {} | "" |
| 2 | same | "a1B2c3" | {"a1B2c3": "https://leetcode.com/problems/design-tinyurl"} | "http://tinyurl.com/a1B2c3" |
| 3 | decode "http://tinyurl.com/a1B2c3" | "a1B2c3" | {"a1B2c3": "https://leetcode.com/problems/design-tinyurl"} | "https://leetcode.com/problems/design-tinyurl" |
Complexity Analysis
| Measure | Complexity | Explanation |
|---|---|---|
| Time | O(1) | Generating a key, inserting, and retrieving from the hash map are all O(1) on average |
| Space | O(n) | Each encoded URL is stored in the map, so space grows linearly with the number of URLs |
The algorithm scales efficiently because both encoding and decoding use constant-time hash map operations, and the storage requirement is proportional to the number of unique URLs.
Test Cases
codec = Codec()
# Example test
url = "https://leetcode.com/problems/design-tinyurl"
short = codec.encode(url)
assert codec.decode(short) == url # Correct mapping
# Repeated encoding of same URL
short2 = codec.encode(url)
assert codec.decode(short2) == url # Should still map correctly
# Very long URL
long_url = "http://example.com/" + "a" * 10000
short3 = codec.encode(long_url)
assert codec.decode(short3) == long_url
# Multiple URLs
urls = ["http://site1.com", "http://site2.com", "http://site3.com"]
for u in urls:
s = codec.encode(u)
assert codec.decode(s) == u
# Short URL not in map (edge case, should return empty string)
assert codec.decode("http://tinyurl.com/xxxxxx") == ""
| Test | Why |
|---|---|
| Basic example | Validates encoding/decoding of a standard URL |
| Repeated encoding | Ensures repeated URLs do not break the mapping |
| Very long URL | Tests maximum input length handling |
| Multiple URLs | Validates multiple distinct URLs can be encoded/decoded correctly |
| Invalid short URL | Ensures the decode function handles unknown keys safely |
Edge Cases
One important edge case is repeated encoding of the same URL. If the implementation naively reuses keys or does not generate unique keys per call, repeated encoding could overwrite previous entries. In our implementation, each encoding generates a new random key, ensuring uniqueness.
Another edge case is extremely long URLs, up to 10,000 characters. Our solution stores the URL in memory without truncation or modification, so it handles long URLs correctly without overflow or corruption.
A third edge case is decoding a short URL that was never generated. While the problem guarantees that decode will only be called with valid short URLs, a robust implementation should handle invalid keys gracefully. The Python solution returns an empty string if the key is not found, and the Go solution implicitly returns the zero value of a string, which is also empty.