Design and Implementation of a Short URL Service
This article explains the concept of short URLs, compares three generation algorithms (auto‑increment ID, hash‑based, and random), analyzes their trade‑offs, and presents a complete backend design including database schema, caching strategy, redirection flow, and Java code examples for hash and base‑62 conversion.
Short URLs are compact strings that map to longer original URLs, often used to fit character limits in messages or social media posts.
The article outlines two core requirements for a short‑URL system: converting long URLs into short codes and redirecting users from the short code to the original URL.
Short‑code generation methods
Short codes are typically composed of characters from [a‑z, A‑Z, 0‑9] , giving 62 possible symbols. A common length is six characters, providing 56.8 billion combinations, sufficient for most scenarios.
Three popular generation techniques are discussed:
Auto‑increment ID : Convert an auto‑increment integer to a base‑62 string. This yields ordered, collision‑free codes but can be enumerated by attackers.
Hash (summary) algorithm : Compute an MD5 hash of the URL (optionally with a key), split it into four 8‑character segments, and derive six‑character codes from each segment. This produces fixed‑length, seemingly random codes, though collisions are still possible.
Random number : Randomly pick six characters from the 62‑symbol set and check for collisions in the database. Simpler but may require many retries under high load.
Algorithm analysis
The auto‑increment method is vulnerable to enumeration attacks because the conversion from decimal to base‑62 is reversible. The hash method offers better security by producing non‑sequential codes, while the random method has a higher collision probability.
The author prefers the hash‑based approach for its balance of simplicity and security.
Implementation
Storage design
Database schema stores the domain ( base_url ), path suffix ( suffix_url ), full URL ( full_url ), generated short code ( shot_code ), expiration date, and click count.
Expired data can be moved to a separate HBase table based on the expiration date, reducing the load on the primary table.
Cache strategy caches recently accessed or newly created URLs (e.g., last three months) using an LRU policy. Cache keys are the short codes, and a fallback to the database occurs on cache misses.
Redirection flow
DNS resolves the short‑URL domain to an IP address.
The client sends an HTTP GET request with the short code.
The server looks up the short code to obtain the long URL.
The server issues an HTTP 301 (permanent) redirect to the long URL.
A 301 redirect preserves SEO and analytics data, while a 302 would be temporary and less suitable for short‑URL services.
Code examples
Hash‑based short‑code generator (Java):
import org.apache.commons.lang3.StringUtils;
import javax.xml.bind.DatatypeConverter;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.atomic.AtomicLong;
import static com.alibaba.fastjson.util.IOUtils.DIGITS;
/**
* @author rickiyang
* @date 2020-01-07
*/
public class ShortUrlGenerator {
public static void main(String[] args) {
String sLongUrl = "http://www.baidu.com/121244/ddd";
for (String shortUrl : shortUrl(sLongUrl)) {
System.out.println(shortUrl);
}
}
public static String[] shortUrl(String url) {
String key = "dwz";
String[] chars = new String[]{"a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z","0","1","2","3","4","5","6","7","8","9","A","B","C","D","E","F","G","H","I","J","K","L","M","N","O","P","Q","R","S","T","U","V","W","X","Y","Z"};
String sMD5EncryptResult = "";
try {
MessageDigest md = MessageDigest.getInstance("MD5");
md.update((key + url).getBytes());
byte[] digest = md.digest();
sMD5EncryptResult = DatatypeConverter.printHexBinary(digest).toUpperCase();
} catch (NoSuchAlgorithmException e) {
e.printStackTrace();
}
String[] resUrl = new String[4];
for (int i = 0; i < 4; i++) {
String sTempSubString = sMD5EncryptResult.substring(i * 8, i * 8 + 8);
long lHexLong = 0x3FFFFFFF & Long.parseLong(sTempSubString, 16);
String outChars = "";
for (int j = 0; j < 6; j++) {
long index = 0x0000003D & lHexLong;
outChars += chars[(int) index];
lHexLong = lHexLong >> 5;
}
resUrl[i] = outChars;
}
return resUrl;
}
}Base‑62 conversion utility (Java):
public class NumericConvertUtils {
private static final char[] digits = {'0','1','2','3','4','5','6','7','8','9','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z'};
public static String toOtherNumberSystem(long number, int seed) {
if (number < 0) {
number = ((long)2 * 0x7fffffff) + number + 2;
}
char[] buf = new char[32];
int charPos = 32;
while ((number / seed) > 0) {
buf[--charPos] = digits[(int) (number % seed)];
number /= seed;
}
buf[--charPos] = digits[(int) (number % seed)];
return new String(buf, charPos, (32 - charPos));
}
public static long toDecimalNumber(String number, int seed) {
char[] charBuf = number.toCharArray();
if (seed == 10) {
return Long.parseLong(number);
}
long result = 0, base = 1;
for (int i = charBuf.length - 1; i >= 0; i--) {
int index = 0;
for (int j = 0; j < digits.length; j++) {
if (digits[j] == charBuf[i]) {
index = j;
}
}
result += index * base;
base *= seed;
}
return result;
}
}The article concludes with recommendations on sharding tables for massive data volumes and notes that a 6‑character short code can support up to 100 billion entries, making it suitable for large‑scale URL shortening services.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.