Designing a 100W QPS Short URL System: Architecture, ID Generation, and High‑Concurrency Strategies
This article explains how to design a high‑performance short‑URL service capable of handling 1 million queries per second, covering system background, URL shortening principles, base‑62 encoding, distributed ID generation options, storage sharding, ambiguity checking with Bloom filters, multi‑level caching, and redirect handling.
The article starts by describing the motivation for short URLs: reducing link length for better readability, saving screen space, enabling content control, facilitating traffic analysis, improving bandwidth utilization, and providing cleaner, safer links.
1. Short URL System Background
Short URLs replace long links in social platforms (e.g., url.cn) to fit character limits and improve user experience.
2. Short URL Principle
The core idea is to map a long URL to a short one. The client accesses the short URL, the Java service translates it to the original URL, returns a 302 redirect, and the browser follows the redirect.
Base‑62 Encoding
Using a 62‑character set (0‑9, a‑z, A‑Z) allows a 7‑character code to represent 62⁷ ≈ 3.5 trillion possibilities, far exceeding the capacity of decimal or hexadecimal encodings.
62^7 = 3,521,614,606,208 (≈3.5 trillion)
10^6‑1 = 999,999
16^6‑1 = 16,777,215
62^6‑1 = 56,800,235,583Longer codes (e.g., 8 characters) provide even larger address spaces.
3. Functional Decomposition
The system consists of three main functions: issuing IDs, storing mappings, and performing look‑ups. These are split into an issuing‑and‑storage module and a mapping module.
Issuing & Storage Module
Issue: generate a unique numeric ID for each long URL, ensuring idempotency (same long URL always gets the same short URL).
Store: persist the ID‑URL pair in a database and convert the numeric ID to a base‑62 string.
Mapping Module
Convert the base‑62 short code back to a decimal ID.
Lookup the original URL in the database.
Return a 302 redirect to the client.
4. High‑Concurrency ID Generation
Several ID generation schemes are evaluated:
Option 1 – Hash of the URL
Using MD5, CRC, or MurmurHash to produce an integer ID, but collisions make this unsuitable.
Fixed short‑link domain + hash value = www.weibo.com/888888888Option 2 – Auto‑increment DB ID
Simple but suffers from bottlenecks under high load and ID conflicts in sharded environments.
Option 3 – Distributed Middleware (Redis, MongoDB)
Provides higher throughput than a single DB but still limited by remote calls.
Option 4 – UUID/GUID
Globally unique without a data source, but large size and lack of ordering hurt index performance.
Option 5 – Snowflake Algorithm
Generates 64‑bit IDs composed of timestamp, datacenter ID, machine ID, and sequence; offers high throughput (>1 M ops/s) and ordered IDs, with the main drawback being clock‑backward handling.
The final choice is the Snowflake algorithm for its performance and ordering characteristics.
5. Data Storage Architecture
A simple MySQL table stores two columns: ID (bigint, Snowflake ID) and SURL (original URL). For massive scale, sharding and partitioning are applied.
1. ID int // distributed Snowflake ID
2. SURL varchar // original URL6. Ambiguity Check (Idempotent Mapping)
To avoid generating different short URLs for the same long URL, a two‑step check is used:
First, query a Redis Bloom filter; if the URL is absent, generate a new short URL.
If the Bloom filter reports presence (possible false positive), verify against the DB.
Bloom filters provide fast, memory‑efficient existence checks with a low false‑positive rate.
// Bloom filter check pseudocode
if (!bloomFilter.mightContain(longUrl)) {
generateShortUrl();
} else {
if (db.contains(longUrl)) {
return existingShortUrl();
} else {
generateShortUrl();
}
}7. Mapping Module High‑Concurrency Design
Multi‑level caching (local Caffeine, Redis) stores hot URL mappings to reduce DB load. Popular URLs are cached in Redis; recent URLs are kept in an in‑process cache with LRU eviction.
Redirect Types
301 Permanent redirect – reduces server load but loses server‑side analytics.
302 Temporary redirect – enables accurate access counting at the cost of extra traffic.
8. Architectural Takeaway
There is no single “best” solution; the design must balance performance, consistency, and operational complexity. The article encourages discussion and continuous improvement.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.