How We Processed 1 Million Images in Under a Second: Backend Optimization Secrets
Faced with roughly one million server-side images to manage and 180 client images to match against them, team TOOSIMPLE built a high-performance backend using fingerprinting, parallel processing, mmap with SSE2 acceleration, and sparsemap indexing. It achieves sub-second response times while still displaying the results in the correct order.
Problem Statement
The server stores about 1,000,000 images from the Places365 dataset (1,024 directories of 1,024 images each, ~20 KB per image). The client holds 180 images of unknown category. We need two programs: a server program that manages the images by fingerprint, and a Chrome client that sends information about the 180 images, receives the corresponding data, and displays the images in 20 grids ordered by file number.
Evaluation Criteria
Performance: Timing starts when the server program launches (the start timestamp is passed in via start.sh) and ends when the browser has finished loading and correctly ordering all 180 images. The browser must display the start timestamp, the end timestamp, and the elapsed time.
Correctness: The returned images must be correct and displayed in the proper order; incorrect results are excluded from performance ranking.
Team Introduction
Team TOOSIMPLE – members: Liu (frontend), Huang (frontend), Chen (backend), Li (backend).
Optimization Points
1. Frontend pre‑computes data
Hashes, sizes, and byte slices for the 180 images are computed before the POST request is sent. The server can therefore start matching as soon as the data arrives after startup.
Time saved: 100 ms ~ 1000 ms
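The real client runs in Chrome. The Python sketch below only shows the shape of the precomputation: every per-image value is computed once, up front, and serialized into a single request body. The field names (`name`, `no`, `size`, `hash`), the assumption that client files carry their file number in the name, and the use of BLAKE2b (truncated to 8 bytes) as a stand-in for XXH64 are all hypothetical.

```python
import hashlib
import json
import re


def file_number(name: str) -> int:
    # Assumption: client files are named like "17.jpg"; the number sets grid order.
    m = re.search(r"\d+", name)
    if m is None:
        raise ValueError(f"no file number in {name!r}")
    return int(m.group())


def quick_hash(data: bytes) -> str:
    # Stand-in for XXH64: 8-byte BLAKE2b over two 100-byte slices
    # (first and last slice here; the real slice rule depends on file size).
    h = hashlib.blake2b(digest_size=8)
    h.update(data[:100])
    h.update(data[-100:])
    return h.hexdigest()


def build_payload(images: dict[str, bytes]) -> str:
    """Compute everything the server needs before the POST is sent."""
    entries = [
        {"name": name, "no": file_number(name), "size": len(data), "hash": quick_hash(data)}
        for name, data in images.items()
    ]
    entries.sort(key=lambda e: e["no"])  # grid order follows file number
    return json.dumps(entries, separators=(",", ":"))
```

Because the body is ready before the server is even up, the client only has to retry the POST until the server accepts it.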
2. Backend receives POST and scans files in parallel
While the backend waits for and receives the POST data, it concurrently scans filenames and loads the index files. Network-transfer time is thus overlapped with useful work instead of being spent idle.
Time saved: 70 ms
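A minimal sketch of that overlap, assuming a queue stands in for the HTTP layer. `scan_filenames` and `load_index` are hypothetical placeholders for the real directory walk and index deserialization. Both jobs start before the request is read, so whichever finishes last determines the wait. In CPython, blocking filesystem calls release the GIL, so threads are adequate for this I/O-bound work.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def scan_filenames(paths):
    """Placeholder for the real walk over 1,024 directories: name -> path."""
    return {p.rsplit("/", 1)[-1]: p for p in paths}


def load_index(pairs):
    """Placeholder for deserializing the offline hash-to-class index."""
    return dict(pairs)


def serve_once(requests: "queue.Queue[list[str]]", paths, index_pairs):
    with ThreadPoolExecutor(max_workers=2) as pool:
        # Kick off both background jobs before touching the request.
        names_future = pool.submit(scan_filenames, paths)
        index_future = pool.submit(load_index, index_pairs)
        wanted = requests.get()  # blocks until the POST body arrives
        names = names_future.result()
        index = index_future.result()
    found = {n: names[n] for n in wanted if n in names}
    return found, index
```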
3. Ordered filtering
Multi-threaded benchmarks over the 1 M images showed that listing filenames (120 ms) is far faster than reading file sizes (1.8 s), let alone file contents. Cheap API calls therefore run first to discard unrelated images, and only the survivors have additional bytes read for hash calculation.
Time saved: >1500 ms
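The cascade can be sketched as follows, assuming the client sends a filename, size, and fingerprint for each image. `cheap_hash` (BLAKE2b over the first and last 100 bytes) is a hypothetical stand-in for the real XXH64-based fingerprint. Each stage runs only on files that survived the previous, cheaper stage.

```python
import hashlib
import os


def cheap_hash(path: str, size: int) -> str:
    # Stand-in fingerprint over two 100-byte slices (XXH64 in the real system).
    with open(path, "rb") as f:
        head = f.read(100)
        f.seek(max(size - 100, 0))
        tail = f.read(100)
    return hashlib.blake2b(head + tail, digest_size=8).hexdigest()


def find_matches(root: str, wanted: dict[str, tuple[int, str]]) -> dict[str, str]:
    """wanted maps filename -> (size, hash); cheapest checks run first."""
    matches = {}
    for d in os.scandir(root):
        if not d.is_dir():
            continue
        for e in os.scandir(d.path):
            spec = wanted.get(e.name)  # 1) name lookup only: no stat, no read
            if spec is None:
                continue
            size = e.stat().st_size  # 2) size check for surviving names
            if size != spec[0]:
                continue
            if cheap_hash(e.path, size) == spec[1]:  # 3) read bytes last
                matches[e.name] = e.path
    return matches
```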
4. Segment search with early termination
After the filename array is shuffled, ten threads each scan one segment of it. Once all 180 target filenames have been found, the remaining threads stop early.
Time saved: 0 ~ 500 ms
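A sketch of segmented scanning with a shared stop flag. The post does not give the reason for shuffling. The comment below assumes it is meant to spread target files evenly across segments, so that no single thread ends up doing most of the work. Paths in the form `dir/name` are also an assumption.

```python
import random
import threading


def segmented_search(paths, targets, n_threads=10, seed=None):
    """Scan a shuffled path list in n_threads segments and stop every thread
    as soon as all target filenames have been located."""
    wanted = frozenset(targets)
    if not wanted:
        return {}
    paths = list(paths)
    random.Random(seed).shuffle(paths)  # assumption: spreads targets across segments
    found = {}
    lock = threading.Lock()
    done = threading.Event()
    step = max(1, -(-len(paths) // n_threads))  # ceiling division

    def worker(segment):
        for path in segment:
            if done.is_set():  # another thread found the last target
                return
            name = path.rsplit("/", 1)[-1]
            if name in wanted:
                with lock:
                    found.setdefault(name, path)
                    if len(found) == len(wanted):
                        done.set()

    threads = [threading.Thread(target=worker, args=(paths[i:i + step],))
               for i in range(0, len(paths), step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return found
```

Checking the flag on every item keeps the sketch simple. A native implementation would more likely check it every few hundred entries to keep the shared read off the hot path.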
5. Exploiting Places365 dataset features
The dataset has 365 categories with up to 5,000 files per category. Certain byte patterns can filter out 99.5 % of files, which enables efficient trie (dictionary-tree) searches.
Time saved: >1000 ms
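The Q&A below mentions an 8-byte filter at offset 623. The sketch shows how such a filter could gate the expensive work: server files whose 8 bytes at that offset match no client image skip the 100-byte read and the hash. What those bytes mean inside a Places365 JPEG is not stated in the post, and a plain hash set stands in here for the trie the team used.

```python
SIG_OFFSET = 623  # offset cited in the Q&A
SIG_LEN = 8


def signature(data: bytes):
    """8 bytes at the fixed offset, or None if the file is too short."""
    chunk = data[SIG_OFFSET:SIG_OFFSET + SIG_LEN]
    return chunk if len(chunk) == SIG_LEN else None


def build_filter(client_images):
    # Set of signatures from the 180 client images (stand-in for a trie).
    return {s for s in map(signature, client_images) if s is not None}


def passes(filter_set, server_head: bytes) -> bool:
    # Only files whose signature matches some client image go on to the
    # 100-byte read and hash; None (too short) never matches.
    return signature(server_head) in filter_set
```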
6. Hash algorithm ladder
Performance, fairness, and anonymity were all considered. xxHash (especially XXH3) is up to 50× faster than MD5, but only XXH64 is available in JavaScript.
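For reference, here is a pure-Python port of the published XXH64 algorithm. It is useful for checking that the JavaScript client and the server produce identical fingerprints for the same bytes, but it is far too slow for production, where the xxHash C library would be used. This is an illustrative port, not the team's code.

```python
import struct

P1 = 0x9E3779B185EBCA87
P2 = 0xC2B2AE3D27D4EB4F
P3 = 0x165667B19E3779F9
P4 = 0x85EBCA77C2B2AE63
P5 = 0x27D4EB2F165667C5
MASK = (1 << 64) - 1


def _rotl(x, r):
    return ((x << r) | (x >> (64 - r))) & MASK


def _round(acc, lane):
    acc = (acc + lane * P2) & MASK
    return (_rotl(acc, 31) * P1) & MASK


def _merge(acc, val):
    acc ^= _round(0, val)
    return (acc * P1 + P4) & MASK


def xxh64(data: bytes, seed: int = 0) -> int:
    n, i = len(data), 0
    if n >= 32:
        # Four parallel accumulators over 32-byte stripes.
        v1, v2 = (seed + P1 + P2) & MASK, (seed + P2) & MASK
        v3, v4 = seed & MASK, (seed - P1) & MASK
        while i + 32 <= n:
            a, b, c, d = struct.unpack_from("<4Q", data, i)
            v1, v2, v3, v4 = _round(v1, a), _round(v2, b), _round(v3, c), _round(v4, d)
            i += 32
        h = (_rotl(v1, 1) + _rotl(v2, 7) + _rotl(v3, 12) + _rotl(v4, 18)) & MASK
        for v in (v1, v2, v3, v4):
            h = _merge(h, v)
    else:
        h = (seed + P5) & MASK
    h = (h + n) & MASK
    # Tail: 8-byte lanes, then one 4-byte lane, then single bytes.
    while i + 8 <= n:
        (lane,) = struct.unpack_from("<Q", data, i)
        h ^= _round(0, lane)
        h = (_rotl(h, 27) * P1 + P4) & MASK
        i += 8
    if i + 4 <= n:
        (lane,) = struct.unpack_from("<I", data, i)
        h ^= (lane * P1) & MASK
        h = (_rotl(h, 23) * P2 + P3) & MASK
        i += 4
    while i < n:
        h ^= (data[i] * P5) & MASK
        h = (_rotl(h, 11) * P1) & MASK
        i += 1
    # Final avalanche.
    h ^= h >> 33
    h = (h * P2) & MASK
    h ^= h >> 29
    h = (h * P3) & MASK
    h ^= h >> 32
    return h
```

`xxh64(b"")` returns `0xEF46DB3751D8E999`, the standard XXH64 value for empty input with seed 0.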
7. Selecting hash bytes using file size
Two 100-byte segments, chosen based on the file size, are hashed. This keeps fingerprints unique across 1.8 M images while remaining fast. (Illustrated in the following diagram.)
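The post does not give the exact offset rule, so the sketch below uses a hypothetical one: segments starting at one third and two thirds of the file, clamped so each fits inside the file. It also mixes the size into the digest. BLAKE2b truncated to 8 bytes stands in for XXH64 so the example needs only the standard library.

```python
import hashlib

SEG = 100  # segment length from the article


def segment_offsets(size: int) -> tuple[int, int]:
    """Hypothetical rule: start segments at 1/3 and 2/3 of the file,
    clamped so a full segment fits when the file is large enough."""
    last = max(size - SEG, 0)
    return min(size // 3, last), min(2 * size // 3, last)


def fingerprint(data: bytes) -> int:
    size = len(data)
    a, b = segment_offsets(size)
    h = hashlib.blake2b(digest_size=8)  # stand-in for XXH64
    h.update(size.to_bytes(8, "little"))  # assumption: size is mixed in too
    h.update(data[a:a + SEG])
    h.update(data[b:b + SEG])
    return int.from_bytes(h.digest(), "little")
```

For a typical ~20 KB file, `segment_offsets(20000)` gives `(6666, 13333)`, so only 200 of roughly 20,000 bytes are hashed.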
8. Using mmap + SSE2
Memory-mapped file access, combined with SSE2 SIMD instructions for copying the selected bytes, speeds up memcpy and reduces cache pollution.
Time saved: ~100 ms
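The mmap half of this idea can be sketched in Python: map the file read-only and copy out only the requested slices, so that only the pages those slices touch need to be faulted in (subject to OS read-ahead). Python has no direct access to SSE2. In a C/C++ backend, the 100-byte copy could instead use intrinsics such as `_mm_loadu_si128`/`_mm_storeu_si128` from `<emmintrin.h>`. The function name `read_segments` is hypothetical.

```python
import mmap
import os

SEG = 100


def read_segments(path: str, offsets: tuple[int, ...], length: int = SEG) -> bytes:
    """Map the file read-only and copy out only the requested slices."""
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return b""  # mmap cannot map an empty file
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # Slicing an mmap copies just those bytes into a new bytes object.
            return b"".join(m[o:o + length] for o in offsets)
```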
9. Offline hash‑to‑class index storage
Serializing the uint64→int16 hash-to-class index with a sparsemap structure reduces storage from 33 MB (sqlite3) to 18 MB and speeds up loading (≈200 to 300 ms).
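The sketch below does not reproduce sparsemap's own on-disk format. It swaps in a flat sorted-array layout to show why a compact uint64→int16 index loads quickly: a count, then all keys, then all values, with lookups done by binary search. It assumes little-endian serialization and byte-swaps on big-endian hosts.

```python
import bisect
import struct
import sys
from array import array


def serialize_index(mapping: dict[int, int]) -> bytes:
    """uint64 -> int16 index: 8-byte count, sorted keys, matching values."""
    keys = sorted(mapping)
    out = struct.pack("<Q", len(keys))
    out += struct.pack(f"<{len(keys)}Q", *keys)
    out += struct.pack(f"<{len(keys)}h", *(mapping[k] for k in keys))
    return out


class HashIndex:
    def __init__(self, blob: bytes):
        (n,) = struct.unpack_from("<Q", blob, 0)
        self.keys = array("Q")
        self.keys.frombytes(blob[8:8 + 8 * n])
        self.values = array("h")
        self.values.frombytes(blob[8 + 8 * n:8 + 10 * n])
        if sys.byteorder == "big":  # data was written little-endian
            self.keys.byteswap()
            self.values.byteswap()

    def get(self, key: int, default=None):
        i = bisect.bisect_left(self.keys, key)  # binary search over sorted keys
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return default
```

At 10 bytes per entry, about 1.8 M entries would take roughly 18 MB, which is in the same range as the sparsemap file reported above.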
Q&A
Q1: Does the 8‑byte filter at offset 623 really help?
A1: Yes. For more than 95 % of files it skips one 100-byte read and one hash calculation, which gives a noticeable speedup. If you see no improvement, the OS page cache was probably not cleared between runs.
Q2: How is the hash‑to‑class mapping stored?
A2: As a SparseMap of uint64→int16 serialized to disk (≈18 MB). Embedding the table directly in the program binary could improve performance further.
Q3: Why choose XXH64 over the faster XXH3?
A3: XXH64 has a JavaScript implementation, while XXH3 lacks a pure JS version.
Q4: Why use SSE2 instead of AVX‑512?
A4: The required 100‑byte copy is well‑served by SSE2, which also reduces cache pollution.
Conclusion
The competition shows that even a seemingly simple system offers many optimization opportunities. Pursuing robustness, correctness, and extreme performance embodies the hackathon spirit and should guide our future work.
Xingsheng Youxuan Technology Community
Xingsheng Youxuan Technology Official Account