How We Processed 1 Million Images in Sub-Second: Backend Optimization Secrets

Faced with managing roughly one million server-side images and 180 client images, the TOOSIMPLE team built a high-performance backend using fingerprinting, parallel processing, mmap with SSE2 acceleration, and sparsemap indexing. The result was sub-second response times with the images still displayed in the correct order.

Xingsheng Youxuan Technology Community

Problem Statement

About 1,000,000 images from the Places365 dataset are stored on the server (1024 directories of 1024 images each, ~20 KB per image), and the client holds 180 images whose categories are unknown. The task is to build a server program that manages the images using fingerprints, plus a Chrome client that sends information about the 180 images, receives the corresponding data, and displays the images in 20 grids ordered by file number.

Evaluation Criteria

Performance: Timing starts when the server program launches (timestamp passed via start.sh) and ends when the browser finishes loading and correctly ordering all 180 images. The browser must show start timestamp, end timestamp, and elapsed time.

Correctness: The returned images must be correct and displayed in the proper order; incorrect results are excluded from performance ranking.

Team Introduction

Team TOOSIMPLE – members: Liu (frontend), Huang (frontend), Chen (backend), Li (backend).

Optimization Points

1. Frontend pre‑computes data

Hash, size, and byte slice calculations for the 180 images are performed before sending the POST request, allowing the server to receive data immediately after start.

Time saved: 100 ms ~ 1000 ms

2. Backend receives POST and scans files in parallel

The backend processes the incoming POST data, scans filenames, and loads index files concurrently, so the network transfer overlaps with the disk work instead of adding to it.

Time saved: 70 ms
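The overlap described above can be sketched roughly as follows. This is a minimal illustration, not the team's code: `runConcurrently` and the `startup` type are hypothetical names, and the three closures are placeholders for reading the POST body, listing filenames, and loading the index.

```go
package main

import (
	"fmt"
	"sync"
)

// runConcurrently starts every task in its own goroutine and returns
// once all of them have finished.
func runConcurrently(tasks ...func()) {
	var wg sync.WaitGroup
	for _, task := range tasks {
		wg.Add(1)
		go func(t func()) {
			defer wg.Done()
			t()
		}(task)
	}
	wg.Wait()
}

// startup is a hypothetical stand-in for the server's three startup jobs.
type startup struct {
	postReceived, namesScanned, indexLoaded bool
}

// run executes the three jobs side by side; each closure touches only
// its own field, so no extra locking is needed.
func (s *startup) run() {
	runConcurrently(
		func() { s.postReceived = true }, // placeholder: read the POST body
		func() { s.namesScanned = true }, // placeholder: list filenames
		func() { s.indexLoaded = true },  // placeholder: load the index file
	)
}

func main() {
	var s startup
	s.run()
	fmt.Println(s.postReceived, s.namesScanned, s.indexLoaded)
}
```

With this shape the total startup time approaches that of the slowest job rather than the sum of all three.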

3. Ordered filtering

Multi-threaded scans of the 1 M images show that reading filenames (120 ms) is much faster than reading file sizes (1.8 s) or file pages. The cheapest calls therefore run first to filter out unrelated images, and only the survivors have additional bytes read for hash calculation.

Time saved: >1500 ms
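The cheapest-first idea can be sketched as a chain of checks that stops at the first failure. The `check` type, `filterInOrder`, and the two example stages are assumptions made for illustration; the real stages would be a filename test, a size test, and a byte read.

```go
package main

import "fmt"

// check is one filtering stage; it reports whether a candidate survives.
type check func(name string) bool

// filterInOrder applies checks in the given order and stops at the first
// failure, so later (more expensive) checks only see survivors.
// calls[i] counts how often check i actually ran.
func filterInOrder(names []string, checks []check) (kept []string, calls []int) {
	calls = make([]int, len(checks))
	for _, name := range names {
		ok := true
		for i, c := range checks {
			calls[i]++
			if !c(name) {
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, name)
		}
	}
	return kept, calls
}

func main() {
	names := []string{"a.jpg", "b.jpg", "c.png", "d.jpg"}
	// Hypothetical stages: a cheap name check, then a pretend "expensive" one.
	cheap := func(n string) bool { return len(n) > 4 && n[len(n)-4:] == ".jpg" }
	costly := func(n string) bool { return n != "b.jpg" }
	kept, calls := filterInOrder(names, []check{cheap, costly})
	fmt.Println(kept, calls)
}
```

The call counters make the saving visible: the costly stage runs only for candidates that passed the cheap one.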

4. Segment search with early termination

After shuffling the filename array, ten threads scan segments; once all 180 filenames are found, remaining threads stop early.

Time saved: 0 ~ 500 ms
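A rough sketch of segment scanning with early termination is shown below. The function names, the shared atomic counter, and the generated filenames are assumptions for this example; it expects `workers` to be at least 1. Shuffling beforehand presumably spreads the wanted names evenly across segments.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"sync/atomic"
)

// findAll splits names into contiguous segments, one per worker, and scans
// them in parallel. Every worker stops as soon as the shared counter shows
// that all wanted names were located. It returns wanted name -> index.
func findAll(names []string, wanted map[string]bool, workers int) map[string]int {
	var (
		mu    sync.Mutex
		count int64
		wg    sync.WaitGroup
	)
	found := make(map[string]int, len(wanted))
	target := int64(len(wanted))
	seg := (len(names) + workers - 1) / workers
	for w := 0; w < workers; w++ {
		lo, hi := w*seg, (w+1)*seg
		if lo > len(names) {
			lo = len(names)
		}
		if hi > len(names) {
			hi = len(names)
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				if atomic.LoadInt64(&count) >= target {
					return // everything was found: stop early
				}
				if !wanted[names[i]] {
					continue
				}
				mu.Lock()
				_, dup := found[names[i]]
				if !dup {
					found[names[i]] = i
				}
				mu.Unlock()
				if !dup {
					atomic.AddInt64(&count, 1)
				}
			}
		}(lo, hi)
	}
	wg.Wait()
	return found
}

// makeNames generates hypothetical filenames for the demo.
func makeNames(n int) []string {
	names := make([]string, n)
	for i := range names {
		names[i] = fmt.Sprintf("%08d.jpg", i)
	}
	return names
}

func main() {
	names := makeNames(100000)
	rand.Shuffle(len(names), func(i, j int) { names[i], names[j] = names[j], names[i] })
	wanted := map[string]bool{"00000007.jpg": true, "00054321.jpg": true}
	fmt.Println(len(findAll(names, wanted, 10)))
}
```

Because a worker only exits early once the counter has reached the target, no wanted name can be skipped.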

5. Exploiting Places365 dataset features

The dataset provides 365 categories, up to 5000 files per category, and specific byte patterns that can filter 99.5 % of files, enabling efficient dictionary‑tree searches.

Time saved: >1000 ms

6. Hash algorithm ladder

The hash was chosen with performance, fairness, and anonymity in mind. xxHash (especially XXH3) runs up to 50× faster than MD5, but only XXH64 has a JavaScript implementation, so XXH64 is the variant used.

7. Selecting hash bytes using file size

Two 100-byte segments, selected based on file size, are hashed. This is enough to keep fingerprints unique across 1.8 M images while remaining fast.
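A minimal sketch of size-driven segment hashing follows. The article does not say how the offsets are derived, so the 1/3 and 2/3 positions here are an assumption, and FNV-1a from the standard library stands in for XXH64 to keep the example dependency-free.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const segLen = 100 // bytes hashed per segment

// segmentOffsets picks two windows from the file size alone.
// The 1/3 and 2/3 positions are an assumption made for this sketch.
func segmentOffsets(size int) (int, int) {
	if size <= segLen {
		return 0, 0
	}
	clamp := func(off int) int {
		if off > size-segLen {
			return size - segLen
		}
		return off
	}
	return clamp(size / 3), clamp(2 * size / 3)
}

// fingerprint hashes the two windows only, never the whole file.
// FNV-1a is a stand-in here; the article's team used XXH64.
func fingerprint(data []byte) uint64 {
	a, b := segmentOffsets(len(data))
	end := func(off int) int {
		if off+segLen > len(data) {
			return len(data)
		}
		return off + segLen
	}
	h := fnv.New64a()
	h.Write(data[a:end(a)])
	h.Write(data[b:end(b)])
	return h.Sum64()
}

func main() {
	img := make([]byte, 20000) // roughly the ~20 KB image size from the article
	for i := range img {
		img[i] = byte(i * 31)
	}
	a, b := segmentOffsets(len(img))
	fmt.Printf("offsets %d and %d, fingerprint %016x\n", a, b, fingerprint(img))
}
```

Only 200 bytes per file feed the hash, so the cost is nearly independent of image size; the trade-off is that changes outside the two windows go unnoticed.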

8. Using mmap + SSE2

Memory‑mapped file access combined with SSE2 SIMD instructions accelerates memcpy and reduces cache pollution.

Time saved: ~100 ms
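The mmap half of this technique might look like the sketch below. It assumes a Unix-like system (`syscall.Mmap` is not available on Windows), and `readSlice` and `writeTemp` are hypothetical helper names. Hand-written SSE2 copying is not shown; the built-in `copy` is used instead.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// readSlice maps the file read-only and copies n bytes starting at off.
func readSlice(path string, off, n int) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	info, err := f.Stat()
	if err != nil {
		return nil, err
	}
	size := int(info.Size())
	if off < 0 || n < 0 || off+n > size {
		return nil, fmt.Errorf("range [%d,%d) outside file of %d bytes", off, off+n, size)
	}
	if n == 0 {
		return []byte{}, nil // nothing to copy; also avoids mapping an empty file
	}
	// Map the whole file; only the pages actually touched are read from disk.
	data, err := syscall.Mmap(int(f.Fd()), 0, size, syscall.PROT_READ, syscall.MAP_PRIVATE)
	if err != nil {
		return nil, err
	}
	defer syscall.Munmap(data)
	out := make([]byte, n)
	copy(out, data[off:off+n])
	return out, nil
}

// writeTemp stores data in a temporary file for the demo and returns
// its path together with a cleanup function.
func writeTemp(data []byte) (string, func(), error) {
	f, err := os.CreateTemp("", "mmap-demo-*")
	if err != nil {
		return "", nil, err
	}
	defer f.Close()
	cleanup := func() { os.Remove(f.Name()) }
	if _, err := f.Write(data); err != nil {
		cleanup()
		return "", nil, err
	}
	return f.Name(), cleanup, nil
}

func main() {
	path, cleanup, err := writeTemp(make([]byte, 4096))
	if err != nil {
		panic(err)
	}
	defer cleanup()
	b, err := readSlice(path, 623, 100)
	fmt.Println(len(b), err)
}
```

Mapping avoids an explicit read call per segment, which matters when the same small copy is repeated across many files.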

9. Offline hash‑to‑class index storage

Using a sparsemap structure to serialize a uint64→int16 index reduces storage from 33 MB (sqlite3) to 18 MB and speeds up loading (≈200‑300 ms).
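One way to picture a compact uint64→int16 index is a sorted array of fixed-width entries searched by binary search. This is a simplified stand-in, not the sparsemap structure the team used, and all type and function names are hypothetical. At 10 bytes per entry, 1.8 M entries would occupy 18 MB.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"sort"
)

// Index stores hash -> class as two parallel slices sorted by hash.
type Index struct {
	Keys   []uint64
	Values []int16
}

// BuildIndex converts a map into the sorted layout.
func BuildIndex(m map[uint64]int16) Index {
	idx := Index{Keys: make([]uint64, 0, len(m)), Values: make([]int16, len(m))}
	for k := range m {
		idx.Keys = append(idx.Keys, k)
	}
	sort.Slice(idx.Keys, func(i, j int) bool { return idx.Keys[i] < idx.Keys[j] })
	for i, k := range idx.Keys {
		idx.Values[i] = m[k]
	}
	return idx
}

// Lookup finds the class for a hash with a binary search.
func (x Index) Lookup(h uint64) (int16, bool) {
	i := sort.Search(len(x.Keys), func(i int) bool { return x.Keys[i] >= h })
	if i < len(x.Keys) && x.Keys[i] == h {
		return x.Values[i], true
	}
	return 0, false
}

const entrySize = 10 // 8-byte key + 2-byte value

// Encode writes fixed-width little-endian entries in key order.
func (x Index) Encode() []byte {
	buf := make([]byte, len(x.Keys)*entrySize)
	for i, k := range x.Keys {
		binary.LittleEndian.PutUint64(buf[i*entrySize:], k)
		binary.LittleEndian.PutUint16(buf[i*entrySize+8:], uint16(x.Values[i]))
	}
	return buf
}

// Decode rebuilds the index from the bytes produced by Encode.
func Decode(buf []byte) (Index, error) {
	if len(buf)%entrySize != 0 {
		return Index{}, errors.New("truncated index")
	}
	n := len(buf) / entrySize
	x := Index{Keys: make([]uint64, n), Values: make([]int16, n)}
	for i := 0; i < n; i++ {
		x.Keys[i] = binary.LittleEndian.Uint64(buf[i*entrySize:])
		x.Values[i] = int16(binary.LittleEndian.Uint16(buf[i*entrySize+8:]))
	}
	return x, nil
}

func main() {
	idx := BuildIndex(map[uint64]int16{0xdeadbeef: 12, 42: 364})
	back, _ := Decode(idx.Encode())
	class, ok := back.Lookup(42)
	fmt.Println(class, ok)
}
```

A flat layout like this loads with a single sequential read and no per-entry allocation, which is the kind of property that makes an offline index quick to bring into memory.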

Q&A

Q1: Does the 8‑byte filter at offset 623 really help?

A1: It skips one 100‑byte read and one hash calculation for >95 % of files, giving noticeable speedup; lack of improvement may be due to uncleared caches.

Q2: How is the hash‑to‑class mapping stored?

A2: As a SparseMap of uint64→int16 serialized to disk (≈18 MB); embedding it in code could further improve performance.

Q3: Why choose XXH64 over the faster XXH3?

A3: XXH64 has a JavaScript implementation, while XXH3 lacks a pure JS version.

Q4: Why use SSE2 instead of AVX‑512?

A4: The required 100‑byte copy is well‑served by SSE2, which also reduces cache pollution.

Conclusion

The competition demonstrates that even a seemingly simple system offers many optimization opportunities; focusing on robustness, correctness, and extreme performance embodies the hackathon spirit and should guide future work.
