How Sonic‑CPP Parses JSON Up to 2.5× Faster Than RapidJSON
Sonic‑CPP, an open‑source C++ JSON library co‑developed by ByteDance’s STE and Service Framework teams, leverages SIMD vectorization, optimized memory layout, on‑demand parsing, and a compact DOM design to achieve up to 2.5× faster parsing than RapidJSON and competitive serialization performance, with extensive benchmark results and production‑grade usage.
Introduction
Sonic‑CPP is a high‑performance JSON library for C++ jointly developed by ByteDance’s STE team and the Service Framework team. It exploits modern CPU features and SIMD vectorization to serialize and deserialize JSON at up to 2.5 times the speed of RapidJSON, saving hundreds of thousands of CPU cores across core ByteDance services such as Douyin and Toutiao.
Why Build a Custom JSON Library?
ByteDance’s massive services consume a large share of CPU resources for JSON processing, sometimes exceeding 40% of a server’s CPU usage. Existing libraries like RapidJSON, while improved, still lag behind newer solutions such as yyjson and simdjson in raw parsing speed, and each of those alternatives has drawbacks (e.g., simdjson cannot modify parsed data, yyjson’s linked‑list structure hurts lookup performance). Sonic‑CPP was created to combine the strengths of these libraries while eliminating their weaknesses.
Key Features
Parsing speed up to 2.5× that of RapidJSON.
Supports efficient CRUD operations, addressing the shortcomings of yyjson and simdjson.
Provides a near‑complete JSON API for easy migration.
Proven at large scale in ByteDance’s advertising, search, and recommendation middle‑platform services.
Optimization Principles
Sonic‑CPP integrates the best ideas from RapidJSON, yyjson, and simdjson, then adds further optimizations based on SIMD instructions, memory layout tuning, and on‑demand parsing.
SIMD Vectorization
By leveraging SIMD (e.g., SSE, AVX2, NEON) the library processes multiple characters or numbers in parallel. For serialization, five SIMD instructions handle 32 characters at once, dramatically reducing the cost of escaping strings. For deserialization, SIMD is used to locate decimal points and end‑of‑number markers, then convert character vectors to numeric values with vector subtraction and fused multiply‑add operations.
Serialization Optimization
The library reads 32‑byte blocks with a single SIMD load, creates a mask for quote characters, reduces the mask to a general‑purpose register, and counts trailing zeros to locate escape positions. When AVX‑512 load‑mask instructions are unavailable, Sonic‑CPP safely handles potential page‑crossing reads by checking page boundaries and falling back to conservative handling.
Deserialization Optimization
Parsing numbers uses SIMD to locate decimal points and end markers, then subtracts the character '0' from each byte to obtain numeric values, followed by vectorized multiplication‑addition to assemble the final floating‑point number. Benchmarks show significant speed gains, especially for longer numeric strings.
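To make the subtract‑then‑multiply‑add idea concrete, here is a small sketch that converts eight ASCII digits into an integer. It is a scalar, lane‑by‑lane emulation of what the vector instructions would do in parallel, not the library's implementation, and the function name `parse_eight_digits` is hypothetical. Each loop corresponds to one vector step: subtract '0' from every byte, then combine adjacent lanes with a multiply‑add, halving the lane count each time.

```cpp
#include <array>
#include <cstdint>

// Converts exactly eight ASCII digits at p into their integer value.
// Precondition: p[0..7] are all in '0'..'9'.
inline std::uint32_t parse_eight_digits(const char* p) {
  // Step 1: "vector subtraction" of '0' from every byte.
  std::array<std::uint32_t, 8> digit{};
  for (int i = 0; i < 8; ++i) {
    digit[i] = static_cast<std::uint32_t>(static_cast<unsigned char>(p[i]) - '0');
  }
  // Step 2: multiply-add adjacent lanes, 8 one-digit lanes -> 4 two-digit lanes.
  std::array<std::uint32_t, 4> pair{};
  for (int i = 0; i < 4; ++i) {
    pair[i] = digit[2 * i] * 10u + digit[2 * i + 1];
  }
  // Step 3: 4 two-digit lanes -> 2 four-digit lanes.
  std::array<std::uint32_t, 2> quad{};
  for (int i = 0; i < 2; ++i) {
    quad[i] = pair[2 * i] * 100u + pair[2 * i + 1];
  }
  // Step 4: final multiply-add yields the eight-digit value.
  return quad[0] * 10000u + quad[1];
}
```

For a floating‑point input, a parser would run a kernel like this over the digit run and then scale the result by a power of ten determined by the position of the decimal point.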
On‑Demand Parsing
Sonic‑CPP provides a high‑performance on‑demand parsing API that extracts only the fields specified by a JsonPointer. It uses SIMD to scan 64‑byte windows, builds a bitmap of opening and closing braces, and determines object boundaries without fully parsing the entire JSON, outperforming traditional recursive‑descent and two‑stage approaches.
DOM Design Optimizations
Each JSON value is represented by a 16‑byte node, with type and size packed into a single 8‑byte field, reducing memory overhead. Objects store a map of keys to indices using string_view keys to avoid copying. The map is created lazily (on‑demand) and allocated from a memory pool derived from RapidJSON, eliminating frequent malloc/free calls.
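A 16‑byte node with type and size sharing one 8‑byte field might look like the sketch below. The exact bit layout is an assumption made for illustration (low 8 bits for the type, upper 56 bits for the size), and the names `Node`, `NodeType`, and `make_node` are hypothetical rather than Sonic‑CPP's own.

```cpp
#include <cstdint>

enum class NodeType : std::uint8_t { Null, Bool, Number, String, Array, Object };

struct Node {
  // Assumed packing: bits 0..7 hold the type, bits 8..63 hold the size
  // (string length, or element/member count for containers).
  std::uint64_t header;
  // Second 8-byte field: exactly one of these is meaningful, per the type.
  union {
    const char* str;     // String: pointer to the characters
    Node* children;      // Array/Object: pointer to the child nodes
    double num;          // Number stored as floating point
    std::int64_t i64;    // Number stored as integer, or Bool as 0/1
  } payload;

  NodeType type() const { return static_cast<NodeType>(header & 0xFFu); }
  std::uint64_t size() const { return header >> 8; }
};

static_assert(sizeof(Node) == 16, "node is expected to occupy 16 bytes");

// Builds a node with the given type and size and a zeroed payload.
inline Node make_node(NodeType type, std::uint64_t size) {
  Node n;
  n.header = (size << 8) | static_cast<std::uint64_t>(type);
  n.payload.i64 = 0;
  return n;
}
```

Packing the type next to the size keeps every node at two machine words, so an array of children is a dense, cache‑friendly block that can be indexed directly.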
Performance Evaluation
Benchmarks based on the nativejson‑benchmark suite show that Sonic‑CPP matches or exceeds the performance of simdjson and yyjson across parsing, serialization, and various real‑world scenarios. In production, a high‑traffic Douyin service observed a noticeable reduction in CPU usage after switching to Sonic‑CPP.
Future Outlook
Currently, Sonic‑CPP supports only the amd64 architecture; ARM support is planned. Future work includes support for additional JSON‑related RFCs (e.g., JSON Merge Patch, RFC 7386), JSON Path support, and further performance refinements.
ByteDance SYS Tech
Focused on system technology, sharing cutting‑edge developments, innovation and practice, and analysis of industry tech hotspots.