
WeChat Pay Log System at Scale: Practices with Hermes

WeChat Pay's Hermes-based log system ingests trillions of entries per day and stores petabytes of data on a cluster of more than 200 nodes backed by HDFS, with four-nines availability. LSM-style writes, an inverted index kept separate from row data, and hot-cold tiering cut memory use by 70%, disk use by 14%, and storage cost by more than 70%, while 95% of queries finish in under five seconds.

Tencent Cloud Developer

WeChat Pay's log system uses the Hermes platform for full-text search. Since its adoption, log volume has grown dramatically: daily ingest now reaches trillions of entries, and storage is measured in petabytes.

The system runs on a Hermes cluster of more than 200 nodes and handles around 6,000 concurrent query requests per day. It meets a four-nines (99.99%) availability SLA, and 95% of queries complete in under five seconds.
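To make the two targets above concrete, here is a minimal arithmetic sketch. It is not part of Hermes: the function names are illustrative, and the nearest-rank percentile is one common definition, chosen here as an assumption.

```python
import math


def downtime_budget_minutes(availability: float, period_days: float = 365.0) -> float:
    """Minutes of downtime allowed by an availability target over a period."""
    return (1.0 - availability) * period_days * 24 * 60


def p95_seconds(latencies: list[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    if not latencies:
        raise ValueError("latencies must not be empty")
    ordered = sorted(latencies)
    rank = math.ceil(95 * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(latencies: list[float], limit_seconds: float = 5.0) -> bool:
    """True when 95% of the sampled queries finish within the limit."""
    return p95_seconds(latencies) <= limit_seconds
```

Under this definition, a four-nines target leaves a budget of about 52.56 minutes of downtime per year.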

Hermes separates storage and computation by using HDFS as the underlying storage layer. HDFS supplies multi‑replica disaster recovery, automatic disk‑fault tolerance, and hot‑cold tiering. Although EC (Erasure Coding) support exists in newer HDFS versions, it has not yet been deployed in production.

This storage-compute separation simplifies the design of the computation layer. Because HDFS handles replication, Hermes builds each index once, on a single replica, rather than once per copy, which sharply reduces CPU and memory consumption and raises write QPS.

Hermes writes data in an LSM‑like fashion: records are first written to memory and a write‑ahead log (WAL), then flushed in batches to HDFS. Small index files generated by continuous writes are merged asynchronously, while larger merges are scheduled during low‑traffic windows (typically 02:00‑06:00) to maintain high query efficiency.
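The write path described above can be sketched roughly as follows. This is a toy in-memory model, not Hermes code: the class name, the `flush_threshold` parameter, and the use of Python lists as stand-ins for the WAL file and for HDFS segment files are all assumptions made for illustration.

```python
import json


class LsmLikeWriter:
    """Toy LSM-style write path: WAL plus memory buffer, batch flush, merge."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.wal: list[str] = []              # stand-in for an on-disk write-ahead log
        self.memtable: list[dict] = []        # records not yet flushed
        self.segments: list[list[dict]] = []  # stand-in for files flushed to HDFS

    def write(self, record: dict) -> None:
        # Log first, so the record can be replayed if the process dies before a flush.
        self.wal.append(json.dumps(record))
        self.memtable.append(record)
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Write the buffered batch out as one sorted segment."""
        if not self.memtable:
            return
        self.segments.append(sorted(self.memtable, key=lambda r: r["ts"]))
        self.memtable = []
        self.wal.clear()  # flushed records no longer need WAL protection

    def merge_small_segments(self, max_size: int) -> None:
        """Combine all segments with fewer than max_size records into one."""
        small = [s for s in self.segments if len(s) < max_size]
        if len(small) < 2:
            return
        large = [s for s in self.segments if len(s) >= max_size]
        merged = sorted((r for s in small for r in s), key=lambda r: r["ts"])
        self.segments = large + [merged]

    def recover(self) -> None:
        """Rebuild the memory buffer from the WAL after a simulated crash."""
        self.memtable = [json.loads(line) for line in self.wal]
```

In a real deployment the merge step would run in the background, with the larger merges deferred to the low-traffic window; here it is a plain method call so the behavior is easy to follow.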

To improve performance further, Hermes separates index data from the log rows themselves. The inverted index stores only term postings, while the row store holds the full log lines; a query resolves postings to rows through RowId and offset pointers. This separation reduces the number of index files by 68%, memory usage by 70%, and disk usage by 14%, and improves query speed by about 80%.
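A minimal sketch of this layout, assuming whitespace tokenization and an in-memory byte buffer as the row store. The class and method names are hypothetical, and the actual Hermes file formats are not described in the article.

```python
from collections import defaultdict


class SeparatedLogStore:
    """Inverted index holds only RowIds; full lines live in a separate row store."""

    def __init__(self):
        self.rows = bytearray()                    # row store: raw log lines, concatenated
        self.offsets: list[tuple[int, int]] = []   # RowId -> (offset, length) in the row store
        self.postings: dict[str, list[int]] = defaultdict(list)  # term -> RowIds

    def add(self, line: str) -> int:
        """Append a log line and index its terms; returns the new RowId."""
        data = line.encode("utf-8")
        row_id = len(self.offsets)
        self.offsets.append((len(self.rows), len(data)))
        self.rows.extend(data)
        # Index each distinct term once per row; the index never stores the line itself.
        for term in set(line.lower().split()):
            self.postings[term].append(row_id)
        return row_id

    def search(self, term: str) -> list[str]:
        """Look up RowIds in the index, then fetch each row by offset."""
        results = []
        for row_id in self.postings.get(term.lower(), []):
            offset, length = self.offsets[row_id]
            results.append(self.rows[offset:offset + length].decode("utf-8"))
        return results
```

Because postings carry only integers, the index stays small regardless of how long the log lines are, and the row store is touched only for rows that actually match.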

Because 90% of the logs belong to small, long-tail modules, the system adopts a hot-cold storage tiering strategy. Frequently accessed hot data is placed on high-performance SSDs, while colder data resides on standard HDFS storage. Replica placement policies allow storage types to be assigned flexibly without affecting the application layer.
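One way to express such a tiering decision is through HDFS storage policies. The sketch below only builds the command line and does not run it. The access threshold, the example path, and the mapping of hot data to `ALL_SSD` and colder data to the default disk-backed `HOT` policy are assumptions; the article does not say which policies Hermes uses.

```python
def choose_storage_policy(queries_last_day: int, hot_threshold: int = 100) -> str:
    """Pick an HDFS storage policy name from recent access frequency.

    ALL_SSD keeps every replica on SSD. HOT is the HDFS default policy and,
    despite its name, keeps every replica on ordinary disks.
    """
    return "ALL_SSD" if queries_last_day >= hot_threshold else "HOT"


def storage_policy_command(path: str, policy: str) -> list[str]:
    """Build the `hdfs storagepolicies` invocation that applies a policy to a path."""
    return ["hdfs", "storagepolicies", "-setStoragePolicy",
            "-path", path, "-policy", policy]


# Hypothetical module directory, for illustration only.
example = storage_policy_command("/hermes/logs/module_a", choose_storage_policy(500))
```

Since the policy is attached to a path inside HDFS, readers and writers above the storage layer keep using the same paths, which is what keeps the change invisible to the application.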

For historical data, Hermes routinely downgrades replicas: logs are stored with two replicas by default, and the replica count is lowered once they are more than three days old. The downgrade is transparent to both the computation layer and end users, and it reduces overall storage costs by more than 70%.
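A routine downgrade job could look roughly like the sketch below, which selects day partitions older than three days and builds `hdfs dfs -setrep` commands for them. The per-day directory layout and the target of one replica are assumptions; the article states only the default of two replicas and the three-day cutoff.

```python
from datetime import date, timedelta


def partitions_to_downgrade(partition_days: list[date], today: date,
                            max_age_days: int = 3) -> list[date]:
    """Return the day partitions that are strictly older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [d for d in partition_days if d < cutoff]


def setrep_command(path: str, replicas: int) -> list[str]:
    """Build the `hdfs dfs -setrep` invocation for one path."""
    return ["hdfs", "dfs", "-setrep", str(replicas), path]


def downgrade_plan(partition_days: list[date], today: date,
                   root: str = "/hermes/logs", replicas: int = 1) -> list[list[str]]:
    """Commands for every partition due for downgrade (root and replicas are assumed)."""
    return [setrep_command(f"{root}/{d.isoformat()}", replicas)
            for d in partitions_to_downgrade(partition_days, today)]
```

A partition that is exactly three days old is left alone, matching the "older than three days" wording.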

Hermes also provides an asynchronous batch export feature. Users can submit export requests for logs matching specific keywords and time ranges; the system exports the matching logs to TDW/HDFS, where they can be retrieved via standard clients or Hermes APIs.
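The submit-then-collect pattern behind such an export can be sketched with a thread pool. Everything here is hypothetical: `ExportRequest`, `ExportService`, and the in-memory log list are illustrative stand-ins, and a real job would write its output to TDW/HDFS instead of returning a list.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportRequest:
    keyword: str
    start_ts: int  # inclusive
    end_ts: int    # exclusive


def run_export(logs: list[tuple[int, str]], req: ExportRequest) -> list[str]:
    """Select log lines inside the time range that contain the keyword."""
    return [line for ts, line in logs
            if req.start_ts <= ts < req.end_ts and req.keyword in line]


class ExportService:
    """Accepts export requests and runs them in the background."""

    def __init__(self, logs: list[tuple[int, str]], workers: int = 2):
        self._logs = logs
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def submit(self, req: ExportRequest) -> Future:
        # Returns immediately; the caller collects the result later.
        return self._pool.submit(run_export, self._logs, req)

    def shutdown(self) -> None:
        self._pool.shutdown(wait=True)
```

A caller would keep the returned `Future` and call `result()` once the export is needed, which mirrors submitting a request and fetching the exported logs afterwards.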

In conclusion, the Hermes architecture empowers WeChat Pay to handle petabyte‑scale log volumes with high availability, efficient resource utilization, and cost‑effective storage management.

Tags: big data, indexing, storage architecture, HDFS, Hermes, Log Analytics, WeChat Pay