Presto + Tencent DOP (Alluxio) Architecture and Optimization Practices for Financial OLAP
This article presents the practical implementation of Presto combined with Tencent DOP (Alluxio) in a financial OLAP scenario, detailing background and architectural evolution, the Presto‑Alluxio design, optimization techniques for caching, storage scalability, ORC handling, and performance results, followed by conclusions and future directions.
As enterprise data volumes grow, balancing low-cost storage against high-performance queries becomes a primary requirement. The article introduces the Presto + Tencent DOP (Alluxio) solution deployed in Tencent's financial analytics to reduce cost and improve efficiency.
The architecture treats Alluxio as an SSD-based cache layer in front of HDFS, deployed as a remote cluster rather than co-located with the compute nodes. This lets Presto offload I/O to the cache and rely on Alluxio's LRU eviction policy, while high query concurrency is achieved through Presto's split scheduling and SuperSQL's Calcite-based translation to Spark.
Two key challenges are addressed: keeping the cache stable when large queries trigger massive block evictions, and scaling Alluxio storage across workers with heterogeneous capacities. The solutions involve whitelist-based access control (only whitelisted tables are read through the cache), time-range constraints on the cached data, and a value-score model that selects which tables to cache.
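The source does not give the value-score formula, so the following is only a minimal sketch of how a whitelist with time ranges could be derived under a cache budget. The `TableStats` fields, the score (reads per cached byte), and the greedy selection are all assumptions made for illustration, not the model used in production.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics; field names are assumptions."""
    name: str
    daily_reads: int    # queries touching the table per day
    bytes_per_day: int  # size of one day's partitions
    days_queried: int   # recent days that queries typically span

    @property
    def footprint(self) -> int:
        # Bytes needed to cache the whole time range for this table.
        return self.bytes_per_day * self.days_queried

    @property
    def score(self) -> float:
        # Assumed value score: reads served per cached byte.
        return self.daily_reads / self.footprint


def build_whitelist(tables, cache_budget_bytes):
    """Greedily pick the highest-scoring tables that fit in the budget.

    Returns a list of (table name, days to cache) pairs.
    """
    candidates = [t for t in tables if t.footprint > 0]
    remaining = cache_budget_bytes
    whitelist = []
    for t in sorted(candidates, key=lambda t: t.score, reverse=True):
        if t.footprint <= remaining:
            whitelist.append((t.name, t.days_queried))
            remaining -= t.footprint
    return whitelist
```

Recomputing such a list on a schedule and rejecting cache reads for anything outside it is one way to keep a single large query from evicting the hot set.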
To improve storage allocation, a capacity‑aware random policy (CapacityBaseRandomPolicy) and its deterministic variant were contributed to the Alluxio community, balancing load according to worker disk size and reducing eviction rates.
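The idea behind a capacity-aware policy can be illustrated with a short sketch: each worker is chosen with probability proportional to its capacity, and the deterministic variant replaces the random draw with a hash of the block ID so that the same block always maps to the same worker. This is not the Alluxio implementation; the function names and the use of SHA-256 are assumptions, and capacities are taken to be non-negative integers.

```python
import hashlib
import random
from bisect import bisect_right
from itertools import accumulate


def _cumulative(capacities):
    """Return workers in a stable order with their running capacity totals."""
    workers = sorted(capacities)
    totals = list(accumulate(capacities[w] for w in workers))
    if not totals or totals[-1] <= 0:
        raise ValueError("no worker with positive capacity")
    return workers, totals


def pick_random(capacities, rng=random):
    """Pick a worker with probability proportional to its capacity."""
    workers, totals = _cumulative(capacities)
    point = rng.randrange(totals[-1])  # integer in [0, total capacity)
    return workers[bisect_right(totals, point)]


def pick_deterministic(capacities, block_id):
    """Same weighting, but the draw is derived from a hash of the block ID."""
    workers, totals = _cumulative(capacities)
    digest = hashlib.sha256(str(block_id).encode()).digest()
    point = int.from_bytes(digest[:8], "big") % totals[-1]
    return workers[bisect_right(totals, point)]
```

With capacities such as `{"w1": 1, "w2": 2, "w3": 3}`, the largest worker receives roughly half of the blocks, so small disks fill up (and start evicting) no sooner than large ones.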
Performance tests run during both idle and busy periods showed Alluxio-accelerated queries reducing latency by up to 68%. A 98% cache hit rate was reached by computing the optimal table time ranges daily and updating the whitelist dynamically.
Further optimizations covered ORC stripe and row‑count tuning to avoid over‑merged reads, and separating metadata (inode vs block) in the Alluxio master to move block location data to memory, boosting QPS from 25k to 65k.
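To see why over-merged reads hurt, consider a simplified model in which a reader coalesces byte ranges whose gap is below a threshold and then fetches each merged range in full. The sketch below is an illustration only, not Presto's ORC reader code; `max_merge_distance` is a name chosen for this example.

```python
def merge_ranges(ranges, max_merge_distance):
    """Coalesce (offset, length) ranges whose gap is <= max_merge_distance."""
    merged = []
    for offset, length in sorted(ranges):
        if merged:
            start, prev_len = merged[-1]
            prev_end = start + prev_len
            if offset - prev_end <= max_merge_distance:
                # Close enough: extend the previous range over the gap.
                end = max(prev_end, offset + length)
                merged[-1] = (start, end - start)
                continue
        merged.append((offset, length))
    return merged


def bytes_read(ranges):
    """Total bytes fetched if every range is read in full."""
    return sum(length for _, length in ranges)
```

For three 100-byte column chunks at offsets 0, 1000, and 5000, a merge distance of 0 yields three reads totalling 300 bytes, while a merge distance of 4000 yields a single 5100-byte read. Fewer requests are issued, but most of the bytes fetched are never used, which is the trade-off that stripe and row-count tuning tries to balance.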
The work demonstrates successful cross‑team collaboration, delivering a robust Alluxio‑based OLAP platform, and outlines future directions such as CPU‑focused improvements, Velox integration, and extending the solution to additional business scenarios.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.