
Presto + Tencent DOP (Alluxio) Architecture and Optimization Practices for Financial OLAP

This article describes how Presto was combined with Tencent DOP (Alluxio) for a financial OLAP workload. It covers the background and architectural evolution, the Presto-Alluxio design, optimizations for cache stability, storage scalability, and ORC reads, the measured performance results, and conclusions with future directions.

DataFunTalk

Sep 9, 2023


As enterprise data volumes grow, balancing low-cost storage against high-performance queries becomes a primary requirement. This article introduces the Presto + Tencent DOP (Alluxio) solution deployed in Tencent's financial analytics to reduce cost and improve efficiency.

The architecture treats Alluxio as an SSD-based cache layer in front of HDFS. Alluxio is deployed remotely rather than co-located with Presto, so Presto can offload I/O to it and rely on Alluxio's LRU eviction policy. High query concurrency comes from Presto's split scheduling and from SuperSQL's Calcite-based translation to Spark.
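To make the role of LRU eviction concrete, here is a minimal sketch of a least-recently-used block cache. It is a toy model, not Alluxio code: the class name `LRUBlockCache`, the block identifiers, and the four-block capacity are all hypothetical. It shows that a single scan over cold blocks can push every hot block out of a plain LRU cache.

```python
from collections import OrderedDict


class LRUBlockCache:
    """Toy block cache with least-recently-used eviction (not Alluxio code)."""

    def __init__(self, capacity_blocks):
        self.capacity = capacity_blocks
        self.blocks = OrderedDict()  # block_id -> None, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, block_id):
        """Return True on a cache hit, False on a miss that loads the block."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        self.blocks[block_id] = None  # stands in for a load from HDFS
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


cache = LRUBlockCache(capacity_blocks=4)

# A small hot working set: the second read of each block is a hit.
for block in ["hot-1", "hot-2", "hot-1", "hot-2"]:
    cache.read(block)
hits_before_scan = cache.hits

# One large scan over cold blocks fills the cache and evicts the hot set.
for i in range(4):
    cache.read(f"cold-{i}")
hot_still_cached = "hot-1" in cache.blocks or "hot-2" in cache.blocks
```

After the scan, neither hot block remains cached, so the next read of the hot set has to go back to the under store.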

Two key challenges are addressed: keeping the cache stable when large queries trigger massive block evictions, and scaling Alluxio's storage across workers with heterogeneous capacities. The solutions include whitelist-based access control, time-range constraints, and a value-score model that selects which tables to cache.
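The interplay of a whitelist, a time-range constraint, and a value score can be sketched as follows. The article does not give the scoring formula, so everything here is an assumption made for illustration: the score (daily reads per GB), the greedy selection under a capacity budget, and all names (`TableStats`, `value_score`, `build_whitelist`, `use_cache`) are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class TableStats:
    name: str
    daily_reads: int        # hypothetical input: reads per day
    size_gb_per_day: float  # hypothetical input: size of one daily partition


def value_score(table):
    """Assumed score: reads served per GB of cache spent."""
    return table.daily_reads / table.size_gb_per_day


def build_whitelist(tables, budget_gb, days):
    """Greedily admit the highest-scoring tables while the budget holds."""
    whitelist = {}
    remaining = budget_gb
    for table in sorted(tables, key=value_score, reverse=True):
        needed = table.size_gb_per_day * days
        if needed <= remaining:
            whitelist[table.name] = days  # cache the most recent `days` partitions
            remaining -= needed
    return whitelist


def use_cache(whitelist, table_name, partition_day, today):
    """Route a read through the cache only for whitelisted tables in range."""
    days = whitelist.get(table_name)
    if days is None:
        return False
    oldest = today - timedelta(days=days - 1)
    return oldest <= partition_day <= today


tables = [
    TableStats("trades", daily_reads=900, size_gb_per_day=10.0),
    TableStats("logs", daily_reads=50, size_gb_per_day=200.0),
    TableStats("accounts", daily_reads=300, size_gb_per_day=5.0),
]
whitelist = build_whitelist(tables, budget_gb=200.0, days=7)
today = date(2023, 9, 9)
recent_read = use_cache(whitelist, "trades", date(2023, 9, 5), today)
old_read = use_cache(whitelist, "trades", date(2023, 8, 1), today)
```

In this sketch, reads that fall outside the whitelist or the cached date range bypass the cache entirely, so a large historical scan cannot evict the hot partitions.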

To improve storage allocation, a capacity-aware random policy (CapacityBaseRandomPolicy) and a deterministic variant of it were contributed to the Alluxio community. Both balance load according to each worker's disk size, which reduces eviction rates.
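The idea behind capacity-aware placement can be sketched in a few lines. This is an illustration of the technique, not the Alluxio implementation: the function names, the worker names, and the use of SHA-256 for the deterministic variant are assumptions. Each worker is chosen with probability proportional to its capacity, and the deterministic variant maps a block ID to a fixed point on the cumulative capacity range, so the same block always lands on the same worker.

```python
import hashlib
import random


def pick_worker_random(capacities, rng=random):
    """Pick a worker with probability proportional to its capacity."""
    workers = list(capacities)
    weights = [capacities[w] for w in workers]
    return rng.choices(workers, weights=weights, k=1)[0]


def pick_worker_deterministic(capacities, block_id):
    """Map a block ID to a worker, weighted by capacity, repeatably.

    Capacities must be positive integers (for example, whole GB).
    """
    total = sum(capacities.values())
    digest = hashlib.sha256(str(block_id).encode()).digest()
    point = int.from_bytes(digest[:8], "big") % total
    cumulative = 0
    for worker in sorted(capacities):  # fixed order keeps the mapping stable
        cumulative += capacities[worker]
        if point < cumulative:
            return worker
    raise AssertionError("unreachable: point is always below the total")


# Hypothetical cluster: one worker has three times the disk of the other.
capacities = {"worker-small": 1000, "worker-large": 3000}

rng = random.Random(0)
counts = {"worker-small": 0, "worker-large": 0}
for _ in range(10000):
    counts[pick_worker_random(capacities, rng)] += 1
large_share = counts["worker-large"] / 10000

first = pick_worker_deterministic(capacities, block_id=42)
second = pick_worker_deterministic(capacities, block_id=42)
```

With these capacities the large worker should receive roughly three quarters of the blocks, so both workers fill at a similar rate instead of the small one filling first and evicting early.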

Performance tests in idle and busy periods showed that Alluxio-accelerated queries reduced latency by up to 68%. A 98% cache hit rate was achieved by computing the optimal table ranges daily and updating the whitelist dynamically.

Further optimizations covered two areas. First, ORC stripe size and row count were tuned to avoid over-merged reads. Second, metadata in the Alluxio master was separated into inode and block metadata, with block location data moved to memory, which raised QPS from 25k to 65k.

The work demonstrates successful cross‑team collaboration, delivering a robust Alluxio‑based OLAP platform, and outlines future directions such as CPU‑focused improvements, Velox integration, and extending the solution to additional business scenarios.

