Presto Implementation and Practice at YouZan: A Big Data Query Engine Journey
The article outlines Presto’s high‑performance, coordinator‑worker architecture and query flow, describes YouZan’s migration from mixed Hadoop deployment to dedicated low‑latency clusters, details challenges such as small‑file handling and regex backtracking with their fixes, and previews future enhancements like Alluxio integration, session property managers, and Ranger‑based multi‑tenant isolation.
This article introduces Presto, an open-source high-performance distributed SQL query engine developed by Facebook, and its practical implementation at YouZan.
1. Presto Overview
Presto was developed to address Hive's limitations in interactive query scenarios. While Hive is designed for batch processing with high latency, Presto is specifically optimized for interactive queries, providing minute-level or even sub-second low-latency query performance.
1.1 Presto Architecture
Presto uses a Coordinator-Worker architecture with a Coordinator responsible for query parsing, planning, and scheduling, while Workers execute the actual query tasks.
1.2 Query Execution Process
Client sends request to Coordinator
SQL is parsed via ANTLR to generate AST
AST undergoes semantic analysis using metadata
Logical execution plan is generated and optimized through rules
Logical plan is split into different Stages, and Worker nodes are scheduled to generate Tasks
Tasks generate corresponding physical execution plans
Coordinator串联Stages based on scheduling results
Workers execute the physical execution plans
Client continuously pulls query results from Coordinator, which pulls from the final aggregating Worker node
1.3 Why Presto is High-Performance
Pipeline, full in-memory computation
SQL query plan rule optimization
Dynamic code generation technology
Data scheduling localization, focusing on memory overhead efficiency, optimizing data structures, caching, and approximate query techniques
2. Presto Usage Scenarios at YouZan
Data Platform (DP) temporary queries: Exploratory data analysis with desensitization and audit features
BI reporting engine: Analytical reports for merchants
Metadata data quality validation
Data products: CRM data analysis, crowd profiling
3. Presto Evolution at YouZan
Phase 1: Mixed Deployment with Hadoop
Initially, Presto was deployed together with the Hadoop offline cluster. However, users complained about unstable performance due to disk I/O bandwidth being saturated by Hadoop offline tasks.
Phase 2: Independent Presto Cluster
YouZan created a separate Presto cluster with dedicated HDFS environment. They used a shared Hive metadata store, creating a new database that points to the Presto cluster's HDFS NameService.
Phase 3: Low-Latency Dedicated Presto Cluster
For businesses requiring very low response times (under 3 seconds, sometimes even 1 second), dedicated clusters were deployed with full resource isolation and local HDFS. Configuration tuning like setting Task Concurrency to 1 improved performance in high-concurrency scenarios.
4. Problems Encountered and Solutions
4.1 HDFS Small File Problem
Small files caused slow queries due to Presto's split limits:
node-scheduler.max-splits-per-node=100
node-scheduler.max-pending-splits-per-task=10Solution: Increased these parameters and introduced Adaptive Spark and small file merging tools in the ETL layer.
4.2 Regex Exponential Backtracking
A user's regex caused exponential backtracking, running for over 1 hour. Solution: Configured Presto to use Google RE2J and added query maximum runtime limits.
4.3 Multiple Column DISTINCT Problem
Queries with multiple count distinct columns were slow. While Spark, Hive TEZ, and Calcite optimize this using grouping sets, Presto had not implemented this optimization. YouZan submitted issues and PRs to the community to address this.
4.4 HDFS Namenode Causing Occasional Slow Queries
During NameNode Edit Log Rolling, read requests were blocked by write locks, causing 1-second delays. Potential solutions include using Uber's Observer NameNode, Alluxio, or SSD storage for NameNode disks.
5. Future Outlook
5.1 Presto + Alluxio
Alluxio provides fine-grained memory control, making IO times fast and consistent. While single-task sequential disk read can reach 150MB/s (making CPU the bottleneck), multiple parallel tasks can create disk I/O bottlenecks, where Alluxio helps stabilize performance.
5.2 Presto Session Property Managers
New Presto versions implement session property managers for different workloads, allowing optimal configurations for different business SQL types and data volumes.
5.3 Presto Multi-Tenant Isolation
YouZan plans to integrate with Apache Ranger for multi-tenant data isolation through SQL rewriting.
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.