Optimizing I/O for Data-Intensive Analytics in Cloud-Native Environments: Insights from Uber Presto
This whitepaper examines the growing industry trend of migrating data-intensive analytics workloads from on-premises to cloud-native environments. It argues that cloud storage introduces a distinct cost model, one that charges per API call as well as per byte, and therefore demands finer-grained performance optimization than on-premises systems, and it presents case-study findings from Uber's Presto deployment that expose fragmented I/O patterns and their financial impact.
Through an empirical study of Uber’s production Presto workload, the authors observed highly fragmented data‑access patterns—over 50 % of reads are smaller than 10 KB and more than 90 % are under 1 MB—highlighting that traditional I/O optimizations that ignore storage‑API call costs can lead to excessive expenses in the cloud.
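To see why per-call pricing makes fragmented reads expensive, consider a simple cost model in which each GET request carries a fixed charge. The sketch below uses an illustrative price roughly modeled on S3-style GET pricing; the exact rate and read sizes are assumptions, not figures from the study, but the 10 KB read size mirrors the fragmentation the authors observed.

```python
# Illustrative request-charge model for cloud object-storage reads.
# The price below is an assumption in the style of S3 GET pricing,
# not a quoted rate from any provider or from the whitepaper.
PRICE_PER_1K_REQUESTS = 0.0004  # USD per 1,000 GET requests (assumed)

def request_cost(total_bytes: int, read_size: int) -> float:
    """Request-charge portion of scanning total_bytes in read_size chunks."""
    n_requests = -(-total_bytes // read_size)  # ceiling division
    return n_requests / 1000 * PRICE_PER_1K_REQUESTS

GB = 1 << 30
fragmented = request_cost(GB, 10 * 1024)       # 10 KB reads, as in the study
coalesced = request_cost(GB, 8 * 1024 * 1024)  # 8 MB reads for comparison

print(f"10 KB reads: ${fragmented:.4f} in request charges per GB scanned")
print(f" 8 MB reads: ${coalesced:.6f} in request charges per GB scanned")
```

Under this model, scanning a gigabyte in 10 KB chunks incurs roughly three orders of magnitude more request charges than scanning it in 8 MB chunks, which is why optimizations that count only bytes moved, not calls made, break down in the cloud.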
Drawing on the case study, the whitepaper distills a set of I/O-optimization principles and tactics, illustrating how to redesign I/O for cloud environments to improve both cost-effectiveness and performance.
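One tactic commonly used in such redesigns is read coalescing: merging nearby byte ranges into a single range request when the gap between them is small, since reading a few wasted bytes is often cheaper than paying another per-request charge. The sketch below is a generic illustration of that idea, not the paper's specific algorithm; the gap threshold is an assumption.

```python
def coalesce_ranges(ranges, max_gap=1 << 20):
    """Merge byte ranges separated by at most max_gap into single requests.

    Rationale: under per-call pricing, fetching a small unused gap is
    often cheaper than issuing a separate GET. The 1 MB default
    threshold is an illustrative assumption, not a value from the paper.
    """
    merged = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(r) for r in merged]

# Two small reads 50 bytes apart collapse into one request;
# a distant read stays separate.
print(coalesce_ranges([(0, 100), (150, 300), (5_000_000, 5_000_100)],
                      max_gap=1000))
```

A scheduler built on this idea trades a bounded amount of extra bytes transferred for a large reduction in request count, which is the favorable direction under cloud storage's cost model.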
Readers are offered a new perspective on system design for cloud‑computing platforms, enabling them to devise efficient I/O strategies tailored to data‑intensive applications in cloud‑native settings.
Source: DataFunSummit, the official account of the DataFun community, which shares big-data and AI industry summit news and speaker talks.