Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto
This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, revealing how cloud storage cost models affect performance optimization, and presents case‑study findings from Uber’s Presto production environment that highlight fragmented I/O patterns and the financial impact of storage API calls.
As data‑intensive analytics applications migrate to cloud‑native environments, the distinctive cost model of cloud storage, which charges for API calls as well as for capacity, demands a more nuanced understanding of performance optimization.
Through empirical analysis of Uber’s production Presto workload, the study reveals that traditional I/O optimizations often overlook the financial cost of storage API calls, which can lead to unexpectedly high expenses in cloud settings.
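To make the request-cost point concrete, here is a minimal sketch of a simplified cost model in which every storage read is one billable GET and the fee is charged per call regardless of size. The price constant ASSUMED_PRICE_PER_1000_GETS_USD and the helper names are hypothetical and illustrative, not taken from the whitepaper; substitute your provider's current rate. The model ignores capacity, egress, and caching.

```python
# Simplified model: each read is one GET, billed per call regardless of size.
# ASSUMED_PRICE_PER_1000_GETS_USD is a hypothetical, illustrative figure,
# not from the whitepaper; replace it with your provider's current rate.
ASSUMED_PRICE_PER_1000_GETS_USD = 0.0004


def requests_needed(total_bytes: int, request_size: int) -> int:
    """Number of GET requests needed to read total_bytes in request_size chunks."""
    # Integer ceiling division avoids float rounding on large byte counts.
    return -(-total_bytes // request_size)


def request_cost_usd(num_requests: int,
                     price_per_1000: float = ASSUMED_PRICE_PER_1000_GETS_USD) -> float:
    """Request charges only; in this model the fee does not depend on bytes returned."""
    return num_requests / 1000 * price_per_1000


if __name__ == "__main__":
    one_gib = 1 << 30
    for chunk in (10_000, 8 << 20):  # 10 KB reads vs 8 MiB reads
        n = requests_needed(one_gib, chunk)
        print(f"chunk={chunk:>9} B  requests={n:>7}  cost=${request_cost_usd(n):.6f}")
```

Under these assumptions, reading 1 GiB in 10 KB pieces takes 107,375 requests versus 128 with 8 MiB pieces, so the same bytes cost over 800 times more in request fees.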
In Uber's workload, more than 50% of data accesses are smaller than 10 KB and more than 90% are smaller than 1 MB. This highly fragmented access pattern has different implications on cloud platforms than on on‑premises ones, since in the cloud each small read can become a separately billed storage request.
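Statistics like these can be derived from an I/O trace by computing, for each size threshold, the fraction of accesses that fall below it. The sketch below assumes a trace is simply a list of read sizes in bytes and treats 1 KB as 1,024 bytes; the whitepaper's exact units and trace format are not stated here. The toy trace is invented for illustration and is not Uber data.

```python
from typing import Dict, Iterable, List

# Assumption: binary units (1 KB = 1,024 bytes); the source does not specify.
KIB = 1024
MIB = 1024 * KIB


def fraction_below(sizes: List[int], threshold: int) -> float:
    """Fraction of accesses strictly smaller than threshold bytes."""
    if not sizes:
        return 0.0
    return sum(1 for s in sizes if s < threshold) / len(sizes)


def size_profile(sizes: List[int], thresholds: Iterable[int]) -> Dict[int, float]:
    """Map each threshold (bytes) to the fraction of accesses below it."""
    return {t: fraction_below(sizes, t) for t in thresholds}


if __name__ == "__main__":
    # Toy trace of read sizes in bytes, invented for illustration only.
    toy_trace = [512, 4096, 8000, 9000, 5000,
                 20_000, 64_000, 300_000, 900_000, 2_000_000]
    for threshold, frac in size_profile(toy_trace, (10 * KIB, MIB)).items():
        print(f"< {threshold:>8} B: {frac:.0%}")
```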
The paper takes a case‑study‑driven approach and distills I/O‑optimization logic tailored to cloud environments, helping readers design I/O solutions that improve the cost‑performance ratio of data‑intensive workloads.
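One widely used technique for reducing request counts, named here as an illustration and not necessarily one the whitepaper recommends, is range coalescing: merging nearby small byte-range reads into a single larger GET. The sketch below assumes reads are (offset, length) pairs and uses a hypothetical max_gap parameter to bound how many unrequested bytes may be fetched to save a request. It also reports those wasted bytes, because trading fewer requests for extra transfer is the cost-performance balance at issue.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) in bytes


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge byte ranges separated by at most max_gap bytes (hypothetical knob).

    Fewer, larger GETs cut per-request charges, at the price of
    downloading gap bytes that no reader asked for.
    """
    merged: List[Range] = []
    for off, length in sorted(ranges):
        end = off + length
        if merged:
            m_off, m_len = merged[-1]
            m_end = m_off + m_len
            # Overlapping ranges give a negative gap and are always merged.
            if off - m_end <= max_gap:
                merged[-1] = (m_off, max(m_end, end) - m_off)
                continue
        merged.append((off, length))
    return merged


def wasted_bytes(ranges: List[Range], merged: List[Range]) -> int:
    """Bytes fetched by the merged plan beyond the union of requested bytes."""
    requested = sum(length for _, length in coalesce_ranges(ranges, 0))
    fetched = sum(length for _, length in merged)
    return fetched - requested


if __name__ == "__main__":
    reads = [(0, 4000), (5000, 3000), (100_000, 2000)]
    plan = coalesce_ranges(reads, max_gap=2048)
    print(plan, "requests:", len(plan), "wasted bytes:", wasted_bytes(reads, plan))
```

Raising max_gap lowers the request count but increases wasted bytes; which setting is cheaper depends on the relative request and transfer prices in a given cloud.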
Key Sections
How different cloud storage scenarios should reshape our assumptions and strategies, and how they affect application design and performance.
A case study of Uber data illustrating the additional costs that widely used I/O‑optimization techniques may incur after migration to the cloud.
A fresh perspective on system design for cloud computing, to help stakeholders keep pace with rapidly evolving data‑intensive applications.
Reader Benefits
The whitepaper serves as a starting point for further research: it presents I/O‑optimization logic and ideas that help readers craft efficient I/O strategies for cloud‑native, data‑intensive applications.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.