Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: A Case Study of Uber Presto
This whitepaper examines the industry-wide move of data-intensive analytics workloads to cloud-native platforms and argues that the cost model of cloud storage, which charges for API calls, demands a finer-grained approach to performance optimization. It presents a case study of Uber's Presto production environment that reveals fragmented I/O patterns and the financial impact of storage API calls.
Using observations from Uber's production Presto deployment, the authors show that data access patterns are highly fragmented: over 50% of accesses are smaller than 10 KB and more than 90% are under 1 MB. Such patterns have very different cost implications in the cloud than on traditional on-premises platforms.
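To make the cost implication concrete, the following is a rough back-of-the-envelope sketch, not a figure from the whitepaper. It assumes a hypothetical 1 TB scan and a placeholder price per 1,000 read requests; the names `requests_needed` and `request_cost` and both constants are illustrative assumptions, and real prices vary by provider and storage class.

```python
# Rough estimate of storage API request cost for one scan.
# All constants are hypothetical placeholders, not numbers from the paper.

TOTAL_BYTES = 10**12            # assumed: 1 TB read by the workload
PRICE_PER_1000_READS = 0.0004   # assumed: USD per 1,000 read requests


def requests_needed(total_bytes: int, request_size: int) -> int:
    """Number of read requests if every request fetches request_size bytes."""
    return -(-total_bytes // request_size)  # ceiling division


def request_cost(num_requests: int, price_per_1000: float) -> float:
    """API cost in USD for the given number of requests."""
    return num_requests / 1000 * price_per_1000


small_reads = requests_needed(TOTAL_BYTES, 10 * 1024)     # 10 KB per request
large_reads = requests_needed(TOTAL_BYTES, 1024 * 1024)   # 1 MB per request

small_cost = request_cost(small_reads, PRICE_PER_1000_READS)
large_cost = request_cost(large_reads, PRICE_PER_1000_READS)

print(f"10 KB reads: {small_reads:,} requests, ${small_cost:.2f}")
print(f"1 MB reads:  {large_reads:,} requests, ${large_cost:.2f}")
```

Under these assumptions the same volume of data costs roughly a hundred times more in request charges when read in 10 KB pieces than in 1 MB pieces, a cost that has no direct counterpart when the storage is owned hardware.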
Through an empirical case study, the paper demonstrates that conventional I/O optimizations that ignore storage‑API call costs can lead to unexpectedly high expenses in cloud settings. It then proposes a set of logical I/O‑optimization strategies tailored for cloud environments, aiming to improve cost‑performance ratios for data‑intensive workloads.
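The summary does not spell out the proposed strategies. As one commonly used illustration of request-aware I/O, and not necessarily the authors' method, the sketch below coalesces nearby byte ranges so that several small reads become a single larger request. The function name `coalesce_ranges` and the `max_gap` parameter are hypothetical.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (offset, length) of one read within a file


def coalesce_ranges(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge reads whose gap is at most max_gap bytes into one request.

    Fewer requests are issued at the price of also fetching the gap bytes.
    """
    if not ranges:
        return []
    ordered = sorted(ranges)
    merged = [ordered[0]]
    for offset, length in ordered[1:]:
        last_offset, last_length = merged[-1]
        last_end = last_offset + last_length
        if offset - last_end <= max_gap:
            # Close enough (or overlapping): extend the previous request.
            new_end = max(last_end, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Three small reads become two requests when gaps up to 64 bytes are tolerated.
print(coalesce_ranges([(0, 100), (150, 50), (10_000, 10)], max_gap=64))
```

The choice of `max_gap` is the trade-off: a larger value saves more API calls but transfers more unneeded bytes, so the right setting depends on how requests and data transfer are priced for the storage in question.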
The core sections cover (1) how assumptions and strategy need to adapt to different cloud storage scenarios, and how those scenarios affect application design and performance; (2) a detailed case study of Uber data illustrating the additional costs introduced by cloud migration; and (3) new perspectives on system design in cloud computing to help stakeholders cope with the rapid growth of data-intensive applications.
Readers will gain a concrete, case‑based framework for designing efficient I/O strategies specifically for cloud‑native, data‑heavy applications, providing a solid starting point for further research and implementation.
DataFunSummit