
Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, shows how cloud‑storage cost models reshape performance optimization, and presents case‑study findings from Uber's production Presto environment on fragmented I/O patterns and the financial impact of storage API calls.


This whitepaper explores the growing industry trend of migrating data‑intensive analytics applications to cloud‑native environments, emphasizing that the unique cost model of cloud storage requires a more nuanced understanding of performance optimization.

Through empirical analysis of Uber’s production Presto workload, the study reveals that traditional I/O optimizations often overlook the financial cost of storage API calls, which can lead to unexpectedly high expenses in cloud settings.
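The financial point above can be made concrete with a back‑of‑the‑envelope calculation. The sketch below is illustrative only: the per‑request price is an assumption modeled on typical object‑store GET pricing, not a figure from the whitepaper.

```python
# Illustrative sketch: why per-request pricing makes fragmented I/O expensive.
# The price below is an assumed, typical object-store GET charge, not a
# number taken from the whitepaper or from any specific provider's bill.

def request_cost(total_bytes, read_size, price_per_1k_requests=0.0004):
    """Cost of reading `total_bytes` in chunks of `read_size` bytes,
    charged per request (data-transfer charges ignored)."""
    requests = -(-total_bytes // read_size)  # ceiling division
    return requests / 1000 * price_per_1k_requests

TB = 1024 ** 4
fragmented = request_cost(1 * TB, 10 * 1024)        # 10 KB reads
coalesced = request_cost(1 * TB, 8 * 1024 * 1024)   # 8 MB reads

print(f"10 KB reads: ${fragmented:,.2f} per TB scanned")
print(f"8 MB reads:  ${coalesced:,.2f} per TB scanned")
```

Under these assumed prices the same terabyte costs hundreds of times more in request charges when read in 10 KB fragments than in 8 MB ranges, which is why fragmentation that is harmless on‑premise becomes a billing problem in the cloud.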

Observations show that over 50% of data accesses are smaller than 10 KB and more than 90% are under 1 MB, indicating a highly fragmented access pattern that has different implications for cloud versus on‑premise platforms.
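One common response to this kind of fragmentation, sketched below, is to coalesce nearby small reads into fewer range requests. The thresholds are assumptions for illustration; the whitepaper's specific techniques may differ.

```python
# Illustrative sketch (not from the whitepaper): coalescing nearby small
# reads into fewer, larger range requests, since per-request charges
# dominate when most accesses are under 10 KB. Thresholds are assumed.

def coalesce(ranges, max_gap=1024 * 1024, max_request=8 * 1024 * 1024):
    """Merge (offset, length) reads whose gaps are small enough that
    fetching the wasted bytes in between is cheaper than an extra request."""
    merged = []
    for off, length in sorted(ranges):
        if merged:
            m_off, m_len = merged[-1]
            gap = off - (m_off + m_len)
            if gap <= max_gap and (off + length) - m_off <= max_request:
                merged[-1] = (m_off, max(m_len, off + length - m_off))
                continue
        merged.append((off, length))
    return merged

# Three 10 KB reads within 200 KB of each other collapse into one request:
print(coalesce([(0, 10_000), (50_000, 10_000), (200_000, 10_000)]))
# → [(0, 210000)]
```

The trade‑off is deliberate: the merged request transfers bytes nobody asked for, but under per‑request pricing that waste is often cheaper than issuing separate GETs.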

The paper takes a case‑study‑driven approach, presenting I/O‑optimization strategies, and the reasoning behind them, tailored for cloud environments, to help readers design I/O solutions that improve the cost‑performance ratio of data‑intensive workloads.

Key Sections

How perceptions and strategies should shift across different cloud‑storage scenarios, and how those scenarios affect application design and performance.

Case study of Uber data illustrating additional costs that widely‑used I/O‑optimization techniques may incur during cloud migration.

Providing a fresh perspective on system design in the cloud‑computing domain to assist stakeholders in handling rapidly evolving data‑intensive applications.

Reader Benefits

The whitepaper serves as a starting point for further research, presenting I/O‑optimization logic and ideas that enable readers to craft efficient I/O strategies specifically for cloud‑native, data‑intensive applications.

Tags: Cloud Native, Big Data, I/O optimization, Presto, data-intensive, cost model
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
