
Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto

This whitepaper examines the industry trend of moving data‑intensive analytics workloads to cloud‑native platforms, analyzes the unique cost model of cloud storage, and presents case‑study findings from Uber's Presto production environment to guide efficient I/O design and cost‑effective performance optimization.

DataFunSummit

The whitepaper explores the widespread industry trend of migrating data‑intensive analytical applications from on‑premises infrastructure to cloud‑native environments. It argues that the distinct cost model of cloud storage demands a more nuanced understanding of performance optimization.

Through an empirical study of Uber's Presto production workload, the whitepaper shows that traditional I/O optimizations often overlook the financial cost of storage API calls. Because cloud object stores bill for each request, designs that issue many calls can become expensive in cloud settings.
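A minimal sketch of why request charges matter: the model below counts how many range requests it takes to read a fixed amount of data at a given read size, then prices each request. It assumes a flat per-GET charge and ignores per-byte transfer and storage fees; `PRICE_PER_1000_GETS` and `request_cost` are hypothetical names, and the price is an illustrative placeholder rather than any provider's quoted rate.

```python
# Minimal sketch of a per-request cloud storage cost model.
# ASSUMPTION: flat charge per GET request; per-byte transfer and storage fees ignored.
# ASSUMPTION: PRICE_PER_1000_GETS is a hypothetical placeholder, not a quoted rate.
PRICE_PER_1000_GETS = 0.0004  # hypothetical USD per 1,000 GET requests


def request_cost(total_bytes: int, read_size: int,
                 price_per_1000: float = PRICE_PER_1000_GETS) -> float:
    """Request charges for reading total_bytes in fixed-size range reads of read_size bytes."""
    if read_size <= 0:
        raise ValueError("read_size must be positive")
    # Ceiling division: a partial final chunk still costs a full request.
    num_requests = -(-total_bytes // read_size)
    return num_requests * price_per_1000 / 1000


if __name__ == "__main__":
    one_tib = 1 << 40
    # Compare small (8 KiB), medium (1 MiB), and large (64 MiB) read sizes.
    for size in (8 * 1024, 1 << 20, 64 << 20):
        print(f"{size:>10} B reads: ${request_cost(one_tib, size):,.2f}")
```

Because the request count scales inversely with read size, reading the same 1 TiB in 8 KiB pieces incurs 128 times the request charges of reading it in 1 MiB pieces under this model, whatever price constant is plugged in.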

Observations show that Presto's data access patterns are highly fragmented: over 50% of accesses are smaller than 10 KB, and over 90% are smaller than 1 MB. Such patterns carry very different costs in cloud environments than on traditional data platforms, since every small read becomes a separately billed request.
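To check whether another workload shows similar fragmentation, a read-size trace can be bucketed at the thresholds quoted above. This sketch assumes the trace is simply a list of per-read byte counts and that 1 KB = 1,024 bytes; the helper names and the sample trace are hypothetical and are not Uber data.

```python
from bisect import bisect_right
from collections import Counter
from typing import Dict, List

KB = 1024  # ASSUMPTION: binary kilobytes
MB = 1024 * KB

# Exclusive upper bounds of the first two size classes.
THRESHOLDS = [10 * KB, 1 * MB]
LABELS = ["< 10 KB", "10 KB to < 1 MB", ">= 1 MB"]


def bucket(size: int) -> str:
    """Map one read size (in bytes) to its size class."""
    # bisect_right sends a size equal to a threshold into the next class up.
    return LABELS[bisect_right(THRESHOLDS, size)]


def size_distribution(read_sizes: List[int]) -> Dict[str, float]:
    """Fraction of reads falling in each size class."""
    if not read_sizes:
        raise ValueError("trace is empty")
    counts = Counter(bucket(s) for s in read_sizes)
    return {label: counts[label] / len(read_sizes) for label in LABELS}


if __name__ == "__main__":
    # Hypothetical trace of read sizes in bytes (not production data).
    trace = [4 * KB, 8 * KB, 2 * KB, 64 * KB, 512 * KB, 2 * MB]
    dist = size_distribution(trace)
    under_1mb = dist["< 10 KB"] + dist["10 KB to < 1 MB"]
    for label, frac in dist.items():
        print(f"{label:>16}: {frac:.0%}")
    print(f"cumulative < 1 MB: {under_1mb:.0%}")
```

Comparing the `< 10 KB` share and the cumulative `< 1 MB` share against the 50% and 90% figures gives a quick sense of how request-heavy a workload is likely to be on object storage.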

The case study presents concrete I/O optimization logic and strategies to help readers design I/O solutions tailored to cloud environments, significantly improving the cost‑performance ratio of data‑intensive processing.
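One common cloud-aware strategy is range coalescing: merging byte ranges that sit close together in a file into a single larger request, trading some extra bytes read for fewer billed requests. The sketch below illustrates that generic technique; it is not presented as the specific logic from the Uber case study, and `coalesce`, `wasted_bytes`, and the `max_gap` tuning parameter are hypothetical names.

```python
from typing import List, Tuple

# A byte range as (start, end) offsets within one object; end is exclusive.
Range = Tuple[int, int]


def coalesce(ranges: List[Range], max_gap: int) -> List[Range]:
    """Merge ranges separated by at most max_gap bytes into single reads.

    With max_gap >= 0, overlapping or touching ranges are always merged,
    and the output is sorted and disjoint.
    """
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous read: extend it instead of issuing a new request.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def wasted_bytes(requested: List[Range], fetched: List[Range]) -> int:
    """Bytes fetched that no caller asked for (the gaps read to save requests).

    Assumes fetched is disjoint and covers every requested byte, as coalesce() guarantees.
    """
    needed = sum(e - s for s, e in coalesce(requested, max_gap=0))
    return sum(e - s for s, e in fetched) - needed


if __name__ == "__main__":
    # Hypothetical small reads within one file.
    requested = [(0, 4096), (6000, 8000), (100_000, 104_096)]
    fetched = coalesce(requested, max_gap=8192)
    print(f"{len(requested)} requests -> {len(fetched)} requests, "
          f"{wasted_bytes(requested, fetched)} extra bytes read")
```

With these sample ranges, a `max_gap` of 8,192 bytes turns three requests into two at the cost of 1,904 unrequested bytes. Raising `max_gap` cuts requests further but reads more unused data, so a sensible setting depends on the relative price of requests versus bytes.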

Key sections cover how to adapt assumptions and strategies to different cloud storage scenarios, the additional costs that widely used I/O optimization techniques incur when enterprises migrate to the cloud, and a fresh perspective on cloud system design that addresses the rapid growth of data‑intensive applications.

The whitepaper serves as a starting point for further research, offering readers actionable insights to develop specialized I/O strategies for cloud‑native, data‑intensive workloads.

Written by DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, and regularly offering downloadable resource packs.
