Optimizing I/O for Data-Intensive Applications in Cloud-Native Environments: Insights from Uber Presto
This whitepaper explores the growing industry trend of migrating data‑intensive analytics workloads from on‑premises to cloud‑native environments, emphasizing that the distinct cost model of cloud storage requires a deeper understanding of performance optimization.
Through an empirical study of Uber's production Presto cluster, the authors reveal that more than 50% of data accesses are smaller than 10 KB and over 90% are under 1 MB, indicating a highly fragmented access pattern that incurs significant storage API call costs in the cloud.
Based on these observations, the paper proposes a set of I/O‑optimization principles and strategies—such as adjusting data layout, leveraging cloud‑native storage features, and aligning application design with storage pricing—to achieve cost‑effective, high‑performance processing of data‑intensive applications.
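The cost pressure behind these strategies can be sketched with a toy model: under a per-request pricing scheme, many small reads are billed per GET, so coalescing nearby byte ranges into fewer, larger requests directly cuts API-call cost. The pricing constant and the gap threshold below are illustrative assumptions, not figures from the paper or any real provider's price list.

```python
# Illustrative sketch: coalescing fragmented reads to reduce per-request
# cloud storage cost. The price below is an assumption, not a real quote.

GET_PRICE_PER_1K = 0.0004  # assumed $ per 1,000 GET requests (S3-like model)

def request_cost(num_reads: int) -> float:
    """API-call cost of issuing num_reads GET requests."""
    return num_reads / 1000 * GET_PRICE_PER_1K

def coalesce_reads(ranges, max_gap):
    """Merge byte ranges whose inter-range gap is <= max_gap into one GET.

    ranges: iterable of (start, length) tuples; returns merged list.
    Reading a small gap as waste is cheaper than paying for an extra request.
    """
    merged = []
    for start, length in sorted(ranges):
        if merged:
            prev_start, prev_len = merged[-1]
            if start - (prev_start + prev_len) <= max_gap:
                # Extend the previous range to cover this one (plus the gap).
                merged[-1] = (prev_start, start + length - prev_start)
                continue
        merged.append((start, length))
    return merged

# Hypothetical workload: 1,000 fragmented 10 KB column reads, 4 KB apart.
reads = [(i * 14_000, 10_000) for i in range(1000)]
merged = coalesce_reads(reads, max_gap=8_000)  # tolerate up to 8 KB of waste

print(f"requests: {len(reads)} -> {len(merged)}")
print(f"cost: ${request_cost(len(reads)):.6f} -> ${request_cost(len(merged)):.7f}")
```

Because every 4 KB gap is below the threshold, the 1,000 fragmented GETs collapse into a single ranged request, trading a little wasted transfer for a large reduction in request count; this is the kind of layout-and-pricing alignment the paper's strategies aim at.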
Readers will gain a concrete case‑study framework for designing efficient I/O solutions tailored to cloud environments, providing a foundation for further research and practical implementation.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.