Optimizing I/O for Data‑Intensive Analytics in Cloud‑Native Environments: Insights from Uber Presto
This whitepaper examines the industry trend of moving data‑intensive analytics applications to cloud‑native environments, revealing how cloud storage cost models affect performance optimization, and presents case‑study findings from Uber’s Presto production workload that highlight fragmented I/O patterns and the financial impact of storage API calls.
The shift matters because cloud storage introduces a distinct cost model: storage API calls carry a direct expense, so performance optimization must account for request costs alongside the usual performance goals.
The study analyzes Uber’s production use of Presto, showing that more than 50% of data accesses are smaller than 10 KB and over 90% are under 1 MB, indicating a highly fragmented access pattern. Such fragmentation has different cost implications in cloud environments compared to traditional on‑premises systems.
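To make the cost implication concrete, the sketch below compares how many storage requests are needed to read the same volume of data at two read sizes taken from the thresholds above (10 KB and 1 MB). It is only an illustration under stated assumptions: the per-request price is a hypothetical placeholder rather than any provider's actual rate, every read is assumed to be the same size, and the names `RequestPricing`, `requests_needed`, and `request_cost_usd` are invented for this example and do not come from the whitepaper or from Presto.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestPricing:
    """Per-request pricing for an object store (hypothetical value, not a real quote)."""
    usd_per_1000_requests: float


def requests_needed(total_bytes: int, bytes_per_request: int) -> int:
    """Number of read requests needed to cover total_bytes (ceiling division)."""
    return -(-total_bytes // bytes_per_request)


def request_cost_usd(num_requests: int, pricing: RequestPricing) -> float:
    """Cost of the requests alone, ignoring storage and data-transfer charges."""
    return num_requests / 1000 * pricing.usd_per_1000_requests


# Assumption: a placeholder price chosen only to show the arithmetic.
pricing = RequestPricing(usd_per_1000_requests=0.0004)

total_bytes = 1024 ** 4                                   # 1 TiB scanned
small_reads = requests_needed(total_bytes, 10 * 1024)     # 10 KB per request
large_reads = requests_needed(total_bytes, 1024 * 1024)   # 1 MB per request

small_cost = request_cost_usd(small_reads, pricing)
large_cost = request_cost_usd(large_reads, pricing)

print(f"10 KB reads: {small_reads:,} requests, ${small_cost:.2f}")
print(f" 1 MB reads: {large_reads:,} requests, ${large_cost:.2f}")
print(f"request-count ratio: {small_reads / large_reads:.1f}x")
```

Under this simplified model the request bill scales with the number of calls, not the number of bytes, which is why an access pattern dominated by small reads can cost more in the cloud than the same pattern would on an on-premises system with no per-request charge.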
Based on this empirical research, the paper presents logical I/O‑optimization strategies tailored for cloud deployments, aiming to help architects design efficient I/O solutions that improve cost‑performance ratios for data‑intensive applications.
Key sections cover how to adjust assumptions and strategies for different cloud storage scenarios, case-study evidence that widely used I/O-optimization techniques can incur additional costs after migration to the cloud, and a fresh perspective on system design for cloud computing.
Readers will gain a case-study-driven framework for designing I/O strategies for cloud-native, data-intensive workloads.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.