
Cloud‑Native OLAP on Volcano EMR: Architecture, Capabilities, and Customer Cases

This article introduces Volcano EMR's cloud‑native OLAP solution, covering the product overview, storage‑compute separation, elastic scaling, cost and hot‑cold data management, intelligent query analysis, several customer case studies, and the future roadmap for real‑time and offline data warehousing.

DataFunSummit

EMR Product Overview

Volcano EMR is a standard big‑data product that has evolved to support OLAP scenarios with features such as storage‑compute separation, hot‑cold data tiering, and on‑demand elasticity, built on Volcano's underlying infrastructure such as object storage and ECS.

EMR serves four main scenarios: IDC migration to cloud, data lake with elastic storage‑compute separation, real‑time data warehouse (leveraging StarRocks and Doris), and open‑source migration.

EMR OLAP Cloud‑Native

The cloud‑native OLAP capability offers two deployment modes: semi‑managed (self‑operated through Volcano's platform) and fully managed (higher SLA, reduced operational burden). It provides elasticity through the EMR Stateless design, which decouples compute nodes from persistent state to enable cluster‑level scaling, and it manages cost through mixed deployment of OLAP and Hadoop/Spark clusters.
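The stateless idea above can be sketched in a few lines: because compute nodes hold no persistent state, the cluster can be resized purely from a load signal. The function below is a hypothetical illustration of such a cluster‑level scaling decision, not Volcano EMR's actual policy; all names and thresholds are assumptions.

```python
# Hypothetical sketch of a cluster-level elastic scaling decision for a
# stateless compute cluster. Thresholds and names are illustrative.

def target_node_count(current_nodes, avg_cpu_util, min_nodes=2, max_nodes=32,
                      scale_out_at=0.75, scale_in_at=0.25):
    """Return the desired node count given average CPU utilization."""
    if avg_cpu_util > scale_out_at:
        desired = current_nodes * 2           # double under sustained load
    elif avg_cpu_util < scale_in_at:
        desired = max(current_nodes // 2, 1)  # halve when mostly idle
    else:
        desired = current_nodes               # stay put in the comfort band
    # Clamp to the cluster's configured bounds.
    return max(min_nodes, min(max_nodes, desired))

print(target_node_count(8, 0.9))  # scale out -> 16
print(target_node_count(8, 0.1))  # scale in  -> 4
```

In a real deployment the decision would also account for scale‑in grace periods and in‑flight queries, but with stateless nodes no data rebalancing is needed when the count changes.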

Hot‑cold data management is achieved through object storage integration, allowing automatic data tiering and migration. Intelligent query analysis is performed by diagnosing large queries, identifying bottleneck operators, and optimizing execution plans.
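The tiering described above usually reduces to an access‑recency rule: partitions untouched for longer than a hot TTL are migrated to object storage. The snippet below is a minimal sketch of that rule under assumed names (`partition_tier`, a 7‑day TTL); it is not the product's actual migration logic.

```python
from datetime import datetime, timedelta

# Minimal sketch of hot-cold tiering: a partition is "hot" if it was
# accessed within the TTL window, otherwise "cold" and eligible for
# migration to object storage. Names and TTL are illustrative.

def partition_tier(last_access: datetime, now: datetime,
                   hot_ttl_days: int = 7) -> str:
    return "hot" if now - last_access <= timedelta(days=hot_ttl_days) else "cold"

now = datetime(2024, 1, 31)
partitions = {
    "p20240130": datetime(2024, 1, 30),  # accessed yesterday -> hot
    "p20240101": datetime(2024, 1, 1),   # idle for a month   -> cold
}
to_migrate = [p for p, ts in partitions.items()
              if partition_tier(ts, now) == "cold"]
print(to_migrate)  # ['p20240101']
```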

Customer Case Studies

1. Real‑time advertising client: migrated from Greenplum to a Doris + Elasticsearch solution, optimizing ES connector catalog creation, push‑down filters, and resource isolation, resulting in halved cluster size and improved performance.

2. Offline tourism industry client: replaced Hadoop + Presto + Kylin with Hadoop + StarRocks, achieving significant storage cost reduction, better performance, and simplified architecture.

Additional improvements include task node scaling, resource isolation, and custom data migration tools.
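The push‑down filters mentioned in the advertising case matter because they decide where a predicate runs: at the Elasticsearch source or inside the OLAP engine after the data has crossed the network. The simulation below illustrates that effect only; it is not Doris's connector code, and all names are hypothetical.

```python
# Illustrative simulation of predicate push-down: filtering at the
# source shrinks the rows transferred to the query engine.

source_rows = [{"id": i, "clicks": i % 100} for i in range(10_000)]

def scan(rows, predicate=None):
    """Simulate a remote scan; with push-down, the source filters first."""
    return [r for r in rows if predicate(r)] if predicate else list(rows)

# Without push-down: all 10,000 rows cross the wire, then get filtered.
fetched = scan(source_rows)
filtered = [r for r in fetched if r["clicks"] > 95]

# With push-down: only matching rows ever leave the source.
pushed = scan(source_rows, predicate=lambda r: r["clicks"] > 95)

print(len(fetched), len(filtered), len(pushed))  # 10000 400 400
```

Same result either way, but the pushed‑down scan moves 25x fewer rows in this toy example, which is the kind of saving that let the client halve the cluster.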

Future Planning

Planned enhancements focus on real‑time engine improvements (row‑column hybrid storage, WAL + MemTable), cloud‑native upgrades (CN node optimization, seamless storage‑compute interaction), and offline engine consolidation (single storage, unified metadata, lightweight ETL).
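The WAL + MemTable write path named in the roadmap is a classic durability pattern: append each write to a log first, buffer it in memory, and flush the buffer to immutable files when it fills. The toy store below sketches that pattern under assumed names; it is a generic illustration, not the engine's design.

```python
# Hedged sketch of a WAL + MemTable write path: durability via an
# append-only log, fast writes via an in-memory buffer, and periodic
# flushes to immutable snapshots. All names are illustrative.

class MiniStore:
    def __init__(self, flush_threshold=3):
        self.wal = []          # append-only write-ahead log
        self.memtable = {}     # in-memory key -> value buffer
        self.sstables = []     # flushed, immutable snapshots
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.wal.append((key, value))  # log first, for crash recovery
        self.memtable[key] = value     # then the fast in-memory path
        if len(self.memtable) >= self.flush_threshold:
            self.sstables.append(dict(self.memtable))  # flush a snapshot
            self.memtable.clear()

    def get(self, key):
        if key in self.memtable:       # freshest data lives in memory
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest flush wins
            if key in table:
                return table[key]
        return None

store = MiniStore()
for k, v in [("a", 1), ("b", 2), ("c", 3), ("a", 4)]:
    store.put(k, v)
print(store.get("a"), store.get("b"), len(store.sstables))  # 4 2 1
```

On restart, a real engine would replay the WAL to rebuild the MemTable; the row‑column hybrid storage in the roadmap would then choose the on‑disk layout for the flushed snapshots.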

The talk concluded with thanks to the audience.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
