
AutoMQ: Cloud‑Native Kafka Leveraging Shared Storage for Cost and Performance Gains

The article explains how AutoMQ, a cloud-native Kafka built on a shared-storage architecture and on Alibaba Cloud services such as OSS, ESSD, ESS, and preemptible instances, achieves up to ten-fold cost savings along with high performance, elastic scaling, and robust disaster recovery.

Alibaba Cloud Infrastructure

AutoMQ is a next-generation cloud-native Kafka implemented on a shared-storage architecture. Through deep integration with Alibaba Cloud services such as Object Storage Service (OSS), ESSD block storage, Elastic Scaling Service (ESS), and preemptible instances, it delivers roughly ten times the cost efficiency of Apache Kafka while providing automatic elasticity.

The author defines a truly cloud‑native product as one that fully exploits native cloud computing capabilities, elasticity, and scalability to achieve quantitative advantages in both cost and efficiency.

In March 2024, AutoMQ partnered with Alibaba Cloud to jointly release the product on the Alibaba Cloud Marketplace. This article examines how AutoMQ uses Alibaba Cloud's compute and storage services to solve real-world user problems.

Cost optimization and performance via storage services: AutoMQ's S3Stream library enables efficient streaming reads and writes directly on OSS. It takes advantage of the low price of OSS (≈0.12 CNY/GiB·month) and its built-in multi-AZ redundancy, while the true compute-storage separation of the shared-storage model allows partitions to be migrated instantly and losslessly, without copying data.
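To make "partition migration without data copying" concrete, here is a toy Python model contrasting a shared-storage cluster, where moving a partition only reassigns ownership, with a shared-nothing cluster, where the records must be copied to the new broker. All class and method names are hypothetical; this is not AutoMQ's code, and a real system also has to handle data that has not yet reached object storage, which the sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStorageCluster:
    """Toy shared-storage model: data lives in one shared store, brokers only own partitions."""
    store: dict[str, list[bytes]] = field(default_factory=dict)  # partition -> records (OSS role)
    owner: dict[str, str] = field(default_factory=dict)          # partition -> broker id

    def append(self, partition: str, record: bytes) -> None:
        self.store.setdefault(partition, []).append(record)

    def migrate(self, partition: str, new_broker: str) -> int:
        """Move a partition by reassigning ownership; returns the number of bytes copied."""
        self.owner[partition] = new_broker
        return 0  # nothing is copied: the new owner reads the same shared data


@dataclass
class SharedNothingCluster:
    """Toy shared-nothing model: each broker keeps its own copy of partition data on local disk."""
    disks: dict[str, dict[str, list[bytes]]] = field(default_factory=dict)  # broker -> partition -> records

    def append(self, broker: str, partition: str, record: bytes) -> None:
        self.disks.setdefault(broker, {}).setdefault(partition, []).append(record)

    def migrate(self, partition: str, src: str, dst: str) -> int:
        """Move a partition by copying every record to the destination; returns bytes copied."""
        records = self.disks[src].pop(partition)
        self.disks.setdefault(dst, {})[partition] = list(records)
        return sum(len(r) for r in records)


shared, local = SharedStorageCluster(), SharedNothingCluster()
record = b"x" * 1024
for _ in range(1000):
    shared.append("orders-0", record)
    local.append("broker-1", "orders-0", record)
shared.owner["orders-0"] = "broker-1"

copied_shared = shared.migrate("orders-0", "broker-2")
copied_local = local.migrate("orders-0", "broker-1", "broker-2")
print(copied_shared, copied_local)  # 0 1024000
```

The cost of a shared-nothing move grows with the partition's size, while the shared-storage move is a constant-size metadata update, which is what makes rebalancing "instant" in this model.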

Additional OSS benefits include disaster recovery through snapshot‑based cluster reconstruction, cross‑region replication without custom networking, shared read‑only replicas for high‑fan‑out consumption, and “Zero ETL” by eliminating the need for separate extraction pipelines.
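Snapshot-based reconstruction is possible because everything needed to rebuild a partition already sits in OSS. The sketch below assumes a hypothetical object-key layout, `data/<partition>/<start>-<end>` with an exclusive end offset, and shows a fresh cluster rebuilding a per-partition offset index from a bucket listing alone. AutoMQ's real object format and metadata handling are different; this only illustrates the idea.

```python
import re
from collections import defaultdict

# Hypothetical key layout: data/<partition>/<start>-<end>, end offset exclusive.
KEY_RE = re.compile(r"^data/(?P<partition>[^/]+)/(?P<start>\d+)-(?P<end>\d+)$")


def segment_key(partition: str, start: int, end: int) -> str:
    """Zero-pad offsets so keys also sort lexicographically in offset order."""
    return f"data/{partition}/{start:020d}-{end:020d}"


def rebuild_index(object_keys: list[str]) -> dict[str, list[tuple[int, int]]]:
    """Rebuild partition -> sorted (start, end) ranges from an object listing."""
    index: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for key in object_keys:
        m = KEY_RE.match(key)
        if m is None:
            continue  # skip objects that are not data segments
        index[m["partition"]].append((int(m["start"]), int(m["end"])))
    for ranges in index.values():
        ranges.sort()
    return dict(index)


def next_offsets(index: dict[str, list[tuple[int, int]]]) -> dict[str, int]:
    """Verify each partition's ranges are contiguous and return the next offset to assign."""
    result = {}
    for partition, ranges in index.items():
        expected = ranges[0][0]  # retention may already have deleted older segments
        for start, end in ranges:
            if start != expected:
                raise ValueError(f"gap in {partition}: expected offset {expected}, found {start}")
            expected = end
        result[partition] = expected
    return result


listing = [
    segment_key("orders-0", 500, 1200),  # listing order is arbitrary
    segment_key("orders-0", 0, 500),
    segment_key("payments-0", 0, 300),
    "metadata/cluster.json",
]
print(next_offsets(rebuild_index(listing)))  # {'orders-0': 1200, 'payments-0': 300}
```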

Block storage ESSD: Contrary to common misconceptions, ESSD is a distributed, multi-replica storage system offering nine-nines (99.9999999%) durability and shared-storage semantics. Its performance comes from offloading the storage client to the proprietary “ShenLong MOC” accelerator and from a custom RDMA-based protocol, which together deliver stable IOPS and throughput.

AutoMQ's three ESSD innovations are: (1) reliability separation, which delegates durability to ESSD instead of relying on application-level replication; (2) using ESSD as a remote, shared WAL that any node can take over during recovery; and (3) cost-effective sizing: for example, a 2 GiB ESSD PL0 volume used as the WAL costs about 1 CNY per month, and the setup scales linearly by adding more small volumes.
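The shared-WAL idea can be illustrated with a toy fixed-size log. The sketch below assumes a circular layout over a small volume (a `bytearray` stands in for the ESSD disk), records guarded by CRC32, and a logical position stored in each header so that a node taking over can tell current records from stale ones left from a previous lap. It is not AutoMQ's on-disk format; `RingWAL`, `trim`, and `recover` are hypothetical names. In this model a small volume suffices because space is trimmed once data has been uploaded to object storage.

```python
import struct
import zlib

# Record header: logical position, payload length, CRC32 of the payload.
HEADER = struct.Struct(">QII")


class RingWAL:
    """Toy circular write-ahead log over a fixed-size buffer standing in for a small ESSD volume."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.device = bytearray(capacity)
        self.head = 0  # logical position of the oldest record still needed
        self.tail = 0  # logical position where the next record will be written

    def _write(self, pos: int, data: bytes) -> None:
        for i, b in enumerate(data):
            self.device[(pos + i) % self.capacity] = b  # wrap around the end of the volume

    def _read(self, pos: int, n: int) -> bytes:
        return bytes(self.device[(pos + i) % self.capacity] for i in range(n))

    def append(self, payload: bytes) -> int:
        if not payload:
            raise ValueError("empty payloads are not allowed")
        size = HEADER.size + len(payload)
        if self.tail + size - self.head > self.capacity:
            raise BufferError("WAL full: upload data to object storage and trim first")
        pos = self.tail
        self._write(pos, HEADER.pack(pos, len(payload), zlib.crc32(payload)) + payload)
        self.tail += size
        return pos

    def trim(self, upto: int) -> None:
        """Free space for records whose data is already durable in object storage."""
        self.head = max(self.head, min(upto, self.tail))

    def recover(self, head: int) -> list[bytes]:
        """Scan from `head`, as a takeover node would, stopping at the first invalid record."""
        records, pos = [], head
        while pos + HEADER.size - head <= self.capacity:
            stored_pos, length, crc = HEADER.unpack(self._read(pos, HEADER.size))
            end = pos + HEADER.size + length
            if stored_pos != pos or length == 0 or end - head > self.capacity:
                break  # unwritten space or a stale record from an earlier lap
            payload = self._read(pos + HEADER.size, length)
            if zlib.crc32(payload) != crc:
                break  # torn write: end of the valid log
            records.append(payload)
            pos = end
        return records


wal = RingWAL(capacity=256)
for i in range(5):
    wal.append(f"record-{i}".encode())  # 16-byte header + 8-byte payload = 24 bytes each
wal.trim(3 * 24)                        # records 0-2 have been uploaded to object storage
for i in range(5, 12):
    wal.append(f"record-{i}".encode())  # the later appends wrap past the end of the buffer
recovered = wal.recover(head=wal.head)
print(recovered[0], recovered[-1], len(recovered))  # b'record-3' b'record-11' 9
```

A real WAL would also persist the head position and handle concurrent writers; the toy keeps `head` in memory and passes it to `recover` explicitly.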

By combining OSS (high‑throughput, low‑cost, unlimited capacity) with ESSD (low‑latency, high‑IOPS, durable WAL), S3Stream provides a unified streaming storage layer that is both fast and inexpensive.
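The division of labor above can be sketched as a write path: each record is acknowledged once it is in the WAL (the ESSD role), records accumulate until a size threshold, and then one object is uploaded (the OSS role) and the WAL is trimmed. Batching many records into one upload keeps object-storage request costs low. `TieredStream` and `batch_bytes` are hypothetical; S3Stream's real API and flush policy are not described in the source.

```python
from dataclasses import dataclass, field


@dataclass
class TieredStream:
    """Toy WAL + object-storage write path; not S3Stream's actual API."""
    batch_bytes: int                                            # hypothetical flush threshold
    wal: list[tuple[int, bytes]] = field(default_factory=list)  # durable low-latency log (ESSD role)
    objects: dict[tuple[int, int], list[bytes]] = field(default_factory=dict)  # (first, last) -> records (OSS role)
    next_offset: int = 0
    wal_bytes: int = 0

    def append(self, record: bytes) -> int:
        offset = self.next_offset
        self.wal.append((offset, record))  # the write is acknowledged once it is in the WAL
        self.next_offset += 1
        self.wal_bytes += len(record)
        if self.wal_bytes >= self.batch_bytes:
            self.flush()
        return offset

    def flush(self) -> None:
        """Upload buffered records as a single object, then trim the WAL."""
        if not self.wal:
            return
        first, last = self.wal[0][0], self.wal[-1][0]
        self.objects[(first, last)] = [rec for _, rec in self.wal]  # one upload per batch
        self.wal.clear()
        self.wal_bytes = 0

    def read(self, offset: int) -> bytes:
        for off, rec in self.wal:  # recent data not yet uploaded (kept in memory in this toy)
            if off == offset:
                return rec
        for (first, last), records in self.objects.items():  # older data: served from objects
            if first <= offset <= last:
                return records[offset - first]
        raise KeyError(offset)


stream = TieredStream(batch_bytes=4096)
for i in range(101):
    stream.append(f"{i:04d}".encode() * 256)  # 1 KiB records
print(len(stream.objects), len(stream.wal))  # 25 1
```

With 1 KiB records and a 4 KiB threshold, 100 records become 25 uploads instead of 100, and the record that arrived after the last flush is still readable before it reaches object storage.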

Multi-mount and NVMe PR protocol: ESSD's multi-mount capability and NVMe Persistent Reservation (PR) locking enable millisecond-level failover and recovery without unmounting disks, allowing AutoMQ to quickly mount a failed broker's storage on a healthy node.
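The failover sequence can be sketched as a simulation of reservation-based IO fencing. The class below is loosely modeled on the register, acquire, and preempt actions of NVMe reservations with write-exclusive semantics; it is a pure-Python toy, not an NVMe or ESSD API, and `SharedDisk` and its methods are hypothetical. The key property is that once the healthy node preempts the reservation, late writes from the old node are rejected, so the new owner can safely replay the WAL.

```python
from typing import Optional


class FencingError(Exception):
    """Raised when a host without the reservation tries to write to the disk."""


class SharedDisk:
    """Toy multi-attach disk with write-exclusive reservation semantics (simulation only)."""

    def __init__(self) -> None:
        self.registered: dict[str, int] = {}  # host -> registration key
        self.holder: Optional[str] = None     # host currently holding the reservation
        self.blocks: dict[int, bytes] = {}    # logical block address -> data

    def register(self, host: str, key: int) -> None:
        self.registered[host] = key

    def acquire(self, host: str) -> None:
        if host not in self.registered:
            raise FencingError(f"{host} is not registered")
        if self.holder not in (None, host):
            raise FencingError(f"reservation held by {self.holder}")
        self.holder = host

    def preempt(self, host: str, victim: str) -> None:
        """Take the reservation from `victim` and drop its registration (IO fencing)."""
        if host not in self.registered:
            raise FencingError(f"{host} is not registered")
        if self.holder not in (None, victim):
            raise FencingError(f"reservation held by {self.holder}, not {victim}")
        self.registered.pop(victim, None)
        self.holder = host

    def write(self, host: str, lba: int, data: bytes) -> None:
        if host != self.holder:
            raise FencingError(f"write from {host} rejected: reservation held by {self.holder}")
        self.blocks[lba] = data

    def read(self, lba: int) -> bytes:
        return self.blocks[lba]  # reads stay open to every attached host in this toy


disk = SharedDisk()
disk.register("broker-a", 0xA)
disk.acquire("broker-a")
disk.write("broker-a", 0, b"wal-entry-1")

# broker-a stops responding; broker-b attaches the same disk and fences it out
disk.register("broker-b", 0xB)
disk.preempt("broker-b", victim="broker-a")
disk.write("broker-b", 1, b"wal-entry-2")
print(disk.read(0))  # b'wal-entry-1': the new owner sees the old broker's WAL data

try:
    disk.write("broker-a", 2, b"late write")  # broker-a was only paused and now resumes
    rejected = None
except FencingError as exc:
    rejected = str(exc)
print(rejected)  # write from broker-a rejected: reservation held by broker-b
```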

Regional ESSD extends this to multiple availability zones: it distributes replicas across zones and supports cross-zone shared mounts and IO fencing, providing high availability at minimal cost.

Compute service benefits: Alibaba Cloud ECS offers a 99.975% SLA, which lets AutoMQ reach three-nines availability; a modest 2C16G instance (2 vCPUs, 16 GiB memory) handles 80 MiB/s of write throughput, and automatic failover builds on ECS's rapid recovery capabilities.
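The SLA figures translate into concrete downtime budgets. The arithmetic below assumes a 365-day year and a 30-day month; it shows that a single instance's 99.975% SLA allows a quarter of the downtime a 99.9% target permits, leaving the remainder for failover. It does not model how AutoMQ or Alibaba Cloud actually measure availability, and the throughput line simply assumes the 80 MiB/s rate is sustained all day.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200, assuming a 30-day month


def downtime_budget(availability: float, period_minutes: int) -> float:
    """Maximum downtime, in minutes, that still meets `availability` over the period."""
    return (1 - availability) * period_minutes


def daily_volume_tib(mib_per_s: float) -> float:
    """Data written per day at a sustained rate, in TiB (1 TiB = 1024**2 MiB)."""
    return mib_per_s * 86_400 / 1024**2


for label, availability in [("ECS SLA, 99.975%", 0.99975), ("three nines, 99.9%", 0.999)]:
    print(f"{label}: {downtime_budget(availability, MINUTES_PER_YEAR):.1f} min/year, "
          f"{downtime_budget(availability, MINUTES_PER_MONTH):.1f} min/month")
print(f"80 MiB/s sustained: {daily_volume_tib(80):.2f} TiB/day")

# ECS SLA, 99.975%: 131.4 min/year, 10.8 min/month
# three nines, 99.9%: 525.6 min/year, 43.2 min/month
# 80 MiB/s sustained: 6.59 TiB/day
```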

Elastic Scaling Service (ESS) removes the need for Kubernetes by offering configuration management, auto-scaling, scheduled scaling, multi-AZ deployment, and health checks, effectively providing a lightweight orchestration layer at the IaaS level.
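As an illustration of what such a policy computes, the sketch below combines a target-tracking rule (size the group so that a per-instance metric such as CPU utilization moves toward a target) with a scheduled minimum for known peak hours. The function names, the schedule format, and all numbers are hypothetical; they do not reproduce ESS's configuration API.

```python
import math
from datetime import time


def scheduled_min(now: time, schedule: list[tuple[time, time, int]], default_min: int) -> int:
    """Raise the group's minimum size during scheduled windows (e.g. known traffic peaks)."""
    for start, end, floor in schedule:
        if start <= now < end:
            return max(default_min, floor)
    return default_min


def desired_capacity(current: int, observed: float, target: float,
                     min_size: int, max_size: int) -> int:
    """Target tracking: pick a size that brings the average metric back toward `target`."""
    if current <= 0:
        return min_size
    desired = math.ceil(current * observed / target)
    return max(min_size, min(max_size, desired))


peak = [(time(9, 0), time(18, 0), 8)]  # hypothetical: at least 8 brokers during business hours

# 4 brokers at 75% average CPU with a 50% target, outside the peak window -> scale out to 6
print(desired_capacity(4, 75.0, 50.0, scheduled_min(time(7, 30), peak, 2), 20))  # 6
# the same load inside the peak window -> the scheduled floor of 8 wins
print(desired_capacity(4, 75.0, 50.0, scheduled_min(time(10, 0), peak, 2), 20))  # 8
# 6 brokers at 20% CPU at night -> scale in to 3
print(desired_capacity(6, 20.0, 50.0, scheduled_min(time(23, 0), peak, 2), 20))  # 3
```

Clamping to a minimum and maximum keeps a noisy metric from scaling the group to zero or without bound; in a shared-storage design, adding or removing brokers does not require copying partition data.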

Preemptible instances deliver up to 90% cost savings. Because AutoMQ's brokers are stateless and compute is separated from storage, the system tolerates instance reclamation, relying on ESSD-based WAL recovery to maintain continuity.
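The overall saving depends on how much of the fleet runs on preemptible capacity. The sketch below computes the hourly cost of a broker group that mixes pay-as-you-go and preemptible instances; the price and the fleet split are hypothetical, and 90% is the upper bound quoted above, not a guaranteed discount.

```python
def blended_hourly_cost(on_demand_price: float, n_on_demand: int,
                        n_preemptible: int, discount: float) -> float:
    """Hourly cost of a fleet mixing pay-as-you-go and preemptible instances."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_price * (n_on_demand + n_preemptible * (1.0 - discount))


price = 1.0  # hypothetical pay-as-you-go price per broker-hour, in CNY
baseline = blended_hourly_cost(price, 10, 0, 0.9)  # all 10 brokers pay-as-you-go
mixed = blended_hourly_cost(price, 2, 8, 0.9)      # 2 pay-as-you-go + 8 preemptible at the 90% bound
print(f"baseline {baseline:.2f} CNY/h, mixed {mixed:.2f} CNY/h, saving {1 - mixed / baseline:.0%}")
# baseline 10.00 CNY/h, mixed 2.80 CNY/h, saving 72%
```

Keeping a few pay-as-you-go brokers is a choice made in this sketch, not something the source prescribes; with stateless brokers, a reclaimed preemptible instance can be replaced without copying partition data.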

In conclusion, re‑architecting Kafka as a cloud‑native, shared‑storage system dramatically reduces costs, improves operational efficiency, and enhances scalability, making it a compelling choice for data‑driven enterprises.
