Designing a One‑Stop IoT Storage Solution: Architecture, Cost Optimization, and Performance

The talk classifies IoT data, outlines its storage requirements, and proposes a one‑stop storage product that combines multi‑model support, columnar formats, compute‑storage separation, tiered storage, and query optimization, with the goal of a ten‑fold cost reduction and a ten‑fold performance gain.

DataFunSummit

May 19, 2022

The presentation, delivered by Wang Huaiyuan, an Alibaba Cloud Table Store architect, introduces the challenges of storing massive IoT data and sets three guiding questions about data classification, product design, and achieving a ten‑fold cost‑performance improvement.

IoT data is divided into three categories: device metadata (identifiers, attributes, status), message data (up‑ and down‑stream events and commands), and time‑series data (sensor readings, trajectories). Each category has distinct storage and query requirements, such as multi‑dimensional search for metadata, ordered queues for messages, and high‑compression columnar storage for time‑series.
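The three categories can be pictured as three small data models. The sketch below is illustrative only: the class names, fields, and methods are hypothetical and are not taken from the talk or from any product API. It shows metadata as searchable attributes, messages as an ordered per‑device queue, and time‑series as parallel timestamp and value columns.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DeviceMeta:
    """Device metadata: an identifier plus searchable attributes and status."""
    device_id: str
    attributes: Dict[str, str]
    status: str = "offline"


@dataclass
class MessageQueue:
    """Per-device message queue; sequence numbers keep messages ordered."""
    device_id: str
    messages: List[Tuple[int, str]] = field(default_factory=list)

    def append(self, payload: str) -> int:
        # Sequence numbers are contiguous from 0, so seq == list index.
        seq = len(self.messages)
        self.messages.append((seq, payload))
        return seq

    def read_from(self, seq: int, limit: int = 10) -> List[Tuple[int, str]]:
        start = max(seq, 0)
        return self.messages[start:start + limit]


@dataclass
class TimeSeries:
    """One metric of one device, stored as two parallel, time-sorted columns."""
    device_id: str
    metric: str
    timestamps: List[int] = field(default_factory=list)
    values: List[float] = field(default_factory=list)

    def add(self, ts: int, value: float) -> None:
        # Insert in timestamp order so range scans stay a simple slice.
        i = bisect.bisect_right(self.timestamps, ts)
        self.timestamps.insert(i, ts)
        self.values.insert(i, value)

    def scan(self, start: int, end: int) -> List[Tuple[int, float]]:
        """Return points with start <= timestamp < end."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_left(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.values[lo:hi]))
```

Keeping the time‑series values in separate columns is what later makes columnar compression and scans effective.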

The storage system must provide high reliability and availability, low cost, high performance, and easy scalability, while supporting flexible queries, geographic searches, and real‑time analytics.

A one‑stop IoT storage product is proposed, offering multiple data models, standard SQL, API/SDK interfaces, and data subscription capabilities. The engine is optimized for large‑scale, low‑cost storage and analytical workloads.

Storage format optimization is demonstrated by comparing Avro (row‑oriented) and Parquet (column‑oriented). Stored as Parquet, a dataset that occupies 975 MB in Avro shrinks to 46 MB (roughly a 21× reduction), highlighting the benefits of columnar storage for massive time‑series data.
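To see why a columnar layout compresses so well, here is a minimal sketch that does not use Avro or Parquet at all. It packs the same synthetic readings once row by row and once column by column (with delta encoding), then compresses both with zlib. The data generator, field layout, and function names are assumptions made for illustration, so the ratio it prints will differ from the 975 MB to 46 MB figure in the talk.

```python
import random
import struct
import zlib


def make_readings(n, seed=0):
    """Synthetic readings: (device index, timestamp, temperature in tenths of a degree)."""
    rng = random.Random(seed)
    temp = 200
    rows = []
    for i in range(n):
        temp += rng.choice((-1, 0, 1))  # slow random walk, like a real sensor
        rows.append((i % 4, 1_650_000_000 + i, temp))
    return rows


def encode_rows(rows):
    """Row-oriented layout: the fields of each record are stored together."""
    return b"".join(struct.pack("<Bqq", dev, ts, temp) for dev, ts, temp in rows)


def delta(values):
    """Keep the first value, then store differences between neighbours."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def encode_columns(rows):
    """Column-oriented layout: all device ids, then all timestamps, then all temperatures."""
    n = len(rows)
    devs = [r[0] for r in rows]
    ts = delta([r[1] for r in rows])
    temps = delta([r[2] for r in rows])
    return (
        struct.pack(f"<{n}B", *devs)
        + struct.pack(f"<{n}q", *ts)
        + struct.pack(f"<{n}q", *temps)
    )


def compressed_sizes(n=10_000):
    """Return (row layout, column layout) sizes in bytes after zlib compression."""
    rows = make_readings(n)
    return (
        len(zlib.compress(encode_rows(rows))),
        len(zlib.compress(encode_columns(rows))),
    )


if __name__ == "__main__":
    row_size, col_size = compressed_sizes()
    print(f"row layout:    {row_size} bytes compressed")
    print(f"column layout: {col_size} bytes compressed")
    print(f"ratio quoted in the talk: {975 / 46:.1f}x")
```

Both layouts hold exactly the same bytes before compression. The columnar one wins because similar values sit next to each other, so timestamps collapse into a run of identical deltas and temperatures into a stream of small differences.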

The architecture choice favors storage‑compute separation (cloud‑native) over tightly coupled designs, enabling shared storage, elastic compute, lower unit costs, and higher reliability.

Tiered storage is introduced: hot data uses mixed row‑column storage for low‑latency writes and reads; warm data uses pure columnar storage for efficient scans; cold data employs highly compressed columnar formats on cheaper media. This strategy can cut storage costs by a further third.
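A tiering policy of this kind can be sketched as a simple age‑based rule. The 7‑day and 90‑day thresholds below are hypothetical placeholders, since the talk summary does not state where the hot, warm, and cold boundaries lie, and the function name is likewise an assumption.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical age thresholds; the source does not specify the real boundaries.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)

# Layout used by each tier, as described in the talk summary.
TIER_LAYOUT = {
    "hot": "mixed row-column",
    "warm": "pure columnar",
    "cold": "highly compressed columnar on cheaper media",
}


def choose_tier(written_at: datetime, now: datetime) -> str:
    """Pick a storage tier from the age of the data."""
    age = now - written_at
    if age < HOT_WINDOW:
        return "hot"
    if age < WARM_WINDOW:
        return "warm"
    return "cold"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (1, 30, 365):
        tier = choose_tier(now - timedelta(days=days), now)
        print(f"{days:>3} days old -> {tier}: {TIER_LAYOUT[tier]}")
```

A real system would more likely move data between tiers in background compaction jobs than decide per record, but the decision rule itself can stay this simple.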

Query performance is enhanced through horizontal and vertical sharding, distributed SQL execution, and distributed inverted indexes for metadata. Mixed row‑column storage supports high‑concurrency point queries, while pure columnar storage accelerates analytical scans.
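The combination of horizontal sharding and inverted indexes for metadata can be illustrated with a small in‑memory sketch. The class names, the use of CRC32 for shard routing, and the scatter‑gather search are all assumptions chosen for the example, not a description of the actual engine.

```python
import zlib
from collections import defaultdict


class MetadataShard:
    """One shard: an inverted index from (field, value) to device ids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, device_id, attributes):
        for field, value in attributes.items():
            self.postings[(field, value)].add(device_id)

    def search(self, **conditions):
        # Intersect the posting sets of every condition (logical AND).
        sets = [self.postings.get(item, set()) for item in conditions.items()]
        return set.intersection(*sets) if sets else set()


class ShardedIndex:
    """Horizontal sharding: each device lives on exactly one shard."""

    def __init__(self, num_shards=4):
        self.shards = [MetadataShard() for _ in range(num_shards)]

    def _shard_for(self, device_id):
        # CRC32 is stable across runs, unlike Python's built-in hash() for str.
        return self.shards[zlib.crc32(device_id.encode()) % len(self.shards)]

    def add(self, device_id, attributes):
        self._shard_for(device_id).add(device_id, attributes)

    def search(self, **conditions):
        # Scatter the query to every shard, then gather the union of the hits.
        result = set()
        for shard in self.shards:
            result |= shard.search(**conditions)
        return result


if __name__ == "__main__":
    index = ShardedIndex(num_shards=3)
    index.add("dev-1", {"type": "thermostat", "city": "Hangzhou"})
    index.add("dev-2", {"type": "thermostat", "city": "Beijing"})
    index.add("dev-3", {"type": "camera", "city": "Hangzhou"})
    print(sorted(index.search(type="thermostat", city="Hangzhou")))
```

Because every device's attributes are indexed on a single shard, each shard can answer a multi‑condition query locally and the coordinator only needs to merge the results.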

The overall architecture combines multi‑model access, compute‑storage separation, tiered storage, and advanced indexing, delivering a unified solution that targets a ten‑fold reduction in storage cost and a ten‑fold boost in analytical performance compared to traditional stacks.

In conclusion, the design balances cost, performance, and functionality, and leaves room for continued optimization as IoT data volumes grow.

