Designing a One‑Stop IoT Storage Solution: Architecture, Cost Optimization, and Performance
The talk outlines how IoT data is classified and what it requires of storage, then proposes a one‑stop storage product that combines multi‑model support, columnar formats, compute‑storage separation, tiered storage, and query optimization, with the goal of a ten‑fold cost reduction and a ten‑fold performance gain.
The presentation, delivered by Wang Huaiyuan, an Alibaba Cloud Table Store architect, introduces the challenges of storing massive IoT data and poses three guiding questions: how to classify IoT data, how to design the storage product, and how to achieve a ten‑fold cost‑performance improvement.
IoT data is divided into three categories: device metadata (identifiers, attributes, status), message data (upstream and downstream events and commands), and time‑series data (sensor readings, trajectories). Each category has distinct storage and query requirements: multi‑dimensional search for metadata, ordered queues for messages, and highly compressed columnar storage for time‑series data.
The storage system must provide high reliability and availability, low cost, high performance, and easy scalability, while also supporting flexible queries, geographic search, and real‑time analytics.
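To make the "ordered queue" requirement for message data concrete, the following is a minimal in‑memory sketch, assuming each device's messages carry a per‑device sequence number. The names Message, DeviceMessageQueue, append, and read_from are hypothetical and are not part of Table Store's API; a real system would persist and replicate these queues.

```python
import bisect
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    # Only seq takes part in ordering; the other fields are excluded from comparison.
    seq: int
    direction: str = field(compare=False)  # "up" (device to cloud) or "down" (cloud to device)
    payload: str = field(compare=False)


class DeviceMessageQueue:
    """Hypothetical per-device queue that keeps messages sorted by sequence number."""

    def __init__(self):
        self._queues = defaultdict(list)  # device_id -> list of Message sorted by seq

    def append(self, device_id, msg):
        # insort keeps the list ordered even if messages arrive out of order.
        bisect.insort(self._queues[device_id], msg)

    def read_from(self, device_id, start_seq, limit=10):
        # Resume reading a device's stream from a given sequence number.
        queue = self._queues[device_id]
        start = bisect.bisect_left([m.seq for m in queue], start_seq)
        return queue[start:start + limit]
```

Partitioning by device keeps ordering local to one key, which is what lets such queues scale horizontally: no global ordering across devices is needed.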
A one‑stop IoT storage product is proposed, offering multiple data models, standard SQL, API/SDK interfaces, and data subscription capabilities. The engine is optimized for large‑scale, low‑cost storage and analytical workloads.
Storage format optimization is demonstrated by comparing Avro (row‑oriented) and Parquet (column‑oriented). Parquet reduces a 975 MB dataset to 46 MB (≈21× compression), highlighting the benefits of columnar storage for massive time‑series data.
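The effect behind the Avro vs. Parquet comparison can be reproduced in miniature. The sketch below does not use Avro or Parquet; it swaps in plain struct packing and zlib to contrast a row layout with a column layout on synthetic, regularly sampled sensor data (the data, field widths, and delta encoding of timestamps are all assumptions for illustration). The printed ratio depends on the synthetic data and says nothing about the 21× figure from the talk.

```python
import math
import struct
import zlib


def make_readings(n=10_000):
    """Synthetic single-sensor readings: (timestamp_ms, temperature in tenths of a degree, status)."""
    start = 1_700_000_000_000
    return [(start + i * 1000, int(200 + 50 * math.sin(i / 100)), 0) for i in range(n)]


def row_oriented_size(rows):
    # Row layout: all fields of one record are stored next to each other.
    buf = b"".join(struct.pack("<qhB", ts, temp, status) for ts, temp, status in rows)
    return len(zlib.compress(buf))


def column_oriented_size(rows):
    ts_col, temp_col, status_col = zip(*rows)
    # Delta-encode timestamps: regular sampling turns them into a near-constant stream.
    deltas = [ts_col[0]] + [b - a for a, b in zip(ts_col, ts_col[1:])]
    columns = [
        struct.pack(f"<{len(deltas)}q", *deltas),
        struct.pack(f"<{len(temp_col)}h", *temp_col),
        bytes(status_col),
    ]
    # Compress each column independently, so similar values sit next to each other.
    return sum(len(zlib.compress(col)) for col in columns)


rows = make_readings()
row_size = row_oriented_size(rows)
col_size = column_oriented_size(rows)
print(f"row: {row_size} B, columnar: {col_size} B, ratio {row_size / col_size:.1f}x")
```

Grouping values of the same type and similar magnitude together is what gives generic compressors and type‑specific encodings (such as delta encoding) long, repetitive runs to exploit; for reference, the talk's own figures work out to 975 / 46 ≈ 21.2.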
The architecture choice favors storage‑compute separation (cloud‑native) over tightly coupled designs, enabling shared storage, elastic compute, lower unit costs, and higher reliability.
Tiered storage is introduced: hot data uses mixed row‑column storage for low‑latency writes and reads; warm data uses pure columnar storage for efficient scans; cold data uses highly compressed columnar formats on cheaper media. This strategy can cut storage costs by roughly another third.
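A tiering policy like the one described is often driven by data age. The following is a minimal sketch, assuming age‑based placement with hypothetical thresholds of 7 days for hot and 90 days for warm; the talk does not state the actual thresholds or how Table Store decides when to migrate data.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class Tier(Enum):
    HOT = "mixed row-column storage, low-latency reads and writes"
    WARM = "pure columnar storage, efficient scans"
    COLD = "highly compressed columnar storage on cheaper media"


# Hypothetical thresholds; real values depend on workload and media pricing.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def choose_tier(written_at: datetime, now: datetime) -> Tier:
    """Pick a storage tier based only on how long ago the data was written."""
    age = now - written_at
    if age <= HOT_WINDOW:
        return Tier.HOT
    if age <= WARM_WINDOW:
        return Tier.WARM
    return Tier.COLD


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(choose_tier(now - timedelta(days=30), now).name)
```

A background job would typically apply such a function to data partitions (for example, per day) and rewrite each partition into the target format when its tier changes, rather than evaluating individual rows.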
Query performance is enhanced through horizontal and vertical sharding, distributed SQL execution, and distributed inverted indexes for metadata. Mixed row‑column storage supports high‑concurrency point queries, while pure columnar storage accelerates analytical scans.
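The combination of horizontal sharding and inverted indexes for metadata search can be sketched as a scatter‑gather lookup. This is a minimal, assumption‑laden illustration: devices are hash‑partitioned across shards with zlib.crc32, each shard keeps an in‑memory posting list per (attribute, value) pair, and a query fans out to every shard and unions the partial results. ShardedMetadataIndex and its methods are hypothetical names, updates and deletions are not handled, and vertical sharding is not shown.

```python
import zlib
from collections import defaultdict


class Shard:
    def __init__(self):
        # (attribute, value) -> set of device ids having that attribute value
        self.postings = defaultdict(set)

    def index(self, device_id, attrs):
        # Append-only: re-indexing a device does not remove its old postings.
        for key, value in attrs.items():
            self.postings[(key, value)].add(device_id)

    def search(self, predicates):
        # AND of all predicates: intersect the posting sets on this shard.
        sets = [self.postings.get(item, set()) for item in predicates.items()]
        return set.intersection(*sets) if sets else set()


class ShardedMetadataIndex:
    def __init__(self, num_shards=4):
        self.shards = [Shard() for _ in range(num_shards)]

    def _shard_for(self, device_id):
        # Horizontal sharding: a stable hash of the device id selects the shard.
        return self.shards[zlib.crc32(device_id.encode()) % len(self.shards)]

    def put(self, device_id, attrs):
        self._shard_for(device_id).index(device_id, attrs)

    def search(self, **predicates):
        # Scatter the query to every shard, then gather (union) the partial results.
        result = set()
        for shard in self.shards:
            result |= shard.search(predicates)
        return result
```

Because each device's postings live on exactly one shard, per‑shard intersections are complete and the final union needs no cross‑shard join; in a distributed system the per‑shard searches would run in parallel.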
The overall architecture combines multi‑model access, compute‑storage separation, tiered storage, and advanced indexing, delivering a unified solution that targets a ten‑fold reduction in storage cost and a ten‑fold boost in analytical performance compared to traditional stacks.
In conclusion, the design balances cost, performance, and functionality, and serves as a roadmap for continued optimization as IoT data volumes grow.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.