
TDengine Architecture and Storage Design for IoT Big Data

This article explains TDengine’s architecture, including its management, data, and client modules, virtual node design, write process, and detailed storage file structures, highlighting how its innovative design optimizes resource usage and performance for IoT and other big‑data applications.

DataFunTalk

Jul 16, 2019

TDengine is an open-source, lightweight, high-performance time-series database designed for IoT, connected vehicles, the industrial internet, and IT operations monitoring. Alongside a fast time-series engine, it provides caching, data subscription, and stream processing, which reduces development and operations effort.

TDengine Architecture

The system consists of three main modules: the Management Node (MGMT), the Data Node (DNODE), and the Client module. The Management Node handles metadata storage and queries, while the Data Node stores and queries actual time‑series data. Clients first contact the Management Node to obtain metadata before accessing Data Nodes.
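The lookup-then-access flow can be sketched in a few lines of Python. This is a minimal illustration, not TDengine's client code: the class names, the `locate` method, and the table-to-node dictionary are all hypothetical.

```python
class ManagementNode:
    """Metadata only: records which data node holds each table."""

    def __init__(self):
        self.table_location = {}  # table name -> data node name (hypothetical)

    def create_table(self, table, dnode_name):
        self.table_location[table] = dnode_name

    def locate(self, table):
        return self.table_location[table]


class DataNode:
    """Stores and serves the actual time-series rows."""

    def __init__(self):
        self.rows = {}  # table name -> list of (timestamp, value)

    def insert(self, table, ts, value):
        self.rows.setdefault(table, []).append((ts, value))

    def query(self, table):
        return sorted(self.rows.get(table, []))


class Client:
    def __init__(self, mgmt, dnodes):
        self.mgmt = mgmt
        self.dnodes = dnodes  # data node name -> DataNode

    def _route(self, table):
        # Step 1: fetch metadata from the management node.
        dnode_name = self.mgmt.locate(table)
        # Step 2: use it to reach the right data node.
        return self.dnodes[dnode_name]

    def insert(self, table, ts, value):
        self._route(table).insert(table, ts, value)

    def query(self, table):
        return self._route(table).query(table)


mgmt = ManagementNode()
dnodes = {"dnode1": DataNode(), "dnode2": DataNode()}
mgmt.create_table("meter_001", "dnode2")

client = Client(mgmt, dnodes)
client.insert("meter_001", 1563235200000, 220.5)
client.insert("meter_001", 1563235201000, 221.0)
```

The management node never sees a data row in this sketch; it only answers the question of where a table lives.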

Management Node Module

The Management Node stores user, database, and table metadata. Operations such as creating or dropping databases/tables are first processed by this node, which then coordinates with Data Nodes to allocate or release resources.

Data Node Module

Data Nodes use virtual nodes (vnodes) to virtualize physical resources. Each vnode acts as an independent storage unit with its own cache and disk directory, allowing fine‑grained resource allocation and horizontal scaling. Multiple vnodes can run on a single physical machine.

Client Module

The client parses incoming SQL statements, converts them into internal structures, and forwards them to the server modules.

Write Process

Data writes follow a write-ahead log (WAL) scheme: incoming data is first appended to a WAL file and then placed in the target vnode's cache. At that point the server acknowledges the write. The cached data is persisted to disk later, triggered either by elapsed time (time-driven flushing) or by the amount of cached data (data-driven flushing).
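The write path can be approximated with the sketch below. It is an assumption-laden toy rather than TDengine's implementation: the `VnodeWriter` class, the JSON line format, the file names, the per-write `fsync`, and both thresholds are hypothetical. Because it is single-threaded, a flush that is due runs inside `write` instead of in the background.

```python
import json
import os
import tempfile
import time


class VnodeWriter:
    """Toy write path: WAL first, then cache, flush later (all names hypothetical)."""

    def __init__(self, directory, max_cached_rows=3, max_age_seconds=60.0):
        self.wal_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.log")
        self.max_cached_rows = max_cached_rows
        self.max_age_seconds = max_age_seconds
        self.cache = []
        self.last_flush = time.monotonic()

    def write(self, row):
        # 1. Append the row to the WAL so it survives a crash.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(row) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Put the row in the in-memory cache.
        self.cache.append(row)
        # 3. Persist the cache only if a flush condition is met.
        self.maybe_flush()
        # 4. Acknowledge the write.
        return True

    def maybe_flush(self, now=None):
        now = time.monotonic() if now is None else now
        data_driven = len(self.cache) >= self.max_cached_rows
        time_driven = now - self.last_flush >= self.max_age_seconds
        if data_driven or time_driven:
            self.flush(now)

    def flush(self, now):
        with open(self.data_path, "a", encoding="utf-8") as data:
            for row in self.cache:
                data.write(json.dumps(row) + "\n")
        self.cache.clear()
        # Rows are now on disk, so the WAL can be truncated.
        open(self.wal_path, "w").close()
        self.last_flush = now


workdir = tempfile.mkdtemp()
writer = VnodeWriter(workdir, max_cached_rows=3, max_age_seconds=1e9)
writer.write({"ts": 1, "v": 1.0})
writer.write({"ts": 2, "v": 2.0})   # still only in WAL + cache
writer.write({"ts": 3, "v": 3.0})   # third row triggers a data-driven flush
```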

Metadata Storage

Metadata files reside in /var/lib/taos/mgmt/ and are append‑only. Example directory structure:

/var/lib/taos/
   +--mgmt/
       +--db.db
       +--meters.db
       +--user.db
       +--vgroups.db
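One common way to work with an append-only file is to record every change as a new entry and rebuild the current state by replaying the entries in order. The sketch below illustrates that idea only; the article does not describe TDengine's on-disk record format, so the JSON lines, the `put`/`drop` operations, and the function names are hypothetical.

```python
import json
import os
import tempfile


def append_record(path, op, key, value=None):
    """Append one change; existing bytes in the file are never rewritten."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"op": op, "key": key, "value": value}) + "\n")


def replay(path):
    """Rebuild the current metadata by applying records in file order."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if record["op"] == "put":
                state[record["key"]] = record["value"]
            elif record["op"] == "drop":
                state.pop(record["key"], None)
    return state


meta_path = os.path.join(tempfile.mkdtemp(), "db.db")
append_record(meta_path, "put", "power", {"replica": 1})
append_record(meta_path, "put", "test", {"replica": 1})
append_record(meta_path, "drop", "test")   # a drop is also just an append
current = replay(meta_path)
```

Appending avoids in-place updates, at the cost of a file that grows until it is compacted.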

Data Storage Files

Data files are stored under /var/lib/taos/data/, while per-vnode information lives under /var/lib/taos/tsdb/. Each vnode directory under tsdb/ holds a meter object file and a db/ directory whose entries link to the actual files under data/. Example:

/var/lib/taos/
   +--tsdb/
   |   +--vnode0
   |        +--meterObj.v0
   |        +--db/
   |            +--v0f1804.head -> /var/lib/taos/data/vnode0/v0f1804.head1
   |            +--v0f1804.data -> /var/lib/taos/data/vnode0/v0f1804.data
   |            +--v0f1804.last -> /var/lib/taos/data/vnode0/v0f1804.last1
   +--data/
       +--vnode0/
           +--v0f1804.head1
           +--v0f1804.data
           +--v0f1804.last1
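Assuming the arrows in the listing denote symbolic links, the same layout can be reproduced in a temporary directory to see how a name under tsdb/ resolves to a file under data/. The script builds its own scratch tree and does not touch /var/lib/taos; on Windows, creating symlinks may require extra privileges.

```python
import os
import tempfile

root = tempfile.mkdtemp()
data_dir = os.path.join(root, "data", "vnode0")
link_dir = os.path.join(root, "tsdb", "vnode0", "db")
os.makedirs(data_dir)
os.makedirs(link_dir)

# (link name under tsdb/, real file name under data/), taken from the listing
pairs = [
    ("v0f1804.head", "v0f1804.head1"),
    ("v0f1804.data", "v0f1804.data"),
    ("v0f1804.last", "v0f1804.last1"),
]

for link_name, real_name in pairs:
    real_path = os.path.join(data_dir, real_name)
    with open(real_path, "wb"):
        pass  # create an empty placeholder file
    os.symlink(real_path, os.path.join(link_dir, link_name))

# Opening the link reaches the real file; realpath shows where it points.
resolved = os.path.realpath(os.path.join(link_dir, "v0f1804.head"))
```

One effect of this indirection is that the real file can change name (head1 here) while the path under tsdb/ stays the same.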

Each vnode has a meterObj file that records basic information about the vnode and the tables it holds. The head file stores the indexes of the data blocks. All indexes belonging to one table can be loaded into memory at once, which speeds up reads. The head file is laid out as follows:

<File Start>
 [File Header]
 [Table1 Offset & Length]
 [Table2 Offset & Length]
 …
 [Table1 Index]
 [Table2 Index]
 …
<File End>
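The offset-and-length table makes it possible to jump straight to one table's index without scanning the others. The sketch below mimics that layout with Python's `struct` module. The field widths, the `HEAD` magic value, and the contents of an index entry (first timestamp, last timestamp, block offset) are assumptions for illustration, not TDengine's actual binary format.

```python
import struct

HEADER = struct.Struct("<4sI")  # magic, number of tables (hypothetical)
SLOT = struct.Struct("<QI")     # per table: index offset, index length
ENTRY = struct.Struct("<qqQ")   # per block: first ts, last ts, offset in data file


def build_head(indexes):
    """indexes: one list of (first_ts, last_ts, block_offset) per table."""
    bodies = [b"".join(ENTRY.pack(*entry) for entry in index) for index in indexes]
    offset = HEADER.size + SLOT.size * len(indexes)
    slots = []
    for body in bodies:
        slots.append(SLOT.pack(offset, len(body)))
        offset += len(body)
    return HEADER.pack(b"HEAD", len(indexes)) + b"".join(slots) + b"".join(bodies)


def read_index(blob, table_no):
    """Load every index entry of one table using its offset/length slot."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != b"HEAD" or not 0 <= table_no < count:
        raise ValueError("bad head file or table number")
    offset, length = SLOT.unpack_from(blob, HEADER.size + SLOT.size * table_no)
    return [
        ENTRY.unpack_from(blob, pos)
        for pos in range(offset, offset + length, ENTRY.size)
    ]


head = build_head([
    [(0, 9, 64), (10, 19, 512)],  # table 0: two data blocks
    [],                           # table 1: no data yet
    [(5, 7, 1024)],               # table 2: one data block
])
table2_index = read_index(head, 2)
```

Reading table 2 touches only the header, one slot, and that table's entries.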

The data file contains actual time‑ordered data blocks, stored column‑wise for compression efficiency. When a data block is too small, it is first written to a last file and later merged into the data file.

<File Start>
 [File Header]
 [Data Block 1]
 [Data Block 2]
 …
 [Data Block N]
<File End>
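To illustrate column-wise blocks, the sketch below packs the timestamps and the values of a block as separate columns and compresses each one on its own. It uses `zlib` purely as a stand-in compressor; the block header, the two-column schema, and the `MIN_ROWS_PER_BLOCK` threshold for routing small blocks to the last file are hypothetical.

```python
import struct
import zlib

BLOCK_HEADER = struct.Struct("<III")  # row count, ts column bytes, value column bytes
MIN_ROWS_PER_BLOCK = 4                # hypothetical "too small" threshold


def encode_block(rows):
    """rows: time-ordered list of (timestamp: int, value: float)."""
    timestamps = [ts for ts, _ in rows]
    values = [value for _, value in rows]
    # Each column is packed and compressed separately, so similar values sit together.
    ts_col = zlib.compress(struct.pack(f"<{len(rows)}q", *timestamps))
    val_col = zlib.compress(struct.pack(f"<{len(rows)}d", *values))
    return BLOCK_HEADER.pack(len(rows), len(ts_col), len(val_col)) + ts_col + val_col


def decode_block(block):
    count, ts_len, val_len = BLOCK_HEADER.unpack_from(block, 0)
    start = BLOCK_HEADER.size
    ts_raw = zlib.decompress(block[start:start + ts_len])
    val_raw = zlib.decompress(block[start + ts_len:start + ts_len + val_len])
    timestamps = struct.unpack(f"<{count}q", ts_raw)
    values = struct.unpack(f"<{count}d", val_raw)
    return list(zip(timestamps, values))


def target_file(rows):
    """Small blocks go to the last file first; full-size blocks go to the data file."""
    return "last" if len(rows) < MIN_ROWS_PER_BLOCK else "data"


rows = [(1000 + i, 220.0 + 0.5 * i) for i in range(6)]
block = encode_block(rows)
restored = decode_block(block)
```

Keeping each column contiguous places values of the same type next to each other, which is what makes column-wise layouts compress well.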

Conclusion

TDengine's architecture combines virtual nodes, append-only metadata files, column-wise data storage, and write-ahead logging. Together, these design choices aim to improve resource utilization, scalability, and query performance for large-scale IoT and other time-series workloads.
