TDengine Architecture and Storage Design for IoT Big Data
This article explains TDengine’s architecture, including its management, data, and client modules, virtual node design, write process, and detailed storage file structures, highlighting how its innovative design optimizes resource usage and performance for IoT and other big‑data applications.
TDengine is an open‑source, lightweight, high‑performance time‑series database specifically designed for IoT, vehicular networks, industrial internet, and IT operations. It provides a fast time‑series engine, caching, data subscription, and stream processing to reduce development and operations effort.
TDengine Architecture
The system consists of three main modules: the Management Node (MGMT), the Data Node (DNODE), and the Client module. The Management Node handles metadata storage and queries, while the Data Node stores and queries actual time‑series data. Clients first contact the Management Node to obtain metadata before accessing Data Nodes.
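The two-step access pattern described above (metadata first, then data) can be illustrated with a minimal sketch. The class and method names here (`ManagementNode`, `DataNodeStub`, `Client`, `locate`) are hypothetical and are not TDengine APIs; the sketch only models the order of the calls.

```python
class ManagementNode:
    """Holds metadata only: which data node and vnode own each table."""

    def __init__(self):
        self._table_location = {}

    def register_table(self, table, dnode_name, vnode_id):
        self._table_location[table] = (dnode_name, vnode_id)

    def locate(self, table):
        return self._table_location[table]


class DataNodeStub:
    """Holds the actual time-series rows, keyed by (vnode id, table)."""

    def __init__(self):
        self.rows = {}

    def insert(self, vnode_id, table, row):
        self.rows.setdefault((vnode_id, table), []).append(row)

    def query(self, vnode_id, table):
        return list(self.rows.get((vnode_id, table), []))


class Client:
    """Asks the management node where a table lives, then talks to that data node."""

    def __init__(self, mgmt, dnodes):
        self.mgmt = mgmt
        self.dnodes = dnodes

    def insert(self, table, row):
        dnode_name, vnode_id = self.mgmt.locate(table)        # step 1: metadata
        self.dnodes[dnode_name].insert(vnode_id, table, row)  # step 2: data

    def query(self, table):
        dnode_name, vnode_id = self.mgmt.locate(table)
        return self.dnodes[dnode_name].query(vnode_id, table)


mgmt = ManagementNode()
dnodes = {"dnode1": DataNodeStub()}
mgmt.register_table("meter_001", "dnode1", vnode_id=0)

client = Client(mgmt, dnodes)
client.insert("meter_001", (1700000000000, 21.5))
result = client.query("meter_001")
```

In this toy model the management node never sees a data row, which mirrors the split of responsibilities between the two server modules.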
Management Node Module
The Management Node stores user, database, and table metadata. Operations such as creating or dropping databases/tables are first processed by this node, which then coordinates with Data Nodes to allocate or release resources.
Data Node Module
Data Nodes use virtual nodes (vnodes) to virtualize physical resources. Each vnode acts as an independent storage unit with its own cache and disk directory, allowing fine‑grained resource allocation and horizontal scaling. Multiple vnodes can run on a single physical machine.
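As a rough illustration of the idea, the sketch below models a data node that hosts several vnodes, each with a private cache and a private directory. The `VNode` and `DataNode` classes are hypothetical stand-ins, and the directory naming simply follows the vnode0 example that appears in the storage section of this article.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class VNode:
    """Hypothetical vnode model: its own cache and its own disk directory."""
    vnode_id: int
    data_dir: PurePosixPath
    cache: list = field(default_factory=list)  # rows not yet flushed to disk


@dataclass
class DataNode:
    """Hypothetical model of one physical machine hosting several vnodes."""
    root: PurePosixPath
    vnodes: dict = field(default_factory=dict)

    def create_vnode(self, vnode_id: int) -> VNode:
        # Each vnode gets a separate directory under the data root.
        vnode = VNode(vnode_id, self.root / "data" / f"vnode{vnode_id}")
        self.vnodes[vnode_id] = vnode
        return vnode


dnode = DataNode(PurePosixPath("/var/lib/taos"))
v0 = dnode.create_vnode(0)
v1 = dnode.create_vnode(1)

# Writing to one vnode's cache does not touch the other vnode.
v0.cache.append((1700000000000, 21.5))
```

Because every vnode owns its resources, adding capacity in this model amounts to creating more vnodes, on the same machine or on another one.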
Client Module
The client parses incoming SQL statements, converts them into internal structures, and forwards them to the server modules.
Write Process
Data writes follow a write‑ahead log (WAL) scheme: the client’s data is first appended to a WAL file and then cached in the target vnode. The server then acknowledges success to the client, and the cached data is later persisted to disk by either time‑driven or data‑driven flushing.
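The write path above can be sketched in a few lines. This is a toy model, not TDengine's implementation: the `VNodeWriter` class, the JSON line format, the `flush_rows` threshold, the `fsync` call, and truncating the log after a flush are all assumptions made for illustration. Only the data-driven flush is modelled; a time-driven flush would call `flush()` from a timer.

```python
import json
import os
import tempfile


class VNodeWriter:
    """Toy write path: append to the log, cache the row, flush later."""

    def __init__(self, directory, flush_rows=3):
        self.wal_path = os.path.join(directory, "wal.log")
        self.data_path = os.path.join(directory, "data.log")
        self.flush_rows = flush_rows  # hypothetical data-driven threshold
        self.cache = []

    def write(self, row):
        # 1. Append the row to the WAL and force it to disk (assumed durability step).
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps(row) + "\n")
            wal.flush()
            os.fsync(wal.fileno())
        # 2. Cache the row in memory.
        self.cache.append(row)
        # 3. Data-driven flush once enough rows have accumulated.
        if len(self.cache) >= self.flush_rows:
            self.flush()
        # 4. Acknowledge the write to the caller.
        return "ok"

    def flush(self):
        # Persist cached rows, then drop them from the cache.
        with open(self.data_path, "a", encoding="utf-8") as data:
            for row in self.cache:
                data.write(json.dumps(row) + "\n")
        self.cache.clear()
        # Assumption: logged rows are no longer needed once they are persisted.
        open(self.wal_path, "w", encoding="utf-8").close()


workdir = tempfile.mkdtemp()
writer = VNodeWriter(workdir, flush_rows=3)
acks = [writer.write({"ts": i, "value": i * 1.5}) for i in range(4)]
```

After the four writes, three rows have been flushed to the data file and the fourth is still held in both the cache and the log, so it could be replayed after a crash.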
Metadata Storage
Metadata files reside in /var/lib/taos/mgmt/ and are append‑only. Example directory structure:
/var/lib/taos/
+--mgmt/
    +--db.db
    +--meters.db
    +--user.db
    +--vgroups.db
Data Storage Files
Data is stored under /var/lib/taos/data/, and vnode information is stored under /var/lib/taos/tsdb/. The hierarchy contains one directory per vnode, a meter object file for each vnode, and links that point to the actual data files. Example:
/var/lib/taos/
+--tsdb/
|   +--vnode0
|       +--meterObj.v0
|       +--db/
|           +--v0f1804.head->/var/lib/taos/data/vnode0/v0f1804.head1
|           +--v0f1804.data->/var/lib/taos/data/vnode0/v0f1804.data
|           +--v0f1804.last->/var/lib/taos/data/vnode0/v0f1804.last1
+--data/
    +--vnode0/
        +--v0f1804.head1
        +--v0f1804.data
        +--v0f1804.last1
Each vnode has a meterObj file describing vnode basics and table information. The head file stores indexes to data blocks, enabling fast reads by loading all indexes of a table into memory.
<File Start>
[File Header]
[Table1 Offset & Length]
[Table2 Offset & Length]
…
[Table1 Index]
[Table2 Index]
…
<File End>
The data file contains actual time‑ordered data blocks, stored column‑wise for compression efficiency. When a data block is too small, it is first written to a last file and later merged into the data file.
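The small-block rule and the column-wise layout can be sketched together as follows. Everything concrete here is an assumption for illustration: the `MIN_BLOCK_ROWS` threshold, the `TableFiles` class, and the use of in-memory lists to stand in for the data and last files. The article does not state the real threshold or merge procedure.

```python
MIN_BLOCK_ROWS = 4  # hypothetical minimum number of rows for a full data block


def to_columns(rows):
    """Transpose row tuples into per-column lists (a column-wise block)."""
    return [list(column) for column in zip(*rows)]


class TableFiles:
    """In-memory stand-ins for one table's data file and last file."""

    def __init__(self):
        self.data_blocks = []  # stands in for blocks in the data file
        self.last_rows = []    # stands in for the small block in the last file

    def flush(self, rows):
        # Merge rows parked in the last file with the newly flushed rows,
        # keeping them in time order (older rows first).
        pending = self.last_rows + list(rows)
        if len(pending) >= MIN_BLOCK_ROWS:
            # Large enough: store as one column-wise block in the data file.
            self.data_blocks.append(to_columns(pending))
            self.last_rows = []
        else:
            # Too small: park the rows in the last file for a later merge.
            self.last_rows = pending


files = TableFiles()
files.flush([(1, 20.0), (2, 20.5)])             # too small, goes to the last file
files.flush([(3, 21.0), (4, 21.5), (5, 22.0)])  # merged with the parked rows
```

After the second flush, the data file holds a single block whose first column contains all five timestamps and whose second column contains all five values, and the last file is empty again.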
<File Start>
[File Header]
[Data Block 1]
[Data Block 2]
…
[Data Block N]
<File End>
Conclusion
TDengine’s architecture combines virtualized nodes, append‑only metadata, columnar storage, and write‑ahead logging to improve resource utilization, scalability, and query performance for large‑scale IoT and other time‑series workloads.
DataFunTalk