
OpenTSDB: Architecture, Data Model, and HBase Integration for Time-Series Data Storage

The article offers a detailed technical overview of OpenTSDB’s architecture and data model, explaining how it leverages HBase for scalable time-series storage, describing core concepts, table schemas, ingestion flow, performance considerations, and future alternatives for large-scale monitoring workloads.

vivo Internet Technology

This article provides a comprehensive technical overview of OpenTSDB, a distributed time-series database built on top of HBase. It begins by describing the characteristics of time-series data (high-frequency generation, strict time dependency, and massive volume) and explains why traditional relational databases are poorly suited to such workloads.

The document then systematically introduces core concepts of time-series databases, with a focus on OpenTSDB’s design principles and architecture. Key sections include:

Core Features: Raw data preservation, millisecond precision, unlimited retention, horizontal scalability via HBase/Hadoop, and rich query capabilities via a GUI, an HTTP API, and third-party frontends.

Core Concepts: Metric, Tags (tagk/tagv pairs), Value, Timestamp, and DataPoint, illustrated with a concrete example.

Deployment Architecture: Stateless TSD (Time Series Daemon) nodes with no master, HBase as the storage layer, and horizontal scalability.

HBase Primer: A concise yet technically accurate summary of HBase’s architecture, including its four-dimensional coordinate system (row key, column family, column qualifier, timestamp), basic operations (Get, Put, Delete, Scan, Increment), and physical and logical storage models.

HBase Tables for OpenTSDB: A detailed explanation of the four required HBase tables (tsdb, tsdb-uid, tsdb-tree, and tsdb-meta), including their purposes and the HBase create (DDL) statement for each.

Data Ingestion Flow: A step-by-step walkthrough of how a single DataPoint is stored across the HBase tables, supported by code snippets and screenshots of actual HBase scans taken before and after insertion.

Table-Level Deep Dive: Analysis of the row key structure, column qualifier encoding (2-byte offsets for second precision vs. 4-byte offsets for millisecond precision), value storage (8-byte double/long), and design optimizations (UID mapping, hourly row compaction, salted row keys).

Usage Guidelines and Limitations: Practical considerations including character encoding (ISO-8859-1 vs. UTF-8), pre-splitting regions for write scalability, scalability limits (millions, not billions, of points), no per-business table isolation, and limited real-time aggregation on a single node.

Future Outlook: A recommendation to consider Druid or InfluxDB for larger-scale time-series workloads.
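The Core Features item mentions querying through the HTTP API. As a minimal sketch, the Python below builds a JSON body of the shape that OpenTSDB's /api/query endpoint accepts. The metric name sys.cpu.user, the host tag, and the helper build_query are hypothetical, and nothing is sent over the network.

```python
import json


def build_query(metric: str, tags: dict, start: str = "1h-ago",
                aggregator: str = "sum") -> str:
    """Build a JSON request body for OpenTSDB's HTTP /api/query endpoint."""
    body = {
        "start": start,  # relative ("1h-ago") or absolute start time
        "queries": [
            {
                "aggregator": aggregator,  # how matching series are merged
                "metric": metric,
                "tags": tags,  # tagk -> tagv filter
            }
        ],
    }
    return json.dumps(body)


payload = build_query("sys.cpu.user", {"host": "web01"})
print(payload)
```

The result would be POSTed to /api/query on any TSD. The aggregator is needed because one query can match many series that differ only in tags they do not filter on.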
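To make the Core Concepts item concrete, here is a sketch of a DataPoint as a Python dataclass. It renders the point in the telnet-style `put <metric> <timestamp> <value> <tagk=tagv ...>` line that OpenTSDB accepts. The class and the sample metric and tags are illustrative and are not taken from the article's own example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataPoint:
    metric: str            # what is measured, e.g. "sys.cpu.user"
    timestamp: int         # Unix epoch time in seconds (or milliseconds)
    value: float | int     # the measurement itself
    tags: dict[str, str] = field(default_factory=dict)  # tagk -> tagv pairs

    def to_put_line(self) -> str:
        """Render the point in OpenTSDB's telnet-style `put` format."""
        tag_str = " ".join(f"{k}={v}" for k, v in sorted(self.tags.items()))
        return f"put {self.metric} {self.timestamp} {self.value} {tag_str}"


dp = DataPoint("sys.cpu.user", 1356998400, 42.5, {"host": "web01", "cpu": "0"})
print(dp.to_put_line())
```

A time series is identified by the metric plus its full tag set. Changing any tag value, such as cpu=1, therefore produces a different series.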
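Because TSD nodes are stateless and have no master, a client can send any request to any node. The sketch below assumes three hypothetical TSD hosts on OpenTSDB's default port 4242 and rotates through them in turn. A production setup would more likely place a load balancer in front of the TSDs.

```python
import itertools

# Hypothetical TSD endpoints. All of them are interchangeable because TSDs keep no state.
TSD_NODES = ["http://tsd1:4242", "http://tsd2:4242", "http://tsd3:4242"]


class RoundRobinTSDClient:
    """Spread requests evenly across interchangeable TSD nodes."""

    def __init__(self, nodes: list) -> None:
        if not nodes:
            raise ValueError("at least one TSD node is required")
        self._cycle = itertools.cycle(nodes)

    def next_endpoint(self, path: str = "/api/put") -> str:
        # No node is special (there is no master), so plain rotation is enough.
        return next(self._cycle) + path


client = RoundRobinTSDClient(TSD_NODES)
for _ in range(4):
    print(client.next_endpoint())
```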
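The HBase Primer item describes a cell as addressed by four coordinates. The toy in-memory model below nests row key, column family, qualifier, and timestamp. It is a sketch, not any HBase client API. It shows how Get returns the newest version and how Scan walks rows in sorted key order. Increment is simplified to an integer read-modify-write.

```python
from collections import defaultdict


def new_table():
    """Logical HBase table: row key -> column family -> qualifier -> {timestamp: value}."""
    return defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))


def put(table, row, family, qualifier, value, ts):
    # Each (row, family, qualifier, timestamp) coordinate addresses exactly one value.
    table[row][family][qualifier][ts] = value


def get(table, row, family, qualifier):
    """Return the newest version of a cell, as a default Get does."""
    versions = table.get(row, {}).get(family, {}).get(qualifier, {})
    return versions[max(versions)] if versions else None


def delete(table, row, family, qualifier):
    """Drop all versions of one cell (real HBase writes a tombstone marker instead)."""
    table.get(row, {}).get(family, {}).pop(qualifier, None)


def increment(table, row, family, qualifier, amount, ts):
    """Read-modify-write of a counter cell; HBase performs this atomically per row."""
    current = get(table, row, family, qualifier) or 0
    put(table, row, family, qualifier, current + amount, ts)
    return current + amount


def scan(table, start_row, stop_row):
    """Yield row keys in sorted order within [start_row, stop_row)."""
    for row in sorted(table):
        if start_row <= row < stop_row:
            yield row


t = new_table()
put(t, b"row1", "cf", b"q", b"v1", ts=1)
put(t, b"row1", "cf", b"q", b"v2", ts=2)
put(t, b"row2", "cf", b"q", b"x", ts=1)
print(get(t, b"row1", "cf", b"q"))
```

The sorted-scan behavior is what OpenTSDB's row key design exploits. Rows for the same metric share a prefix, so a time-range query becomes one contiguous scan.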
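For the HBase Tables item, the sketch below generates HBase shell create statements for the four tables. It assumes the column-family layout used by OpenTSDB's create_table.sh script: t for tsdb and tsdb-tree, id and name for tsdb-uid, and name for tsdb-meta. Compression, Bloom filter, and TTL options are deliberately omitted, so check the script shipped with your OpenTSDB version before creating real tables.

```python
# Column families per table (assumed to match OpenTSDB's create_table.sh layout).
TABLES = {
    "tsdb": [{"NAME": "t", "VERSIONS": 1}],
    "tsdb-uid": [{"NAME": "id"}, {"NAME": "name"}],
    "tsdb-tree": [{"NAME": "t", "VERSIONS": 1}],
    "tsdb-meta": [{"NAME": "name"}],
}


def family_spec(opts):
    """Render one column-family dict in HBase shell syntax, e.g. {NAME => 't'}."""
    parts = []
    for key, val in opts.items():
        rendered = str(val) if isinstance(val, int) else f"'{val}'"
        parts.append(f"{key} => {rendered}")
    return "{" + ", ".join(parts) + "}"


def create_statement(table, families):
    return f"create '{table}', " + ", ".join(family_spec(f) for f in families)


for name, fams in TABLES.items():
    print(create_statement(name, fams))
```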
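The Data Ingestion Flow item can be approximated in a few lines. This sketch uses Python dicts as stand-ins for the tsdb-uid and tsdb tables; UidTable and ingest are hypothetical names. It assumes the default 3-byte UIDs and a 4-byte base time. For simplicity it stores the raw second offset as the column key rather than OpenTSDB's encoded qualifier.

```python
class UidTable:
    """In-memory stand-in for tsdb-uid: forward and reverse UID mappings."""

    WIDTH = 3  # default UID width in bytes

    def __init__(self):
        self.forward = {}   # (kind, name) -> uid bytes, like the 'id' family
        self.reverse = {}   # (kind, uid bytes) -> name, like the 'name' family
        self.counters = {"metrics": 0, "tagk": 0, "tagv": 0}  # last UID issued per kind

    def get_or_create(self, kind, name):
        key = (kind, name)
        if key not in self.forward:
            self.counters[kind] += 1
            uid = self.counters[kind].to_bytes(self.WIDTH, "big")
            self.forward[key] = uid
            self.reverse[(kind, uid)] = name
        return self.forward[key]


def ingest(uids, tsdb, metric, timestamp, value, tags):
    """Store one data point and return the row key it landed in."""
    # 1. Resolve the metric and each tag key/value to UIDs, creating them on first use.
    metric_uid = uids.get_or_create("metrics", metric)
    tag_pairs = sorted(
        (uids.get_or_create("tagk", k), uids.get_or_create("tagv", v))
        for k, v in tags.items()
    )
    # 2. Row key = metric UID + hour-aligned base time + tag UID pairs sorted by tagk UID.
    base_time = timestamp - timestamp % 3600
    row_key = metric_uid + base_time.to_bytes(4, "big") + b"".join(k + v for k, v in tag_pairs)
    # 3. Column key = seconds into the hour (simplified; not the encoded 2-byte qualifier).
    tsdb.setdefault(row_key, {})[timestamp - base_time] = value
    return row_key


uids, tsdb = UidTable(), {}
row = ingest(uids, tsdb, "sys.cpu.user", 1356998405, 42, {"host": "web01"})
print(row.hex(), tsdb[row])
```

Every point for the same series within the same hour lands in the same row. This is why a single row can later be compacted into one wide column.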
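The byte layouts named in the Table-Level Deep Dive item can be sketched with Python's struct module. The flag layout follows OpenTSDB's documented storage format as we understand it: bit 3 marks a float, and bits 0 to 2 hold the value length minus one. The salt function assumes a 1-byte salt and uses zlib.crc32 purely as a stand-in hash, not OpenTSDB's actual hash.

```python
import struct
import zlib

FLOAT_FLAG = 0x8      # flag bit 3: the value is floating point
LEN_8_BYTES = 8 - 1   # flag bits 0-2 hold (value length in bytes) - 1


def encode_value(value):
    """Encode as an 8-byte big-endian long or double; return (value_bytes, flags)."""
    if isinstance(value, float):
        return struct.pack(">d", value), FLOAT_FLAG | LEN_8_BYTES
    return struct.pack(">q", value), LEN_8_BYTES


def qualifier_seconds(offset_s, flags):
    """2-byte qualifier: 12-bit offset (seconds into the hour), then 4 flag bits."""
    if not 0 <= offset_s < 3600:
        raise ValueError("offset must fall within the row's hour")
    return struct.pack(">H", (offset_s << 4) | flags)


def qualifier_millis(offset_ms, flags):
    """4-byte qualifier: 0xF prefix nibble, 22-bit ms offset, 2 reserved bits, 4 flag bits."""
    if not 0 <= offset_ms < 3_600_000:
        raise ValueError("offset must fall within the row's hour")
    return struct.pack(">I", 0xF0000000 | (offset_ms << 6) | flags)


def salted_row_key(row_key, buckets):
    """Prefix a 1-byte salt derived from the metric and tag UIDs (timestamp excluded).

    Assumptions: crc32 stands in for OpenTSDB's hash; 3-byte metric UID, 4-byte base time.
    """
    series_id = row_key[:3] + row_key[7:]
    return bytes([zlib.crc32(series_id) % buckets]) + row_key


value_bytes, flags = encode_value(42.5)
print(qualifier_seconds(5, flags).hex(), value_bytes.hex())
```

Excluding the timestamp from the hash keeps every hour of a series in the same salt bucket. Writes for different series still spread across buckets, while a query must fan out over all buckets.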
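Two of the Usage Guidelines points lend themselves to small checks. The sketch below first tests whether a name survives ISO-8859-1 encoding, which OpenTSDB uses for UID names. It then computes region split points for a table pre-split by a 1-byte salt. The function names are hypothetical, and the SPLITS string is meant for the HBase shell create command.

```python
def is_storable_name(name: str) -> bool:
    """True if a metric or tag name survives ISO-8859-1 encoding."""
    try:
        name.encode("iso-8859-1")
    except UnicodeEncodeError:
        return False
    return True


def salt_split_keys(buckets: int) -> list:
    """Region split points so that each 1-byte salt bucket starts its own region."""
    if not 1 <= buckets <= 256:
        raise ValueError("a 1-byte salt supports 1..256 buckets")
    return [bytes([i]) for i in range(1, buckets)]


def shell_splits(keys) -> str:
    """Render split keys in HBase shell SPLITS syntax with \\xNN byte escapes."""
    return "SPLITS => [" + ", ".join('"\\x%02X"' % k[0] for k in keys) + "]"


print(is_storable_name("sys.cpu.user"), is_storable_name("温度"))
print(shell_splits(salt_split_keys(4)))
```

With four buckets, salt_split_keys(4) yields the split points \x01, \x02, and \x03. The table therefore starts with four regions that can absorb writes in parallel instead of funneling them into a single region. Names with Chinese characters fail the encoding check, while Latin-1 accented letters such as é pass it.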

The article is rich in technical detail, includes multiple HBase schema definitions, and provides clear visual aids (e.g., architecture diagrams, table structures), making it highly valuable for database engineers, SREs, and data infrastructure architects working with time-series monitoring systems.

Tags: data modeling, HBase, time-series database, scalable storage, monitoring systems, OpenTSDB, row key design, UID mapping