SelectDB Boosts GuanceDB Observability: Architecture Upgrade, Cost Reduction, and Performance Gains
This article details how SelectDB’s inverted‑index, Variant data type, and sampling capabilities were integrated into GuanceDB to replace Elasticsearch, achieving up to 70% storage cost reduction, 2‑4× query speed improvement, and a ten‑fold overall cost‑performance boost for log analytics and observability workloads.
Background
As cloud computing matures, enterprises increasingly move services to the cloud, but traditional monitoring and fault‑diagnosis methods cannot keep up with the scale and complexity of modern observability requirements.
Guance Cloud (观测云) provides unified monitoring, alerting, visualization, and automated inspection for cloud, cloud‑native, application, and business data. Its original architecture relied on Elasticsearch/OpenSearch for log storage and analysis, which suffered from high resource consumption, poor schema‑free support, and limited aggregation performance.
Selection Goals and Evaluation
The team evaluated several databases and defined criteria such as high‑throughput writes, full‑text inverted indexing, support for schema‑free tables, multi‑tenant isolation, and hot‑cold data tiering. SelectDB met these goals with features like inverted indexes, Variant type, and efficient storage compression.
Architecture Upgrade with SelectDB
GuanceDB’s new architecture replaces the Elasticsearch layer with SelectDB while keeping the existing query engine. Data ingestion is handled by DataKit, which streams data to the Guance‑Insert component; Guance‑Select executes DQL queries, pushing supported operations to the SelectDB front‑end (FE) and falling back to back‑end (BE) processing when necessary.
Key Enhancements
Inverted Index : Supports MATCH_ALL, MATCH_ANY, MATCH_PHRASE for multilingual full‑text search, dramatically improving log retrieval speed.
Variant Data Type : Stores arbitrary JSON, automatically extracts sub‑fields, avoids schema explosion, and resolves type conflicts across partitions.
Sampling Logic : Automatically samples large result sets (over 10 million rows) using SelectDB’s TableSample, reducing aggregation cost while preserving accuracy.
Performance and Cost Benefits
After migration, storage cost dropped by ~70% and query performance increased 2‑4×, allowing the same workload to run on 13 instead of 20 cloud VMs—a 67% cost reduction. SelectDB’s columnar storage and ZSTD compression achieve a 1:8 compression ratio versus 1:1.5 for Elasticsearch. Hot‑cold tiering further lowers long‑term storage expenses.
Benchmark tests show point‑lookups and list queries up to 2× faster, and aggregation queries up to 4× faster than Elasticsearch.
Code Example
The following DDL and DQL statements illustrate how to create a log table with inverted indexes and perform common queries.
CREATE TABLE httplog (
`ts` DATETIME,
`clientip` VARCHAR(20),
`request` TEXT,
INDEX idx_ip (`clientip`) USING INVERTED, -- no tokenization
INDEX idx_req (`request`) USING INVERTED PROPERTIES("parser" = "chinese") -- Chinese tokenization
) DUPLICATE KEY(`ts`);
-- Query latest 10 rows where clientip = '8.8.8.8'
SELECT * FROM httplog WHERE clientip = '8.8.8.8' ORDER BY ts DESC LIMIT 10;
-- Query rows where request contains "error" or "404"
SELECT * FROM httplog WHERE request MATCH_ANY 'error 404' ORDER BY ts DESC LIMIT 10;
-- Query rows where request contains both "image" and "faq"
SELECT * FROM httplog WHERE request MATCH_ALL 'image faq' ORDER BY ts DESC LIMIT 10;
-- Phrase search for "查询错误"
SELECT * FROM httplog WHERE request MATCH_PHRASE '查询错误' ORDER BY ts DESC LIMIT 10;Conclusion
Integrating SelectDB into GuanceDB delivers a schema‑free, high‑performance, and cost‑effective solution for log storage and analysis, enabling a ten‑fold improvement in overall value‑for‑money while supporting a wide range of observability scenarios.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.