How Haijing Tech Built a Real-Time Telecom Analytics Platform with ByConity
Haijing Technology faced Hadoop's real‑time limits and ClickHouse's operational pain points, so it adopted the open‑source ByConity platform, which provides a unified table engine, fast multi‑table joins, and seamless scaling to deliver a carrier‑grade real‑time analytics solution.
Haijing Technology, founded in 2003, supplies high‑quality information services to telecom operators worldwide and has been expanding globally, with overseas markets now accounting for nearly 50% of its business. Because overseas operators have smaller user bases and fewer hosts, they require lightweight, independent deployments, prompting Haijing to explore real‑time analytics solutions.
01 Challenges of Real‑Time Analytics for Operators
Real‑time analysis difficulty : Existing stacks cannot perform incremental aggregation on streaming data.
Slow query performance : Compared with modern MPP databases, Hadoop‑based queries are 5‑10× slower, especially when data is skewed.
Write bottlenecks : Concurrent writes from multiple nodes quickly saturate node capacity.
Cluster size limitation : Physical master nodes restrict the number of usable machines; scaling ClickHouse or Greenplum requires data reshuffling and service interruption.
Limited concurrency : Physical master constraints keep concurrent request capacity low.
Low data‑load performance : Loading many small tables proceeds slowly.
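Incremental aggregation, the first item above, means updating a running result as each event arrives instead of rescanning stored data. The sketch below is only an illustration of that idea, assuming hypothetical call-record events with a timestamp, a cell ID, and a duration; none of these names come from Haijing's system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Running aggregate for one (window, key) pair."""
    count: int = 0
    total_seconds: int = 0


class IncrementalAggregator:
    """Keeps per-window totals that are updated in constant time per event."""

    def __init__(self, window_seconds: int = 300):
        self.window_seconds = window_seconds
        self.stats = defaultdict(WindowStats)

    def add(self, event_ts: int, cell_id: str, duration: int) -> None:
        # Align the event to the start of its tumbling window.
        window_start = event_ts - event_ts % self.window_seconds
        entry = self.stats[(window_start, cell_id)]
        entry.count += 1
        entry.total_seconds += duration

    def snapshot(self) -> dict:
        """Return a plain-dict copy of the current aggregates."""
        return {k: (v.count, v.total_seconds) for k, v in self.stats.items()}


# Hypothetical events: (epoch seconds, cell ID, call duration in seconds).
agg = IncrementalAggregator(window_seconds=300)
for ts, cell, dur in [(1000, "cell-A", 60), (1100, "cell-A", 30), (1300, "cell-A", 45)]:
    agg.add(ts, cell, dur)
print(agg.snapshot())
```

Each event touches exactly one window entry, so the cost per event stays constant however much history has accumulated.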
Why ByConity Was Chosen
To overcome Hadoop’s real‑time shortcomings, Haijing initially integrated ClickHouse for its strong single‑table aggregation, but faced complex table‑engine choices, difficult multi‑table joins, and cumbersome distributed‑table configurations. After ByConity was open‑sourced, it resolved these pain points with three key capabilities:
Unified table engine (CnchMergeTree) : Simplifies table selection for developers.
Significant multi‑table join performance boost : Eliminates the need to define distribution keys.
Simple scaling via compute‑storage separation : Adding or removing hosts does not trigger data reshuffling.
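To see why avoiding data reshuffling matters, consider a shared‑nothing cluster that places each row on a node by key. The sketch below assumes simple `key % node_count` placement, which is an illustrative simplification and not a description of how any particular database shards data. It counts how many keys change owner when a node is added.

```python
def moved_fraction(num_keys: int, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes under `key % node_count` placement."""
    moved = sum(1 for key in range(num_keys) if key % old_nodes != key % new_nodes)
    return moved / num_keys


# Assumed modulo placement: growing from 4 to 5 nodes relocates most rows.
print(moved_fraction(20_000, 4, 5))  # 0.8
```

Under this assumption 80% of the rows would have to move to a different node. With compute‑storage separation the data stays in shared storage and only the assignment of work to compute nodes changes, which is why hosts can be added or removed without a reshuffle.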
In practice, data from OLTP databases, event streams, file storage, and log files are ingested into ByConity, where they are processed into wide tables and complex metric tables for ad‑hoc, list, and real‑time analysis.
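A wide table is essentially a fact table with its dimension attributes joined in ahead of time, so later queries need no joins. The following is a minimal sketch of that denormalization step in plain Python, assuming hypothetical order, customer, and product records; the field names are invented for illustration and the real pipeline runs inside ByConity, not in application code.

```python
def build_wide_rows(orders, customers, products):
    """Left-join each order with its customer and product attributes."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    products_by_id = {p["product_id"]: p for p in products}
    wide = []
    for order in orders:
        row = dict(order)
        # Missing dimension rows yield None columns, like a SQL LEFT JOIN.
        customer = customers_by_id.get(order["customer_id"], {})
        product = products_by_id.get(order["product_id"], {})
        row["customer_region"] = customer.get("region")
        row["product_name"] = product.get("name")
        wide.append(row)
    return wide


# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": 10, "product_id": 100},
    {"order_id": 2, "customer_id": 11, "product_id": 999},
]
customers = [{"customer_id": 10, "region": "north"}, {"customer_id": 11, "region": "south"}]
products = [{"product_id": 100, "name": "5G plan"}]

wide_rows = build_wide_rows(orders, customers, products)
print(wide_rows)
```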
Operator Real‑Time Analytics Scenario
Telecom systems are divided into three logical domains: B (business: CRM, order handling, etc.), O (operations: mobile number activation, resource control), and M (management: performance, staffing). Each domain generates thousands of tables that must be integrated in real time into wide tables and complex metric tables, which are then unified across domains for comprehensive customer, user‑association, and order‑ticket analysis.
By visualizing business objects and automatically linking primary‑foreign keys, a unified business view is constructed, providing the foundation for downstream metrics, tags, and orchestration.
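The article does not say how the primary‑foreign key linking is done. One simple heuristic, shown here purely as an assumption, is to treat any column that shares its name with another table's primary key as a foreign key to that table. The table and column names below are hypothetical.

```python
def infer_links(tables: dict) -> list:
    """Return (child_table, column, parent_table) for columns matching another table's primary key."""
    pk_owner = {meta["primary_key"]: name for name, meta in tables.items()}
    links = []
    for name, meta in tables.items():
        for column in meta["columns"]:
            parent = pk_owner.get(column)
            # Skip a table's own primary key; it is not a foreign key.
            if parent is not None and parent != name:
                links.append((name, column, parent))
    return sorted(links)


# Hypothetical schema metadata.
tables = {
    "customer": {"primary_key": "customer_id", "columns": ["customer_id", "name"]},
    "order": {"primary_key": "order_id", "columns": ["order_id", "customer_id", "product_id"]},
    "ticket": {"primary_key": "ticket_id", "columns": ["ticket_id", "order_id"]},
}

links = infer_links(tables)
print(links)
```

Name matching alone would miss keys that are named differently across systems, so a real implementation would likely need manual overrides or data profiling on top of it.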
Fusion of Hadoop and ByConity for Ad‑Hoc Queries
Haijing implemented a batch‑plus‑stream architecture: Kafka streams real‑time call‑detail records and orders, while Hadoop performs hourly aggregations for complex multi‑domain joins. Every five minutes the batch and stream results are merged into ByConity, enabling rapid ad‑hoc queries and real‑time dashboards.
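The merge step can be pictured as follows. This sketch assumes a common batch‑plus‑stream convention that the article does not confirm: batch results are authoritative for every time bucket the batch job has already covered, and stream results fill in the buckets after that point. The function name, the watermark parameter, and the sample values are all hypothetical.

```python
def merge_results(batch: dict, stream: dict, batch_watermark: int) -> dict:
    """Combine batch and stream aggregates keyed by bucket start time (epoch seconds).

    Buckets before `batch_watermark` come from the batch layer only;
    buckets at or after it come from the stream layer.
    """
    merged = {ts: value for ts, value in batch.items() if ts < batch_watermark}
    merged.update({ts: value for ts, value in stream.items() if ts >= batch_watermark})
    return dict(sorted(merged.items()))


# Hypothetical five-minute buckets (300 seconds each).
batch = {0: 100, 300: 120}
stream = {300: 118, 600: 40, 900: 55}

merged = merge_results(batch, stream, batch_watermark=600)
print(merged)
```

Here the stream's provisional value for bucket 300 is discarded in favor of the batch value, while buckets 600 and 900 come from the stream.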
02 Deep Integration and Packaging of ByConity
After ByConity was open‑sourced, Haijing tested it extensively and packaged it as a turnkey warehouse service, WhaleHouse, with the following enhancements:
Physical‑machine deployment : Supports non‑containerized environments preferred by many B2B customers.
No minimum host count : Installations can start from a single node and grow from there, lowering entry barriers.
Visual component management : Provides UI for database, HDFS, and FoundationDB administration.
Elastic scaling and instance control : Nodes can be added or removed online, with start/stop capabilities.
Online upgrades : Versions can be updated via the UI, avoiding manual steps.
The resulting WhaleHouse layer delivers ultra‑fast unified analysis, supporting real‑time queries, massive list scans, full‑text search, and time‑series workloads. It combines MySQL and ClickHouse protocol compatibility, a vectorized engine, and federated queries across diverse data sources, targeting high‑performance, low‑latency, and high‑concurrency OLAP scenarios.
03 Future Plans
Fast data migration : Visual tools to migrate whole schemas from Oracle, Greenplum, etc.
Offline backup and recovery : UI‑driven backup/restore capabilities.
Data‑lake analysis : Leverage Hudi and materialized views for multi‑source integration.
Materialized‑view analytics : Real‑time data consolidation using materialized views.
Time‑series performance : Exploit ByConity’s high TPS and aggregation speed to boost time‑series ingestion and analytics.
Past Memory Big Data
A popular big-data architecture channel with over 100,000 developers. Publishes articles on Spark, Hadoop, Flink, Kafka and more. Visit the Past Memory Big Data blog at https://www.iteblog.com. Search "Past Memory" on Google or Baidu.