
Technical Evolution and Production Optimization of Real‑Time OLAP at BTC.com

This article details BTC.com’s journey in building a real‑time OLAP platform for blockchain data, covering business background, challenges, architectural evolution, technology choices such as Flink and ClickHouse, optimization techniques, monitoring, and future directions.

DataFunTalk

Business Background – BTC.com provides blockchain solutions with four core modules: AI/ML, blockchain, cloud, and data. Its OLAP needs arise from tasks such as forensic analysis of blockchain transactions and real‑time risk control.

Opportunities and Challenges – The original 2018 architecture (Parser → MySQL → Hive/Presto → Spark) suffered from lack of real‑time processing, single‑point failures, low query efficiency, and insufficient monitoring.

Technology Selection – After evaluating options, the team chose PyFlink for its flexible windowing and low latency (results within seconds), and ClickHouse for its fast queries. The team's existing Python expertise drove the choice of PyFlink.
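To make the idea of windowed processing concrete, here is a minimal sketch of what a tumbling (fixed, non-overlapping) window computes. It is plain Python for illustration only, not the PyFlink API and not BTC.com's job code; the function name `tumbling_window_sums` and the sample events are hypothetical.

```python
from collections import defaultdict


def tumbling_window_sums(events, window_seconds):
    """Sum values per fixed, non-overlapping window.

    `events` is an iterable of (event_time_in_seconds, value) pairs.
    Returns a dict mapping each window's start time to the sum of its values.
    """
    sums = defaultdict(float)
    for event_time, value in events:
        # Every event belongs to exactly one window, identified by its start.
        window_start = (event_time // window_seconds) * window_seconds
        sums[window_start] += value
    return dict(sorted(sums.items()))


# Hypothetical transaction amounts keyed by event time (seconds).
events = [(0, 1.5), (4, 2.0), (5, 0.5), (12, 3.0)]
window_totals = tumbling_window_sums(events, window_seconds=5)
```

A stream processor such as Flink evaluates the same grouping incrementally over unbounded input and also handles late or out-of-order events, which this sketch ignores.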

Architecture Evolution – The new pipeline parses data from blockchain nodes into Kafka, which then feeds both Flink and Spark. Flink writes results to MySQL and ClickHouse, supporting reporting, statistics, synchronization, and OLAP. Data governance follows a layered model (raw, detail, summary, application), and machine‑learning tasks run on Kubernetes.

Optimization Details – Custom sinks, batch import strategies, upsert handling via temporary tables, and UDF extensions for ClickHouse were implemented. Kubernetes provides high‑availability storage, horizontal scaling, and service discovery.
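The batch import strategy mentioned above can be sketched as a sink that buffers rows and flushes them either when the buffer is full or when enough time has passed. This is an illustrative outline under assumptions, not the team's actual custom sink: the class `BatchingSink`, its parameters, and the `flush_fn` callback (which stands in for a real bulk insert into ClickHouse) are all hypothetical.

```python
import time


class BatchingSink:
    """Buffer rows and hand them to `flush_fn` in batches.

    A batch is flushed when it reaches `max_rows` or when
    `max_wait_seconds` have elapsed since the previous flush.
    """

    def __init__(self, flush_fn, max_rows=1000, max_wait_seconds=5.0,
                 clock=time.monotonic):
        self._flush_fn = flush_fn      # hypothetical stand-in for a bulk INSERT
        self._max_rows = max_rows
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def write(self, row):
        self._buffer.append(row)
        too_big = len(self._buffer) >= self._max_rows
        too_old = self._clock() - self._last_flush >= self._max_wait
        if too_big or too_old:
            self.flush()

    def flush(self):
        # Skip empty flushes so the target never receives a zero-row insert.
        if self._buffer:
            self._flush_fn(list(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()


# Usage: collect batches in a list instead of writing to a real database.
inserted_batches = []
sink = BatchingSink(inserted_batches.append, max_rows=3, max_wait_seconds=60.0)
for height in range(7):
    sink.write({"block_height": height})
sink.flush()  # flush the remainder, e.g. on shutdown or at a checkpoint
```

Fewer, larger inserts generally suit ClickHouse better than many single-row writes, which is the motivation for buffering in the sink rather than writing each record as it arrives.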

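Monitoring and Consistency – Prometheus monitors the stack. Data consistency relies on three mechanisms: final queries (most likely queries using ClickHouse's FINAL modifier, which return the merged, deduplicated state of each row), idempotent writes, and Flink checkpoints.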
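Why idempotent writes matter alongside checkpoints: when a job restarts from a checkpoint, records processed after that checkpoint are replayed, so the sink may see the same rows twice. The sketch below shows one way a keyed, versioned write stays correct under replay. It is a simplified in-memory model under assumptions, not BTC.com's implementation; the function `apply_idempotent` and the `tx_hash` / `version` fields are hypothetical.

```python
def apply_idempotent(table, rows):
    """Upsert rows into `table`, keyed by tx_hash.

    A row replaces the stored one only if its version is strictly newer,
    so applying the same batch again leaves the table unchanged.
    """
    for row in rows:
        key = row["tx_hash"]
        current = table.get(key)
        if current is None or row["version"] > current["version"]:
            table[key] = row
    return table


batch = [
    {"tx_hash": "a1", "version": 1, "amount": 10},
    {"tx_hash": "b2", "version": 1, "amount": 7},
    {"tx_hash": "a1", "version": 2, "amount": 12},  # later update to a1
]

table = {}
apply_idempotent(table, batch)
state_after_first_write = dict(table)

# Simulate a restart from a checkpoint: the same batch is delivered again.
apply_idempotent(table, batch)
state_after_replay = dict(table)
```

Because the replay changes nothing, at-least-once delivery from the stream processor still yields a consistent final state in the store.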

Future Outlook – Plans include expanding business scenarios, deeper blockchain analytics, integrating Flink with machine learning, and contributing to open‑source ecosystems.

Tags: big data, Flink, Kubernetes, ClickHouse, real-time OLAP, blockchain analytics