
Real‑time Computing Platform Architecture, Flink Migration, and One‑stop Platform at 58.com

This article details the design and implementation of 58.com’s real‑time computing platform, covering its architecture, data ingestion, storage, Flink‑based stream processing, SQL extensions, performance optimizations, Storm‑to‑Flink migration tools, the Wstream management console, state handling, monitoring, and future roadmap.

Big Data Technology Architecture

The real‑time computing platform at 58.com provides a one‑stop service for massive data, organized along three directions: high‑speed real‑time storage, distributed real‑time computation, and data distribution to downstream stores.

Platform construction consists of basic engine capabilities (Kafka, Storm, Flink, and Spark Streaming clusters) plus platform‑level services such as cluster management and governance (Nightfury).

Data ingestion relies on Kafka, with Canal or Debezium capturing database binlogs and Flume collecting business logs; multiple real‑time storage backends are supported (Kafka, Druid, HBase, Elasticsearch, ClickHouse).

Flink was adopted as the primary compute engine after evaluating Storm and Spark Streaming, offering exactly‑once semantics, higher throughput, and better resource utilization. The Flink cluster runs on YARN with high availability, HDFS federation, node labeling, cgroup isolation, and queue‑based resource management.
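
Exactly‑once delivery end to end typically also needs sink‑side cooperation. As a minimal, hypothetical sketch (plain Java, not Flink's actual two‑phase‑commit sink API), an idempotent sink can discard records it has already written, so a replay after checkpoint recovery does not duplicate results:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch: an idempotent sink that deduplicates by record ID.
// In a real Flink job, seenIds would live in checkpointed operator state.
class IdempotentSink {
    private final Set<String> seenIds = new HashSet<>();
    private final List<String> output = new ArrayList<>();

    /** Writes the record only if its ID has not been written before. */
    void write(String id, String payload) {
        if (seenIds.add(id)) {
            output.add(payload);
        }
    }

    List<String> results() {
        return output;
    }
}
```

With this shape, replaying a batch of events after recovery is harmless: already-written IDs are simply skipped.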

Real‑time SQL evolved from a custom, DML‑only dialect built on Flink 1.6 to the community Flink SQL, adding DDL support, Blink planner features, Hive integration, metadata management, lineage tracking, and permission control.

Storage extensions enable unified access to internal and external stores (e.g., wmb, Redis, ClickHouse) and custom formats via UDFs, with configurable concurrency for sources and sinks.

Performance optimizations include Blink's mini‑batch processing, two‑stage aggregation, early‑emit support, and asynchronous I/O for dimension‑table lookups, using either natively asynchronous clients or simulated asynchrony backed by caching with LRU eviction.
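
The simulated‑async path with an LRU cache can be sketched as follows. This is an illustrative stand‑in, not the platform's code: a blocking client (e.g., a synchronous Redis or HBase `get`) is wrapped in a `CompletableFuture`, and a `LinkedHashMap` in access order provides LRU eviction for hot keys:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hedged sketch of simulated-async dimension-table lookup with an LRU cache.
class CachedAsyncDimLookup {
    private final Map<String, String> lru;
    private final Function<String, String> blockingClient; // stands in for a sync store client

    CachedAsyncDimLookup(int capacity, Function<String, String> client) {
        this.blockingClient = client;
        // accessOrder=true plus removeEldestEntry gives LRU eviction
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    CompletableFuture<String> lookup(String key) {
        synchronized (lru) {
            String hit = lru.get(key);
            if (hit != null) return CompletableFuture.completedFuture(hit); // cache hit, no I/O
        }
        // cache miss: run the blocking call off the caller's thread
        return CompletableFuture.supplyAsync(() -> {
            String value = blockingClient.apply(key);
            synchronized (lru) {
                lru.put(key, value);
            }
            return value;
        });
    }
}
```

The cache hit path completes immediately without touching the external store, which is exactly why caching makes the simulated‑async approach viable for skewed key distributions.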

The Storm‑to‑Flink migration tool converts Storm topologies (spouts and bolts) into Flink sources, transforms, and sinks, adding checkpoint support, async I/O, and timer simulation, and handles YARN deployment through Flink’s ClusterClient.
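
The core of such a mapping can be illustrated with a tiny adapter. The interfaces below are invented for illustration and are not Storm's or Flink's real APIs: a Storm‑style bolt receives a tuple and emits zero or more tuples through a collector, and a shim exposes that contract in the flatMap shape that Flink transforms expect:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical simplified bolt contract: one tuple in, 0..n tuples out via a collector.
interface BoltLike {
    void execute(String tuple, Consumer<String> collector);
}

// Adapter exposing the bolt as a flatMap-style function.
class BoltToFlatMapAdapter {
    private final BoltLike bolt;

    BoltToFlatMapAdapter(BoltLike bolt) {
        this.bolt = bolt;
    }

    /** Applies the wrapped bolt to one input, collecting its emissions. */
    List<String> flatMap(String input) {
        List<String> out = new ArrayList<>();
        bolt.execute(input, out::add); // the collector just accumulates into a list
        return out;
    }
}
```

Because the bolt's business logic is untouched and only the surrounding contract is adapted, migration reduces to swapping the runtime wrapper, which is consistent with the jar‑replacement workflow described next.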

Task migration is streamlined: users replace the Storm jar with the upgraded Flink jar, keeping business logic unchanged, achieving millisecond‑level latency, 3‑5× throughput increase, and ~40% resource savings.

The Wstream platform abstracts cluster details, supports Flink Jar, Flink SQL, Flink‑Storm, and PyFlink submissions, provides task management, monitoring, diagnostics, data‑warehouse features (metadata, lineage, permissions), and SQL debugging tools (syntax checking, graph validation, data generation, result redirection).

Monitoring combines Flink metrics, YARN status, Kafka consumer lag, and custom business metrics, collected through Prometheus PushGateway and federation and visualized in Grafana dashboards, with alerting on message backlog, QPS fluctuations, checkpoint failures, and latency.
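
The Kafka lag signal behind backlog alerting is simple arithmetic: per‑partition lag is the log‑end offset minus the committed consumer offset, and the backlog is their sum. A self‑contained sketch (in production the offsets would come from the Kafka AdminClient or JMX, not hard‑coded maps):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative lag computation, keyed by partition number.
class LagMonitor {
    static Map<Integer, Long> perPartitionLag(Map<Integer, Long> endOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            // clamp at zero in case committed offsets race ahead of a stale end-offset read
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        return perPartitionLag(endOffsets, committedOffsets).values().stream()
                .mapToLong(Long::longValue).sum();
    }
}
```

An alert rule then fires when `totalLag` exceeds a threshold or grows monotonically across scrape intervals.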

Future plans focus on batch‑stream convergence, dynamic resource tuning, proactive intelligent monitoring, and adopting emerging community capabilities.

Tags: big data, Flink, streaming, data platform, real-time computing, Storm migration
Written by Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies
