
Real‑time Computing Platform Architecture, Flink Migration, and One‑stop Platform at 58.com

This article details the design and implementation of 58.com’s real‑time computing platform, covering its architecture, data ingestion, storage, Flink‑based stream processing, SQL extensions, performance optimizations, Storm‑to‑Flink migration tools, the Wstream management console, state handling, monitoring, and future roadmap.

Big Data Technology Architecture

The real‑time computing platform at 58.com provides a one‑stop service for massive data, organized along three directions: high‑speed real‑time storage, distributed real‑time computation, and data distribution to downstream stores.

Platform construction consists of basic engine capabilities (Kafka, Storm, Flink, and Spark Streaming clusters) plus platform‑level services such as cluster management and governance (Nightfury).

Data ingestion relies on Kafka, with Canal or Debezium capturing database binlogs and Flume collecting business logs; multiple real‑time storage backends are supported (Kafka, Druid, HBase, Elasticsearch, ClickHouse).

Flink was adopted as the primary compute engine after evaluating Storm and Spark Streaming, offering exactly‑once semantics, higher throughput, and better resource utilization. The Flink cluster runs on YARN with high availability, HDFS federation, node labeling, cgroup isolation, and queue‑based resource management.
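
Exactly‑once delivery end to end typically also needs sink‑side cooperation. As a minimal, hypothetical sketch (plain Java, not Flink's actual two‑phase‑commit sink API), an idempotent sink can discard records it has already written, so a replay after checkpoint recovery does not duplicate results:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hedged sketch: an idempotent sink that deduplicates by record ID.
// In a real Flink job, seenIds would live in checkpointed operator state.
class IdempotentSink {
    private final Set<String> seenIds = new HashSet<>();
    private final List<String> output = new ArrayList<>();

    /** Writes the record only if its ID has not been written before. */
    void write(String id, String payload) {
        if (seenIds.add(id)) {
            output.add(payload);
        }
    }

    List<String> results() {
        return output;
    }
}
```

With this shape, replaying a batch of events after recovery is harmless: already-written IDs are simply skipped.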

Real‑time SQL evolved from a custom, DML‑only dialect built on Flink 1.6 to the community Flink SQL, adding DDL support, Blink planner features, Hive integration, metadata management, lineage tracking, and permission control.

Storage extensions enable unified access to internal and external stores (e.g., wmb, Redis, ClickHouse) and custom formats via UDFs, with configurable concurrency for sources and sinks.

Performance optimizations include Blink's mini‑batch processing, two‑stage aggregation, early‑emit support, and asynchronous I/O for dimension‑table lookups, using either natively asynchronous clients or simulated asynchrony backed by caching with LRU eviction.
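
The simulated‑async path with an LRU cache can be sketched as follows. This is an illustrative stand‑in, not the platform's code: a blocking client (e.g., a synchronous Redis or HBase `get`) is wrapped in a `CompletableFuture`, and a `LinkedHashMap` in access order provides LRU eviction for hot keys:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hedged sketch of simulated-async dimension-table lookup with an LRU cache.
class CachedAsyncDimLookup {
    private final Map<String, String> lru;
    private final Function<String, String> blockingClient; // stands in for a sync store client

    CachedAsyncDimLookup(int capacity, Function<String, String> client) {
        this.blockingClient = client;
        // accessOrder=true plus removeEldestEntry gives LRU eviction
        this.lru = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > capacity;
            }
        };
    }

    CompletableFuture<String> lookup(String key) {
        synchronized (lru) {
            String hit = lru.get(key);
            if (hit != null) return CompletableFuture.completedFuture(hit); // cache hit, no I/O
        }
        // cache miss: run the blocking call off the caller's thread
        return CompletableFuture.supplyAsync(() -> {
            String value = blockingClient.apply(key);
            synchronized (lru) {
                lru.put(key, value);
            }
            return value;
        });
    }
}
```

The cache hit path completes immediately without touching the external store, which is exactly why caching makes the simulated‑async approach viable for skewed key distributions.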

The Storm‑to‑Flink migration tool converts Storm topologies (spouts and bolts) into Flink sources, transforms, and sinks, adding checkpoint support, async I/O, and timer simulation, and handles YARN deployment through Flink’s ClusterClient.
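
The core of such a mapping can be illustrated with a tiny adapter. The interfaces below are invented for illustration and are not Storm's or Flink's real APIs: a Storm‑style bolt receives a tuple and emits zero or more tuples through a collector, and a shim exposes that contract in the flatMap shape that Flink transforms expect:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical simplified bolt contract: one tuple in, 0..n tuples out via a collector.
interface BoltLike {
    void execute(String tuple, Consumer<String> collector);
}

// Adapter exposing the bolt as a flatMap-style function.
class BoltToFlatMapAdapter {
    private final BoltLike bolt;

    BoltToFlatMapAdapter(BoltLike bolt) {
        this.bolt = bolt;
    }

    /** Applies the wrapped bolt to one input, collecting its emissions. */
    List<String> flatMap(String input) {
        List<String> out = new ArrayList<>();
        bolt.execute(input, out::add); // the collector just accumulates into a list
        return out;
    }
}
```

Because the bolt's business logic is untouched and only the surrounding contract is adapted, migration reduces to swapping the runtime wrapper, which is consistent with the jar‑replacement workflow described next.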

Task migration is streamlined: users replace the Storm jar with the upgraded Flink jar, keeping business logic unchanged, achieving millisecond‑level latency, 3‑5× throughput increase, and ~40% resource savings.

The Wstream platform abstracts cluster details, supports Flink Jar, Flink SQL, Flink‑Storm, and PyFlink submissions, provides task management, monitoring, diagnostics, data‑warehouse features (metadata, lineage, permissions), and SQL debugging tools (syntax checking, graph validation, data generation, result redirection).

Monitoring combines Flink metrics, YARN status, Kafka consumer lag, and custom business metrics, collected through Prometheus PushGateway and federation and visualized in Grafana dashboards, with alerting on message backlog, QPS fluctuations, checkpoint failures, and latency.
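
The Kafka lag signal behind backlog alerting is simple arithmetic: per‑partition lag is the log‑end offset minus the committed consumer offset, and the backlog is their sum. A self‑contained sketch (in production the offsets would come from the Kafka AdminClient or JMX, not hard‑coded maps):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative lag computation, keyed by partition number.
class LagMonitor {
    static Map<Integer, Long> perPartitionLag(Map<Integer, Long> endOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            // clamp at zero in case committed offsets race ahead of a stale end-offset read
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    static long totalLag(Map<Integer, Long> endOffsets, Map<Integer, Long> committedOffsets) {
        return perPartitionLag(endOffsets, committedOffsets).values().stream()
                .mapToLong(Long::longValue).sum();
    }
}
```

An alert rule then fires when `totalLag` exceeds a threshold or grows monotonically across scrape intervals.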

Future plans focus on batch‑stream convergence, dynamic resource tuning, proactive intelligent monitoring, and adopting emerging community capabilities.

Tags: big data, Flink, streaming, data platform, real-time computing, Storm migration
Written by Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies
