
Real‑Time Data Platform Architecture and Cloud‑Native Flink Migration at Manbang

This article presents a comprehensive case study of Manbang's real‑time data platform, detailing its business background, cloud‑native Flink + Hologres architecture, migration from self‑built clusters, real‑time product features, decision‑making workflows, and future roadmap, highlighting performance and cost benefits.

DataFunTalk

Manbang Group builds a smart logistics ecosystem using mobile Internet, big data and AI to connect drivers and shippers, aiming to reduce costs and improve efficiency.

Real‑Time Data Matrix

The real‑time data matrix adopts a cloud‑native OLAP architecture based on Alibaba Cloud Flink + Hologres, covering user, cargo, traffic, payment, transaction, marketing and CRM layers, as well as a unique real‑time supply‑demand layer that aggregates driver and cargo distribution, secondary cargo categories, and contextual information such as weather and epidemic status.

Real‑time features are generated at minute‑ or second‑level granularity to empower algorithms and operations, including driver and shipper behavior features, probability distributions, and price distributions.

Real‑Time Computing Platform Architecture

The platform spans multiple data centers: Center A hosts production systems and real‑time computing, while Center B hosts offline clusters. Key practices include full‑link stability, high‑frequency data ingestion, unified data standards, migration to Alibaba Hologres for lower operational cost, unified data services via APIs and message queues, and integration with an algorithmic real‑time sample attribution platform.

Migration from Self‑Built Flink to Cloud‑Native Alibaba Cloud Real‑Time Computing

Motivated by the stability problems and high O&M cost of its self‑built Flink clusters, Manbang migrated 560 Flink jobs to Alibaba Cloud's fully managed Flink service within 1.5 months. The SLA improved from 95% to 99%, the operations headcount fell from three to one (saving 420 person‑days annually), and resource consumption dropped by about 35%.

Development efficiency also improved: developing, tuning and deploying each task now takes two fewer days, which saves 600 person‑days per year across 300 tasks. Flink SQL tasks now consume 6.67 CU on average, a 40% resource saving.
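The figures above can be cross‑checked with some quick arithmetic. The sketch below is only a sanity check: the 210 working days per person‑year is an assumption chosen because it reproduces the 420 person‑day figure, and the pre‑migration CU value is implied only if the 40% saving is measured against the earlier per‑task average.

```python
# Back-of-the-envelope check of the migration figures quoted in the article.
# WORKING_DAYS_PER_YEAR is an assumption, not a number from the article.
WORKING_DAYS_PER_YEAR = 210

# Operations headcount went from three to one.
ops_staff_before, ops_staff_after = 3, 1
ops_days_saved = (ops_staff_before - ops_staff_after) * WORKING_DAYS_PER_YEAR

# Each of 300 tasks per year is delivered two days faster.
tasks_per_year = 300
days_saved_per_task = 2
dev_days_saved = tasks_per_year * days_saved_per_task

# Assumption: the 40% saving is relative to the pre-migration average,
# so the implied earlier footprint is 6.67 / (1 - 0.40) CU per task.
avg_cu_after = 6.67
saving = 0.40
implied_cu_before = avg_cu_after / (1 - saving)

print(ops_days_saved)               # 420
print(dev_days_saved)               # 600
print(round(implied_cu_before, 2))  # 11.12
```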

From Real‑Time Data to Real‑Time Decision

Real‑time decision making covers both algorithmic and operational scenarios. In recommendation pipelines, real‑time metrics feed every stage (AB testing, recall, coarse ranking, truncation, fine ranking and final strategy), with sub‑minute latency and more than 97% sample accuracy.
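To make the stage order concrete, here is a minimal sketch of a recall, coarse ranking, truncation and fine ranking funnel driven by real‑time features. It is not Manbang's implementation: the function `recommend`, the feature names `views_1m` and `price_gap`, and both scoring functions are hypothetical, and the AB testing and final strategy stages are left out.

```python
from typing import Callable, Dict, List

Features = Dict[str, float]


def recommend(
    candidates: List[str],
    realtime: Dict[str, Features],
    coarse_score: Callable[[Features], float],
    fine_score: Callable[[Features], float],
    keep: int = 3,
    top_n: int = 2,
) -> List[str]:
    """Hypothetical funnel: recall -> coarse rank -> truncate -> fine rank."""
    # Recall: keep only candidates that have real-time features available.
    recalled = [c for c in candidates if c in realtime]
    # Coarse ranking with a cheap score, then truncation to `keep` items.
    coarse = sorted(recalled, key=lambda c: coarse_score(realtime[c]), reverse=True)
    survivors = coarse[:keep]
    # Fine ranking with a more expensive score on the survivors only.
    fine = sorted(survivors, key=lambda c: fine_score(realtime[c]), reverse=True)
    return fine[:top_n]


# Illustrative data only; none of these values come from the article.
realtime_features = {
    "cargo_a": {"views_1m": 12, "price_gap": 0.10},
    "cargo_b": {"views_1m": 3, "price_gap": 0.40},
    "cargo_c": {"views_1m": 8, "price_gap": 0.05},
    "cargo_d": {"views_1m": 1, "price_gap": 0.90},
}
result = recommend(
    ["cargo_a", "cargo_b", "cargo_c", "cargo_d", "cargo_e"],
    realtime_features,
    coarse_score=lambda f: f["views_1m"],
    fine_score=lambda f: -f["price_gap"],
)
print(result)  # ['cargo_c', 'cargo_a']
```

Truncating between the two ranking stages keeps the expensive scorer off most candidates, which is one way to stay inside a tight latency budget.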

Real‑Time Feature Computing Warehouse

Flink CDC streams data into Hologres for real‑time ingestion, where it is further processed to build a unified analytical service. A batch real‑time feature framework supports diverse time windows (seconds to hours) and complex feature engineering (log, ratio, count, day‑category, cross‑features, etc.), reducing feature development cycles from 3 days to half a day and cutting core usage from 120 to 16 tasks.
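The idea of deriving several feature types over several time windows from one event stream can be sketched in plain Python. This is an in‑memory illustration, not the Flink and Hologres framework itself: the class `WindowedFeatures`, the window sizes, and the `view`/`click` event types are all assumptions made for the example.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class WindowedFeatures:
    """Holds recent (timestamp, event_type) pairs and derives per-window features.

    Assumes events are added in non-decreasing timestamp order.
    """
    windows_s: tuple = (10, 60, 3600)  # hypothetical window sizes, in seconds
    events: deque = field(default_factory=deque)

    def add(self, ts: float, event_type: str) -> None:
        self.events.append((ts, event_type))

    def features(self, now: float) -> dict:
        # Evict events that fall outside the largest window.
        horizon = now - max(self.windows_s)
        while self.events and self.events[0][0] <= horizon:
            self.events.popleft()

        out = {}
        for w in self.windows_s:
            # Each window covers the half-open interval (now - w, now].
            recent = [e for t, e in self.events if now - w < t <= now]
            views = recent.count("view")
            clicks = recent.count("click")
            out[f"view_cnt_{w}s"] = views                                    # count feature
            out[f"click_view_ratio_{w}s"] = clicks / views if views else 0.0  # ratio feature
            out[f"log_view_cnt_{w}s"] = math.log1p(views)                    # log feature
        return out


fx = WindowedFeatures()
for ts, ev in [(0, "view"), (30, "view"), (55, "click"), (58, "view")]:
    fx.add(ts, ev)
feats = fx.features(now=60)
print(feats["view_cnt_10s"], feats["click_view_ratio_60s"], feats["view_cnt_3600s"])
```

Declaring windows and feature types as configuration rather than writing one job per feature is the kind of reuse that could explain the shorter development cycle reported above.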

Real‑Time Products

Manbang launched a real‑time alert platform ("Beacon") for metric visualization and alarm configuration, and a real‑time data service platform that exposes APIs via a data‑mall, enabling pay‑per‑use consumption.
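As an illustration of what an alarm configuration might reduce to, the sketch below evaluates a simple threshold rule over recent metric points. The article does not describe Beacon's rule model, so `AlertRule`, its fields, and the metric name `orders_per_minute` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlertRule:
    metric: str        # name of the monitored metric (hypothetical)
    threshold: float   # alarm when values fall below this
    consecutive: int   # breaches in a row required before firing (>= 1)


def should_fire(rule: AlertRule, points: List[float]) -> bool:
    """Return True when the last `consecutive` points are all below the threshold."""
    if len(points) < rule.consecutive:
        return False
    return all(p < rule.threshold for p in points[-rule.consecutive:])


rule = AlertRule(metric="orders_per_minute", threshold=100.0, consecutive=3)
print(should_fire(rule, [150, 95, 90, 80]))   # True
print(should_fire(rule, [150, 95, 120, 80]))  # False
```

Requiring several consecutive breaches is a common way to avoid alarms on a single noisy minute.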

Future Plans

Since 2006 the stack has evolved from Hadoop through Spark Streaming, Flink DataStream and real‑time warehouses to today's cloud‑native Flink. In 2023 the team will deepen data‑business integration, build a Decision‑Platform 2.0, explore Flink‑on‑AI, and develop a cross‑cloud OLAP engine based on Hologres and other providers.
