
Evolution and Technical Architecture of Ant Financial's Data Analysis Platform

This article presents a comprehensive overview of Ant Financial's data analysis platform, detailing its departmental role, the data analysis lifecycle, the platform's evolution from version 1.0 to 3.0, core technical components such as intelligent sync and pre‑computation, and a practical case study of performance optimization.

DataFunTalk

The talk is divided into four parts. Part 1 introduces the Data Platform Department, its responsibilities across the data pipeline, and the core components of the data operating system, including the foundational framework, core capabilities (data security, privacy, quality, metadata, governance), and data engines for task scheduling, scientific analysis, and decision services.

Part 2 outlines the data analysis domain, describing the four analysis stages—descriptive, diagnostic, predictive, and prescriptive—and how automation and machine learning increasingly drive insights.

Part 3 traces the evolution of Ant Financial's data analysis platform: version 1.0 (basic reporting with limited performance), version 2.0 (datasets, multidimensional analysis, automatic acceleration, and basic openness), and version 3.0 (intelligent synchronization, intelligent pre‑computation, and advanced query routing). It explains the logical plan translation from dataset to table to datasource, the cost‑model‑based engine selection, and the plugin‑based SPI abstraction that enables multi‑engine support.
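The routing described above can be sketched in a few lines. This is a minimal illustration, not the platform's actual implementation: the plugin interface, engine names, table names, and cost formulas below are all assumptions made for the example. The idea is that once a dataset's logical plan has been translated down to a concrete table and datasource, each engine plugin behind the SPI reports whether it can serve the plan and at what estimated cost, and the router picks the cheapest candidate.

```python
from dataclasses import dataclass

@dataclass
class PhysicalPlan:
    table: str       # table resolved from the dataset's logical plan
    datasource: str  # concrete datasource backing the table
    scan_rows: int   # estimated rows to scan

# Hypothetical SPI: subclasses implement supports() and estimate_cost().
class EnginePlugin:
    name = "base"
    def supports(self, plan: PhysicalPlan) -> bool:
        raise NotImplementedError
    def estimate_cost(self, plan: PhysicalPlan) -> float:
        raise NotImplementedError

class PrecomputedCubeEngine(EnginePlugin):
    name = "cube"
    def supports(self, plan):
        # Only plans whose table has a pre-computed aggregate are eligible.
        return plan.table in {"dm_trade_agg"}
    def estimate_cost(self, plan):
        return 1.0  # pre-aggregated result: near-constant cost

class MppEngine(EnginePlugin):
    name = "mpp"
    def supports(self, plan):
        return True  # fallback: scan the raw datasource
    def estimate_cost(self, plan):
        return plan.scan_rows / 1e6  # cost grows with rows scanned

def route(plan, plugins):
    """Cost-model-based routing: cheapest engine that supports the plan."""
    candidates = [p for p in plugins if p.supports(plan)]
    return min(candidates, key=lambda p: p.estimate_cost(plan))

plugins = [PrecomputedCubeEngine(), MppEngine()]
fast = route(PhysicalPlan("dm_trade_agg", "odps", 10_000_000), plugins)
slow = route(PhysicalPlan("dwd_trade_raw", "odps", 10_000_000), plugins)
print(fast.name, slow.name)  # cube mpp
```

Because new engines only need to implement the plugin interface, multi-engine support reduces to registering another plugin; the router itself never changes.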

Part 4 demonstrates a real‑world application where the platform is used to identify and resolve performance bottlenecks. The process includes problem definition, metric establishment (experience and baseline RT targets), mathematical abstraction of query paths, data collection, analysis of high‑latency intervals, root‑cause identification (e.g., count‑distinct inefficiency on a specific source), and quantifying the impact of targeted optimizations on overall response‑time metrics.
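The final step of that workflow, quantifying the impact of a targeted fix on the response-time metrics, can be sketched as follows. The latency samples, RT thresholds, and the "count-distinct" root-cause tag are illustrative placeholders, not figures from the talk; the point is the shape of the calculation: measure compliance against the baseline RT target before the fix, model the fix on the affected queries, and remeasure.

```python
# Hypothetical RT targets (ms); the real thresholds are not given here.
BASELINE_RT = 5_000  # worst acceptable response time

# Hypothetical collected samples: latency plus the diagnosed root cause.
queries = [
    {"latency": 300,  "cause": None},
    {"latency": 800,  "cause": None},
    {"latency": 7200, "cause": "count_distinct"},  # slow source
    {"latency": 6500, "cause": "count_distinct"},
    {"latency": 1200, "cause": None},
]

def compliance(samples, threshold):
    """Fraction of queries at or under the RT threshold."""
    return sum(q["latency"] <= threshold for q in samples) / len(samples)

before = compliance(queries, BASELINE_RT)

# Model the targeted fix: rewriting the inefficient count-distinct
# (e.g. as a pre-aggregated distinct count) brings those queries to ~500 ms.
after_samples = [
    {**q, "latency": 500} if q["cause"] == "count_distinct" else q
    for q in queries
]
after = compliance(after_samples, BASELINE_RT)
print(f"baseline-RT compliance: {before:.0%} -> {after:.0%}")
# baseline-RT compliance: 60% -> 100%
```

Running the same calculation against the experience RT target shows how much of the remaining gap the fix does not close, which is what drives the next round of the analysis loop.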

The presentation concludes with a summary of the data analysis workflow: define the problem, measure it, abstract it mathematically, collect data, perform descriptive/diagnostic/predictive analysis, and drive decisions and actions based on insights.

Tags: analytics, DataEngineering, bigdata, DataAnalysis, DataPlatform, IntelligentSync
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
