Fundamentals of Data Middle Platform: Logic, Principles, and Practice
This article explains what a data middle platform is, why organizations need it, its core principles, technical architecture, and practical implementation guidelines, highlighting how it solves issues like inconsistent metrics, duplicate data construction, low query efficiency, poor data quality, and high development costs.
Introduction
The term “data middle platform” has been popular since 2015. Following the 2020 debate over Alibaba dismantling its middle platform, this talk revisits the concept from two perspectives: popular science and enterprise practice.
What Is a Data Middle Platform?
The idea was inspired by Supercell’s model, in which small teams generate high revenue. A data middle platform sits between data sources and business applications and provides shared data services across the organization. Alibaba defines it as the combination of a methodology (OneID + OneModel + OneService), an organization (a talent structure that includes data product managers, data engineers, and data scientists), and tools (a suite for data collection, construction, management, and service).
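OneID is generally understood as giving each real-world entity (for example, one shopper seen through a phone number, a device, and an account) a single unified ID. The sketch below shows one common way to do this, a union-find (disjoint-set) structure over identifiers that were observed together. It is an illustrative assumption, not Alibaba’s actual implementation; the class `OneIdResolver` and the `type:value` identifier strings are hypothetical.

```python
from __future__ import annotations


class OneIdResolver:
    """Hypothetical OneID helper: identifiers linked together share one ID."""

    def __init__(self) -> None:
        # identifier -> parent identifier in the disjoint-set forest
        self._parent: dict[str, str] = {}

    def _find(self, x: str) -> str:
        # Unseen identifiers start as their own root.
        self._parent.setdefault(x, x)
        # Path halving: each visited node is pointed at its grandparent.
        while self._parent[x] != x:
            self._parent[x] = self._parent[self._parent[x]]
            x = self._parent[x]
        return x

    def link(self, a: str, b: str) -> None:
        """Record that identifiers a and b belong to the same entity."""
        ra, rb = self._find(a), self._find(b)
        if ra == rb:
            return
        # The smaller root wins, so each root is always the smallest identifier
        # in its group and the result does not depend on link order.
        if rb < ra:
            ra, rb = rb, ra
        self._parent[rb] = ra

    def one_id(self, x: str) -> str:
        """Return the unified ID (unseen identifiers map to themselves)."""
        return self._find(x)


resolver = OneIdResolver()
resolver.link("phone:p1", "device:d1")       # phone and device seen in one login
resolver.link("device:d1", "account:alice")  # same device used by an account
resolver.link("phone:p2", "device:d2")
print(resolver.one_id("phone:p1"))  # account:alice
print(resolver.one_id("phone:p2"))  # device:d2
```

Choosing the smallest identifier as the unified ID keeps the sketch deterministic; a production OneID service would also need to persist IDs and resolve conflicting or erroneous links, which this sketch does not attempt.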
Why Is a Data Middle Platform Needed?
Inconsistent metric definitions across thousands of indicators.
Duplicate data construction by different teams or projects.
Low data‑retrieval efficiency due to fragmented tables and layers.
Poor data quality caused by lack of end‑to‑end lineage.
High construction and maintenance costs.
These problems can be mitigated by adopting a unified data middle platform.
Principles of a Data Middle Platform
Allocate core resources to core projects instead of a “race‑horse” approach in which several teams compete to build the same thing.
Build a generic platform instead of business‑partner (BP) style solutions tailored to individual business lines.
Avoid short‑term, rapid‑change tactics; focus on steady, long‑term development.
Technical Principles
The platform typically follows a “three‑horizontal, one‑vertical” architecture. The three horizontal layers are data ingestion, data development, and data application; the vertical layer is data management, which spans all three and covers metadata management, resource management, asset management, data governance, and security.
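A toy model can make the “three horizontal, one vertical” split concrete. In this sketch each horizontal layer is a plain function, and a single `ManagementLayer` object is passed through all of them to record lineage (metadata management) and check read permissions (security). All names here, including the table names `app_order_log`, `ods_orders`, and `dws_item_sales`, are hypothetical; real platforms implement these layers with dedicated systems rather than in-process functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ManagementLayer:
    """Vertical layer: sees every horizontal step, records lineage, enforces access."""
    # (source table, stage, target table) edges: metadata / lineage management
    lineage: list[tuple[str, str, str]] = field(default_factory=list)
    # user -> tables that user may read: security management
    allowed: dict[str, set[str]] = field(default_factory=dict)

    def record(self, source: str, stage: str, target: str) -> None:
        self.lineage.append((source, stage, target))

    def upstream(self, table: str) -> set[str]:
        """Walk lineage backwards to find every table that feeds `table`."""
        found: set[str] = set()
        frontier = [table]
        while frontier:
            current = frontier.pop()
            for src, _stage, tgt in self.lineage:
                if tgt == current and src not in found:
                    found.add(src)
                    frontier.append(src)
        return found

    def check_access(self, user: str, table: str) -> None:
        if table not in self.allowed.get(user, set()):
            raise PermissionError(f"{user} may not read {table}")


# Horizontal 1: data ingestion (copy raw records into the platform).
def ingest(mgmt: ManagementLayer, raw: list[tuple[str, int]], source: str, target: str) -> list[tuple[str, int]]:
    mgmt.record(source, "ingestion", target)
    return list(raw)


# Horizontal 2: data development (here: total quantity sold per item).
def develop(mgmt: ManagementLayer, rows: list[tuple[str, int]], source: str, target: str) -> dict[str, int]:
    totals: dict[str, int] = {}
    for item, qty in rows:
        totals[item] = totals.get(item, 0) + qty
    mgmt.record(source, "development", target)
    return totals


# Horizontal 3: data application (serve one value after an access check).
def serve(mgmt: ManagementLayer, user: str, table: str, data: dict[str, int], item: str) -> int:
    mgmt.check_access(user, table)
    return data.get(item, 0)


mgmt = ManagementLayer(allowed={"analyst": {"dws_item_sales"}})
rows = ingest(mgmt, [("sku1", 2), ("sku2", 1), ("sku1", 3)], "app_order_log", "ods_orders")
sales = develop(mgmt, rows, "ods_orders", "dws_item_sales")
print(serve(mgmt, "analyst", "dws_item_sales", sales, "sku1"))  # 5
print(sorted(mgmt.upstream("dws_item_sales")))  # ['app_order_log', 'ods_orders']
```

Because every layer reports to the same management object, questions such as “where does this table come from?” can be answered end to end, which is the lineage gap the article links to poor data quality.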
Practical Implementation
Common pain points when implementing OneData include unclear data sources, inconsistent metric definitions, and a lack of standards. Solutions include:
Standardizing naming conventions for atomic and derived metrics.
Defining clear production, review, authorization, and governance processes.
Mapping business lines to domains, topics, and metric dimensions.
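The process and mapping steps above can be sketched as a small metric registry. Each metric is filed under a business line, data domain, and topic with its allowed dimensions, and it must move through a review and authorization lifecycle before applications can see it. The status names, allowed transitions, and example values (`retail`, `order_payment`, and so on) are assumptions made for illustration, not a standard process.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    DRAFT = "draft"            # produced, not yet checked
    REVIEWED = "reviewed"      # definition approved in review
    PUBLISHED = "published"    # authorized for use by applications
    DEPRECATED = "deprecated"  # retired by governance


# Allowed lifecycle transitions (a hypothetical process, not a standard).
TRANSITIONS: dict[Status, set[Status]] = {
    Status.DRAFT: {Status.REVIEWED},
    Status.REVIEWED: {Status.PUBLISHED, Status.DRAFT},  # approve or send back
    Status.PUBLISHED: {Status.DEPRECATED},
    Status.DEPRECATED: set(),
}


@dataclass
class Metric:
    name: str
    business_line: str           # business line the metric serves
    domain: str                  # data domain, e.g. "transaction"
    topic: str                   # business process / topic within the domain
    dimensions: tuple[str, ...]  # dimensions the metric may be sliced by
    status: Status = Status.DRAFT


class MetricRegistry:
    def __init__(self) -> None:
        self._metrics: dict[str, Metric] = {}

    def register(self, metric: Metric) -> None:
        # One name, one definition: a second team cannot redefine an existing metric.
        if metric.name in self._metrics:
            owner = self._metrics[metric.name].domain
            raise ValueError(f"{metric.name!r} is already defined in domain {owner!r}")
        self._metrics[metric.name] = metric

    def advance(self, name: str, new_status: Status) -> None:
        metric = self._metrics[name]
        if new_status not in TRANSITIONS[metric.status]:
            raise ValueError(f"{name}: {metric.status.value} -> {new_status.value} not allowed")
        metric.status = new_status

    def published(self, domain: str) -> list[str]:
        # Applications only see metrics that passed review and authorization.
        return sorted(m.name for m in self._metrics.values()
                      if m.domain == domain and m.status is Status.PUBLISHED)


registry = MetricRegistry()
registry.register(Metric("sales_cnt", "retail", "transaction", "order_payment", ("item", "shop", "day")))
registry.advance("sales_cnt", Status.REVIEWED)
registry.advance("sales_cnt", Status.PUBLISHED)
print(registry.published("transaction"))  # ['sales_cnt']
```

Rejecting duplicate names at registration time is what turns the process into a guard against duplicate construction: a team that needs an existing metric has to reuse it rather than build its own copy.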
Example: In an e‑commerce scenario, the transaction domain defines atomic metrics (e.g., sales count) and derived metrics (e.g., 7‑day hot‑item rate) with consistent naming and dimension definitions.
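A consistent naming rule for this example can be expressed in code. The sketch follows the decomposition commonly described for Alibaba’s OneData, where a derived metric is an atomic metric plus a time period plus zero or more modifiers. The concrete name layout (`<modifiers>_<atomic code>_<period>`), the snake_case check, and the codes `sales_cnt`, `7d`, `hot_item`, and `mobile` are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical rule: lowercase snake_case, starting with a letter.
_NAME_RE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass(frozen=True)
class AtomicMetric:
    code: str         # e.g. "sales_cnt" for "sales count"
    domain: str       # e.g. "transaction"
    description: str


@dataclass(frozen=True)
class DerivedMetric:
    atomic: AtomicMetric
    period: str                      # time period, e.g. "7d"
    modifiers: tuple[str, ...] = ()  # business qualifiers, e.g. ("hot_item",)

    @property
    def name(self) -> str:
        # Hypothetical layout: <sorted modifiers>_<atomic code>_<period>.
        # Sorting makes the name independent of the order modifiers were listed in.
        name = "_".join([*sorted(self.modifiers), self.atomic.code, self.period])
        if not _NAME_RE.match(name):
            raise ValueError(f"invalid metric name: {name!r}")
        return name


sales_cnt = AtomicMetric("sales_cnt", "transaction", "number of items sold")
m1 = DerivedMetric(sales_cnt, "7d", ("hot_item", "mobile"))
m2 = DerivedMetric(sales_cnt, "7d", ("mobile", "hot_item"))
print(m1.name)             # hot_item_mobile_sales_cnt_7d
print(m1.name == m2.name)  # True
```

Generating names from their parts, instead of letting each team type them by hand, means two teams describing the same derived metric end up with the same name, which makes duplicates easy to detect.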
Summary
A data middle platform unifies data collection, computation, storage, and processing while standardizing metrics.
It addresses metric inconsistency, duplicate construction, low query efficiency, data quality issues, and high costs.
It is suitable for companies with multiple business lines, cost‑reduction or efficiency‑improvement needs, and a willingness to invest in long‑term development.
Key organizational and methodological principles include centralized resources, generic platform design, and a patient, steady‑state approach.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.