
Design and Implementation of a Data Service Platform for New Media Business

This article details the background, challenges, design principles, and implementation of a unified data service platform—including data modeling, multi-source governance, real-time processing, and a Doris-based storage solution—to support large‑scale video data for a new media operation.

Zhuanzhuan Tech

Zhuanzhuan's new media business relies on the major short‑video platforms (Douyin, Kuaishou, Bilibili, Xiaohongshu, etc.) for content marketing. This produces a massive volume of video data that must be collected, stored, and processed efficiently.

Current pain points include inconsistent data definitions across modules, lack of unified data governance, delayed data collection due to external services, and insufficient storage capacity for billions of video records, hindering real‑time analysis.

To address these issues, a Data Service Platform (DSP) was designed to provide one‑stop unified data definition, production, and consumption, enabling standardized data models, automated pipelines, and consistent data products.

Design and Implementation

4.1 Data Model Construction – Established a unified data definition and visualized the data model (see diagram).

4.2 Multi‑Data‑Source Governance – Implemented source data standardization, pre‑ingestion validation to block dirty data, and priority rules for authoritative sources, ensuring high‑quality, reliable data across heterogeneous inputs.
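The validation and priority rules described above can be sketched roughly as follows. This is a minimal illustration, not the platform's actual code: the `Source` values, the `VideoStat` fields, and the rule that a lower enum ordinal means a more authoritative source are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical source names; declared from most to least authoritative (assumption).
enum Source { OFFICIAL_API, THIRD_PARTY, CRAWLER }

// Hypothetical standardized record: one play-count observation for a video.
final class VideoStat {
    final String videoId;
    final Source source;
    final long playCount;

    VideoStat(String videoId, Source source, long playCount) {
        this.videoId = videoId;
        this.source = source;
        this.playCount = playCount;
    }
}

class GovernancePipeline {
    // Currently accepted record per video id.
    private final Map<String, VideoStat> accepted = new HashMap<>();

    // Pre-ingestion validation: reject records that are obviously dirty.
    static boolean isValid(VideoStat s) {
        return s != null
            && s.videoId != null && !s.videoId.trim().isEmpty()
            && s.source != null
            && s.playCount >= 0;
    }

    // Returns true if the record was stored, false if it was blocked or outranked.
    boolean ingest(VideoStat s) {
        if (!isValid(s)) {
            return false; // dirty data never reaches storage
        }
        VideoStat current = accepted.get(s.videoId);
        // Priority rule: a less authoritative source cannot overwrite a more authoritative one.
        if (current != null && s.source.ordinal() > current.source.ordinal()) {
            return false;
        }
        accepted.put(s.videoId, s);
        return true;
    }

    Optional<VideoStat> get(String videoId) {
        return Optional.ofNullable(accepted.get(videoId));
    }
}
```

With this sketch, a crawler record for a video is accepted first, replaced by a later official record, and then protected from a subsequent third‑party record for the same video.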

4.3 Real‑Time Data Processing – Handed control of data collection to users and subscribed the platform to upstream MQ messages, so that new data is processed as soon as it arrives.
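As a rough sketch of the listening side, the class below consumes messages from an upstream topic and hands each one to a callback. The article does not name the message broker or its client, so an in‑memory `BlockingQueue` stands in for the topic here; `UpstreamListener` and its methods are hypothetical names.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Listens to an upstream topic and processes each message as it arrives.
// The BlockingQueue is an in-memory stand-in for a real MQ client (assumption).
class UpstreamListener implements Runnable {
    private final BlockingQueue<String> topic;
    private final Consumer<String> onMessage;
    private volatile boolean running = true;

    UpstreamListener(BlockingQueue<String> topic, Consumer<String> onMessage) {
        this.topic = topic;
        this.onMessage = onMessage;
    }

    // Handles everything currently queued without blocking; returns the number handled.
    int drainPending() {
        int handled = 0;
        String msg;
        while ((msg = topic.poll()) != null) {
            onMessage.accept(msg);
            handled++;
        }
        return handled;
    }

    // Long-running loop: wait briefly for a message, process it, repeat until stopped.
    @Override
    public void run() {
        try {
            while (running) {
                String msg = topic.poll(100, TimeUnit.MILLISECONDS);
                if (msg != null) {
                    onMessage.accept(msg);
                }
            }
            drainPending(); // finish whatever is still queued after stop()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    void stop() {
        running = false;
    }
}
```

In a service, `run()` would typically execute on a dedicated thread, with `stop()` called at shutdown; `drainPending()` is also convenient for exercising the callback synchronously.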

4.4 Data Storage Solution – Chose Apache Doris for its distributed architecture, sub‑second query performance, real‑time query support, MySQL compatibility, and ability to map MySQL external tables, outperforming the previous HBase approach for complex relational queries.
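Because Doris speaks the MySQL protocol, an application can query it through ordinary JDBC. The sketch below assumes a MySQL JDBC driver on the classpath and uses a hypothetical `video_stat` table with `video_id`, `platform`, and `play_count` columns; none of these names come from the article. Port 9030 is the default query port of the Doris frontend.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

class DorisQuerySketch {
    // Hypothetical table and column names, used only for illustration.
    static final String TOP_VIDEOS_SQL =
        "SELECT video_id, play_count FROM video_stat "
        + "WHERE platform = ? ORDER BY play_count DESC LIMIT ?";

    // Builds a MySQL-protocol JDBC URL pointing at a Doris frontend node.
    static String jdbcUrl(String feHost, int queryPort, String database) {
        return "jdbc:mysql://" + feHost + ":" + queryPort + "/" + database;
    }

    // Returns the ids of the most-played videos on one platform.
    static List<String> topVideoIds(Connection conn, String platform, int limit)
            throws SQLException {
        List<String> ids = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(TOP_VIDEOS_SQL)) {
            ps.setString(1, platform);
            ps.setInt(2, limit);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getString("video_id"));
                }
            }
        }
        return ids;
    }
}
```

A caller would obtain the connection with `DriverManager.getConnection(DorisQuerySketch.jdbcUrl(host, 9030, db), user, password)` and pass it to `topVideoIds`.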

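// Unified data format. Time, DataSource, DataDetail and Rule are domain types not shown in this article.
class StandardData {
    // Time the data was collected
    Time dataTime;
    // Source the data came from
    DataSource dataSource;
    // Data payload
    DataDetail dataDetail;
}
// Dispatcher: routes each record to a handler
class DataHandleDispatcher {
    // Handler selection is elided in the original; a single handler stands in here
    DataHandler dataHandler;
    DataHandler findHandler(StandardData standardData) {
        return dataHandler;
    }
}
// Data handler: processes a record only if it passes the handler's rule
abstract class DataHandler {
    Rule rule;
    void handler(StandardData standardData) {
        if (check(rule, standardData)) {
            doHandler(standardData);
        }
    }
    // Processing logic, supplied by each concrete handler
    abstract void doHandler(StandardData standardData);
    // Returns true if the record satisfies the rule
    abstract boolean check(Rule rule, StandardData standardData);
}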
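
The skeleton above leaves handler lookup and rule checking open. One way to fill them in, as a minimal sketch only: handlers are registered per data source, and a rule is modeled as a `Predicate`. The `DataSource` values, the `String` payload, the `register` method, and the boolean return value of `handler` are assumptions added for the example, not details from the platform.

```java
import java.time.Instant;
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sources, named after platforms mentioned in the article.
enum DataSource { DOUYIN, KUAISHOU }

final class StandardData {
    final Instant dataTime;      // time the data was collected
    final DataSource dataSource; // where the data came from
    final String dataDetail;     // simplified payload (assumption)

    StandardData(Instant dataTime, DataSource dataSource, String dataDetail) {
        this.dataTime = dataTime;
        this.dataSource = dataSource;
        this.dataDetail = dataDetail;
    }
}

class DataHandler {
    private final Predicate<StandardData> rule;
    private final Consumer<StandardData> action;

    DataHandler(Predicate<StandardData> rule, Consumer<StandardData> action) {
        this.rule = rule;
        this.action = action;
    }

    // Runs the action only when the rule passes; returns whether it ran.
    boolean handler(StandardData standardData) {
        if (!rule.test(standardData)) {
            return false;
        }
        action.accept(standardData);
        return true;
    }
}

class DataHandleDispatcher {
    private final Map<DataSource, DataHandler> handlers = new EnumMap<>(DataSource.class);

    void register(DataSource source, DataHandler handler) {
        handlers.put(source, handler);
    }

    // Returns null when no handler is registered for the record's source.
    DataHandler findHandler(StandardData standardData) {
        return handlers.get(standardData.dataSource);
    }
}
```

Keying the lookup on the data source keeps source‑specific rules isolated, so supporting a new platform means registering one more handler rather than changing existing ones.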

In summary, the DSP has become a core component of Zhuanzhuan's new media ecosystem. It provides unified data services and will continue to evolve to meet future business and user demands.
