
DataCake: A Multi‑Cloud Self‑Service Big Data Platform from SHAREit Group

The article introduces DataCake, a cloud‑native, multi‑cloud big data platform built by SHAREit Group that addresses massive data volume, diverse application scenarios, and governance challenges through a Data Mesh‑inspired self‑service architecture, offering unified data management, intelligent governance, and a roadmap for future enhancements.

DataFunSummit

SHAREit Group (formerly Qiezi Technology) has rapidly grown its user base to over 2.4 billion installations worldwide, creating massive data demands that require a sophisticated big‑data platform.

The article first outlines the background and challenges: exponential data growth, expanding application scenarios, and untapped data potential. These translate into core problems: data fails to deliver business value, and governance is difficult because pipelines are complex and fragmented.

Three stakeholder perspectives are highlighted: business owners struggling to turn data into value, analysts facing steep learning curves and long development cycles, and technical leads dealing with exploding ETL tasks, unclear data lineage, and opaque cloud‑native tools.

To solve these issues, DataCake adopts a Data Mesh philosophy, shifting from a centralized data team to domain‑driven ownership, and implements three key concepts: a self‑serve platform, treating data as a product, and federated governance that combines distributed development with centralized control.
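The "data as a product" and federated-governance ideas can be made concrete with a small sketch. This is purely illustrative, assuming a hypothetical product descriptor and policy check (the names `DataProduct` and `passes_federated_policy` are invented here and are not DataCake's actual API): each domain team publishes its own product with a schema and a quality promise, while a central policy enforces only a minimal shared contract.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of "data as a product" with federated governance.
# Names are illustrative only and do not reflect DataCake internals.

@dataclass
class DataProduct:
    name: str
    owner_domain: str          # domain team accountable for this product
    schema: dict               # column name -> type, declared by the owner
    freshness_sla_hours: int   # quality promise published with the product
    tags: list = field(default_factory=list)

def passes_federated_policy(product: DataProduct) -> list:
    """Central governance: every product must declare an owner, a schema,
    and a freshness SLA; everything else stays with the domain team."""
    violations = []
    if not product.owner_domain:
        violations.append("missing owner domain")
    if not product.schema:
        violations.append("missing schema")
    if product.freshness_sla_hours <= 0:
        violations.append("missing freshness SLA")
    return violations

orders = DataProduct(
    name="orders_daily",
    owner_domain="commerce",
    schema={"order_id": "string", "amount": "double", "dt": "date"},
    freshness_sla_hours=24,
)
print(passes_federated_policy(orders))  # [] -> compliant
```

The point of the split is that the central check validates only the contract's existence, not its content, which is the "distributed development with centralized control" trade-off the article describes.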

DataCake’s functional pillars include:

Self‑service big‑data application platform: low‑code pipelines, unified analytics, visualisation, and custom reporting.

Intelligent data governance and security: cost‑aware billing, AI‑assisted governance, and fine‑grained permission management.

Unified data management: metadata cataloguing, data‑asset discovery, quality monitoring, and breaking data silos across lakes, warehouses, and databases.

Lake‑warehouse integration: direct ingestion of raw data into the lake with optional warehousing for less‑time‑critical workloads.
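The "low-code pipelines" pillar can be pictured as template expansion: an analyst fills in a few parameters and the platform renders a runnable job. The sketch below is a minimal assumption of how such templating might work; the template text and parameter names are invented for illustration, not DataCake's real ones.

```python
from string import Template

# Hypothetical sketch of templated, low-threshold data development:
# a platform-provided SQL template plus a handful of fill-in parameters,
# instead of hand-written ETL code.

DAILY_AGG_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE $target PARTITION (dt='$dt')\n"
    "SELECT $dims, COUNT(*) AS pv\n"
    "FROM $source WHERE dt = '$dt'\n"
    "GROUP BY $dims"
)

def render_pipeline(params: dict) -> str:
    """Expand a self-service template into a runnable SQL job."""
    return DAILY_AGG_TEMPLATE.substitute(params)

sql = render_pipeline({
    "target": "dw.app_daily_pv",
    "source": "ods.app_events",
    "dims": "country, app_version",
    "dt": "2024-01-01",
})
print(sql)
```

In a real self-service platform the rendered SQL would be submitted to a scheduler rather than printed, but the shape of the interaction, parameters in and a job out, is the same.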

The technical architecture is described across three layers:

IaaS: built on multiple public‑cloud providers to avoid vendor lock‑in.

PaaS: serverless compute supporting ad‑hoc, batch, streaming, and native cloud engines, with elastic scaling.

SaaS: integration with tools like HUE and Tableau, unified resource management, and cross‑cloud data access.

Additional capabilities include minimal‑code data analysis, low‑threshold data development via templated pipelines, unified data management with lineage visualisation, and AI‑driven automated governance.

The roadmap foresees a fully managed SaaS offering across multiple clouds and continued development of an open‑source, intelligent, one‑stop big‑data platform that maximises business value.
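Cross-cloud data access of the kind described above usually comes down to routing a dataset URI to the right provider backend so that jobs stay portable. The following is a minimal sketch under that assumption; the scheme-to-reader mapping and the stub readers are invented for illustration and are not DataCake internals.

```python
from urllib.parse import urlparse

# Hypothetical sketch of multi-cloud data access: dispatch a dataset URI
# to a provider-specific reader by its scheme, keeping jobs cloud-agnostic.

READERS = {
    "s3":  lambda path: f"read {path} via AWS S3 client",
    "gs":  lambda path: f"read {path} via GCS client",
    "oss": lambda path: f"read {path} via Aliyun OSS client",
}

def open_dataset(uri: str) -> str:
    """Resolve a dataset URI to the matching cloud backend."""
    parsed = urlparse(uri)
    reader = READERS.get(parsed.scheme)
    if reader is None:
        raise ValueError(f"unsupported cloud scheme: {parsed.scheme}")
    return reader(parsed.netloc + parsed.path)

print(open_dataset("s3://bucket/warehouse/orders/dt=2024-01-01"))
```

Because callers only ever see URIs, the same pipeline definition can run against whichever cloud the data happens to live in, which is the portability the IaaS layer's vendor-lock-in avoidance depends on.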

Tags: Big Data, Multi-Cloud, Data Governance, Self-Service Platform, Data Mesh
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
