
DataCake: A Multi‑Cloud Self‑Service Big Data Platform from SHAREit Group

The article introduces DataCake, a cloud‑native, multi‑cloud big data platform built by SHAREit Group that addresses massive data volume, diverse application scenarios, and governance challenges through a Data Mesh‑inspired self‑service architecture, offering unified data management, intelligent governance, and a roadmap for future enhancements.

DataFunSummit

SHAREit Group (formerly Qiezi Technology) has rapidly grown its user base to over 2.4 billion installations worldwide, creating massive data demands that require a sophisticated big‑data platform.

The article first outlines the background and challenges: exponential data growth, expanding application scenarios, and untapped data potential. These translate into core problems: data fails to deliver business value, and governance is difficult because pipelines are complex and fragmented.

Three stakeholder perspectives are highlighted: business owners struggling to turn data into value, analysts facing steep learning curves and long development cycles, and technical leads dealing with exploding ETL tasks, unclear data lineage, and opaque cloud‑native tools.

To solve these issues, DataCake adopts a Data Mesh philosophy, shifting from a centralized data team to domain‑driven ownership, and implements three key concepts: a self‑serve platform, treating data as a product, and federated governance that combines distributed development with centralized control.
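The "data as a product" and federated-governance ideas can be made concrete with a small sketch. This is purely illustrative, assuming a hypothetical product descriptor and policy check (the names `DataProduct` and `passes_federated_policy` are invented here and are not DataCake's actual API): each domain team publishes its own product with a schema and a quality promise, while a central policy enforces only a minimal shared contract.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of "data as a product" with federated governance.
# Names are illustrative only and do not reflect DataCake internals.

@dataclass
class DataProduct:
    name: str
    owner_domain: str          # domain team accountable for this product
    schema: dict               # column name -> type, declared by the owner
    freshness_sla_hours: int   # quality promise published with the product
    tags: list = field(default_factory=list)

def passes_federated_policy(product: DataProduct) -> list:
    """Central governance: every product must declare an owner, a schema,
    and a freshness SLA; everything else stays with the domain team."""
    violations = []
    if not product.owner_domain:
        violations.append("missing owner domain")
    if not product.schema:
        violations.append("missing schema")
    if product.freshness_sla_hours <= 0:
        violations.append("missing freshness SLA")
    return violations

orders = DataProduct(
    name="orders_daily",
    owner_domain="commerce",
    schema={"order_id": "string", "amount": "double", "dt": "date"},
    freshness_sla_hours=24,
)
print(passes_federated_policy(orders))  # [] -> compliant
```

The point of the split is that the central check validates only the contract's existence, not its content, which is the "distributed development with centralized control" trade-off the article describes.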

DataCake’s functional pillars include:

Self‑service big‑data application platform: low‑code pipelines, unified analytics, visualisation, and custom reporting.

Intelligent data governance and security: cost‑aware billing, AI‑assisted governance, and fine‑grained permission management.

Unified data management: metadata cataloguing, data‑asset discovery, quality monitoring, and breaking data silos across lakes, warehouses, and databases.

Lake‑warehouse integration: direct ingestion of raw data into the lake with optional warehousing for less‑time‑critical workloads.
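The "low-code pipelines" pillar can be pictured as template expansion: an analyst fills in a few parameters and the platform renders a runnable job. The sketch below is a minimal assumption of how such templating might work; the template text and parameter names are invented for illustration, not DataCake's real ones.

```python
from string import Template

# Hypothetical sketch of templated, low-threshold data development:
# a platform-provided SQL template plus a handful of fill-in parameters,
# instead of hand-written ETL code.

DAILY_AGG_TEMPLATE = Template(
    "INSERT OVERWRITE TABLE $target PARTITION (dt='$dt')\n"
    "SELECT $dims, COUNT(*) AS pv\n"
    "FROM $source WHERE dt = '$dt'\n"
    "GROUP BY $dims"
)

def render_pipeline(params: dict) -> str:
    """Expand a self-service template into a runnable SQL job."""
    return DAILY_AGG_TEMPLATE.substitute(params)

sql = render_pipeline({
    "target": "dw.app_daily_pv",
    "source": "ods.app_events",
    "dims": "country, app_version",
    "dt": "2024-01-01",
})
print(sql)
```

In a real self-service platform the rendered SQL would be submitted to a scheduler rather than printed, but the shape of the interaction, parameters in and a job out, is the same.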

The technical architecture is described across three layers:

IaaS: built on multiple public‑cloud providers to avoid vendor lock‑in.

PaaS: serverless compute supporting ad‑hoc, batch, streaming, and native cloud engines, with elastic scaling.

SaaS: integration with tools like HUE and Tableau, unified resource management, and cross‑cloud data access.

Additional capabilities include minimal‑code data analysis, low‑threshold data development via templated pipelines, unified data management with lineage visualisation, and AI‑driven automated governance.

The roadmap foresees a fully managed SaaS offering across multiple clouds and continued development of an open‑source, intelligent, one‑stop big‑data platform that maximises business value.
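Cross-cloud data access of the kind described above usually comes down to routing a dataset URI to the right provider backend so that jobs stay portable. The following is a minimal sketch under that assumption; the scheme-to-reader mapping and the stub readers are invented for illustration and are not DataCake internals.

```python
from urllib.parse import urlparse

# Hypothetical sketch of multi-cloud data access: dispatch a dataset URI
# to a provider-specific reader by its scheme, keeping jobs cloud-agnostic.

READERS = {
    "s3":  lambda path: f"read {path} via AWS S3 client",
    "gs":  lambda path: f"read {path} via GCS client",
    "oss": lambda path: f"read {path} via Aliyun OSS client",
}

def open_dataset(uri: str) -> str:
    """Resolve a dataset URI to the matching cloud backend."""
    parsed = urlparse(uri)
    reader = READERS.get(parsed.scheme)
    if reader is None:
        raise ValueError(f"unsupported cloud scheme: {parsed.scheme}")
    return reader(parsed.netloc + parsed.path)

print(open_dataset("s3://bucket/warehouse/orders/dt=2024-01-01"))
```

Because callers only ever see URIs, the same pipeline definition can run against whichever cloud the data happens to live in, which is the portability the IaaS layer's vendor-lock-in avoidance depends on.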

Tags: Big Data, Multi-Cloud, Data Governance, Self-Service Platform, Data Mesh
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
