How Kwai’s OneService Platform Revolutionizes Data Service Development

This article describes Kwai’s OneService platform, a low‑code, self‑service data platform that streamlines API creation, deployment, and operation. It covers the platform’s background, its architecture, key technologies such as the API matrix and configuration‑as‑development, its high‑availability and performance strategies, the results achieved, and the future roadmap.

Kuaishou Big Data

OneService Platform Overview

The article introduces Kwai’s OneService platform, a one‑stop self‑service data platform that enables users to create, manage, and operate API services without writing code. It follows a "configuration‑as‑development" approach, turning API configuration into fully functional services.

1. Platform Background

Kwai generates petabytes of data daily, and many business lines (e.g., e‑commerce live streaming) produce massive data assets that need to be packaged as data services for downstream use. The original data‑service workflow, in use since 2017, was manual and lengthy and involved multiple hand‑offs, which led to three main pain points:

Efficiency – each API required manual development, taking about nine days per interface.

Cost – siloed development across business lines caused waste.

Quality – data accuracy and service stability were not monitored, which occasionally caused online failures.

To address these issues, Kwai introduced the OneService platform, a low‑code solution that lets users create APIs through configuration alone and automatically handles code generation, deployment, caching, degradation, and permission control.

2. Architecture Design and Key Technologies

2.1 Technical Architecture

The platform consists of three layers from bottom to top: data‑development layer, data‑service layer, and data‑application layer.

The data‑development layer processes raw ODS (operational data store) tables into business‑valuable assets, stored mainly in offline Hive tables. The data‑service layer includes a service generation engine, which auto‑generates client code and deploys services, and a call chain that accelerates data by moving it from slow warehouses into fast storage engines such as Redis, HBase, ClickHouse, and Kwai’s proprietary engines. A unified query layer sits above these heterogeneous stores.

2.2 API Matrix

The platform defines an API matrix that categorizes API templates by query flexibility and speed. Typical API types include:

KV API – point lookups, built on KV engines such as Redis.

SQL API – flexible multi‑dimensional queries, built on OLAP engines.

Template API – pre‑defined ClickHouse/Druid templates that take runtime parameters.

OneAPI – a unified DSL for all scenarios.

Public domain APIs – for example, OneID and geolocation.
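The matrix can be pictured as a small lookup table. The following is a minimal sketch, not Kwai's implementation: the class names, the three-step flexibility scale, and the placement of each API type on that scale are assumptions inferred from the descriptions above. The speed axis is left out because the article gives no figures for it.

```python
from dataclasses import dataclass
from enum import IntEnum


class Flexibility(IntEnum):
    LOW = 1      # fixed access pattern, e.g. point lookup by key
    MEDIUM = 2   # predefined query shape with runtime parameters
    HIGH = 3     # ad hoc multi-dimensional queries


@dataclass(frozen=True)
class ApiTemplate:
    name: str
    flexibility: Flexibility
    typical_engines: tuple[str, ...]


# Placement of each type is an assumption, not taken from the platform itself.
API_MATRIX = (
    ApiTemplate("KV API", Flexibility.LOW, ("Redis",)),
    ApiTemplate("Template API", Flexibility.MEDIUM, ("ClickHouse", "Druid")),
    ApiTemplate("SQL API", Flexibility.HIGH, ("OLAP engine",)),
)


def candidates(min_flexibility: Flexibility) -> list[str]:
    """Return template names that offer at least the requested flexibility."""
    return [t.name for t in API_MATRIX if t.flexibility >= min_flexibility]
```

A caller that needs parameterized queries would ask for `candidates(Flexibility.MEDIUM)` and then pick among the returned templates.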

2.3 Configuration‑as‑Development

Users configure API metadata (source tables, RPC/HTTP style, batch access, QPS limits, test data sources, etc.). The platform then automatically generates client SDKs, deploys services, and handles data acceleration, validation, testing, and publishing without further user involvement.
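To make the idea concrete, here is a minimal sketch of how a declarative API definition might be validated and expanded into automated stages. Everything in it is hypothetical: the `ApiConfig` field names mirror the metadata listed above, the stage names mirror the automated steps in the text, and their order is assumed. It does not reflect OneService's real configuration schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiConfig:
    # Hypothetical field names that mirror the metadata listed in the text.
    name: str
    source_table: str
    protocol: str          # "RPC" or "HTTP"
    batch_access: bool
    qps_limit: int
    test_data_source: str


def validate(config: ApiConfig) -> list[str]:
    """Return human-readable problems; an empty list means the config is usable."""
    problems = []
    if config.protocol not in ("RPC", "HTTP"):
        problems.append(f"unsupported protocol: {config.protocol}")
    if config.qps_limit <= 0:
        problems.append("qps_limit must be positive")
    if not config.source_table:
        problems.append("source_table is required")
    return problems


# The automated stages named in the text, in an assumed order.
PIPELINE = ("generate_sdk", "deploy", "accelerate_data", "validate", "test", "publish")


def plan(config: ApiConfig) -> list[str]:
    """Expand a valid config into the ordered stages a platform could run."""
    problems = validate(config)
    if problems:
        raise ValueError("; ".join(problems))
    return [f"{stage}:{config.name}" for stage in PIPELINE]
```

The point of the design is that the user only ever edits the `ApiConfig` values; rejecting a bad config before any stage runs keeps failures cheap.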

2.4 Unified Query Platform

A unified query platform abstracts heterogeneous engines (Redis, HBase, Druid, ClickHouse) and offers two query syntaxes: OneSQL, a SQL subset covering KV, OLAP, and OLTP workloads, and OneDSL, a custom DSL that exposes native engine capabilities. Its execution modes (Local, Standalone, Cluster) resemble those of Spark and support varied workload requirements.
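The abstraction over heterogeneous engines can be sketched as a common interface plus a router. This is an illustration under stated assumptions: `Engine`, `UnifiedQuery`, and the two in-memory stand-ins are invented names, and the stand-ins ignore the query text entirely, so nothing here parses OneSQL or OneDSL.

```python
from typing import Protocol


class Engine(Protocol):
    def execute(self, query: str, params: dict) -> list[dict]: ...


class InMemoryKV:
    """Stand-in for a KV engine such as Redis; holds rows keyed by id."""

    def __init__(self, rows: dict[str, dict]):
        self._rows = rows

    def execute(self, query: str, params: dict) -> list[dict]:
        row = self._rows.get(params["key"])
        return [row] if row is not None else []


class InMemoryOLAP:
    """Stand-in for an OLAP engine such as ClickHouse; filters a row list."""

    def __init__(self, rows: list[dict]):
        self._rows = rows

    def execute(self, query: str, params: dict) -> list[dict]:
        return [r for r in self._rows if all(r.get(k) == v for k, v in params.items())]


class UnifiedQuery:
    """Routes a request to whichever engine is registered for the dataset."""

    def __init__(self):
        self._engines: dict[str, Engine] = {}

    def register(self, dataset: str, engine: Engine) -> None:
        self._engines[dataset] = engine

    def query(self, dataset: str, query: str, params: dict) -> list[dict]:
        if dataset not in self._engines:
            raise KeyError(f"no engine registered for dataset {dataset!r}")
        return self._engines[dataset].execute(query, params)
```

Callers address a dataset by name and never see which engine serves it, which is what lets the backing store change without touching client code.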

3. Availability and Performance Construction

3.1 High Availability

Challenges include thousands of services handling roughly 20 million QPS, critical online business workloads, numerous external dependencies, and the special guarantees required by activity‑related APIs. The platform adopts a three‑stage approach: proactive design for robustness, comprehensive alerting for rapid detection, and post‑incident mitigation through degradation and throttling.

Tiered isolation – services are deployed in separate “cabins” based on business line and priority, which provides physical isolation.

Rate limiting and degradation – mechanisms include request sampling, throttling of abnormal requests, external‑service fallback, and storage‑engine fallback.
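Throttling combined with fallback can be sketched in a few lines. The article does not say which rate-limiting algorithm OneService uses, so the token bucket here is a swapped-in, commonly used choice. The names `TokenBucket` and `call_with_degradation` are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    """Allows about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def allow(self) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False


def call_with_degradation(bucket: TokenBucket, primary: Callable, fallback: Callable):
    """Serve from `primary` when allowed and healthy; otherwise use `fallback`."""
    if not bucket.allow():
        return fallback()      # throttled: shed load instead of queuing
    try:
        return primary()
    except Exception:
        return fallback()      # dependency failure: degrade gracefully
```

In this sketch the fallback could return a cached or default value, which corresponds loosely to the external‑service and storage‑engine fallbacks described above. The injectable `clock` exists only to make the limiter testable without sleeping.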

3.2 High Performance

To meet sub‑millisecond latency requirements in high‑QPS scenarios (e.g., user‑profile systems), the platform provides a direct‑connect storage solution that lets client SDKs bypass the server and read directly from Redis, cutting latency by about 50% and saving server resources. Multi‑level caching (L0‑L2) is also configurable per API.
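A read path through several cache levels might look like the sketch below. The article does not define what L0, L1, and L2 are, so the tiers here are plain dictionaries ordered fastest first, and the class name and the backfill-on-hit policy are assumptions. Expiry and size limits are omitted for brevity.

```python
from typing import Callable, Hashable


class MultiLevelCache:
    """Checks tiers in order (index 0 first) and backfills faster tiers on a hit."""

    def __init__(self, levels: int, loader: Callable[[Hashable], object]):
        self._tiers: list[dict] = [{} for _ in range(levels)]
        self._loader = loader
        self.loads = 0  # counts trips to the backing store

    def get(self, key: Hashable) -> object:
        for depth, tier in enumerate(self._tiers):
            if key in tier:
                value = tier[key]
                self._fill(key, value, upto=depth)
                return value
        # Miss in every tier: go to the backing store and populate all tiers.
        value = self._loader(key)
        self.loads += 1
        self._fill(key, value, upto=len(self._tiers))
        return value

    def _fill(self, key: Hashable, value: object, upto: int) -> None:
        # Copy the value into every tier faster than the one that hit.
        for tier in self._tiers[:upto]:
            tier[key] = value

    def evict(self, key: Hashable, level: int) -> None:
        self._tiers[level].pop(key, None)
```

Making `levels` a constructor argument mirrors the per‑API configurability mentioned above: an API with strict freshness needs could run with a single level, while a hot read‑mostly API could use all three.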

4. Achievements

Since 2019, OneService has come to host more than 1,000 APIs, handle nearly 20 million QPS, store hundreds of terabytes, and serve hundreds of business lines. KV API latency is at the millisecond level, core service availability reaches 99.99%, and the platform has supported major events (New Year, Spring Festival, Shopping Festival, Olympics) with zero incidents.

5. Future Plans

The platform will evolve into a unified data‑service platform, expanding the asset types it supports (e.g., graph data) and deepening business alignment. The four focus areas are:

Basic capabilities – extend unified query, unified datasets, and unified metric services.

Advanced capabilities – introduce service orchestration and function computing for customizable services.

Product efficiency – continuously improve development and operation efficiency.

Service governance – apply intelligent automation for governance and reliability.
