
Data Serviceization at JD: From Zero to One and Beyond

This technical presentation describes JD's data service platform, covering its origin, performance optimizations, flexible API generation, scaling to massive metrics, caching strategies, service orchestration, governance, and a Q&A on security and data‑source flexibility.

DataFunSummit

The session introduces the theme of data service practice at JD and outlines three modules: the origin of data serviceization, its growth, and how to improve the system.

Origin: JD's Data Intelligence Department needed to rapidly expose data assets via open APIs. Typical development took two weeks per API, and the 618 promotion required up to 80 interfaces. Engineers proposed a solution, the EZD framework, in which filling in a SQL statement automatically generates a performant, parameter-driven API.
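The core idea can be sketched in a few lines: a registry maps an API name to a SQL statement with named parameters, and one generic handler serves every registered API. This is a minimal illustration only; the function names and the SQLite backing store are assumptions, not JD's actual EZD implementation.

```python
import sqlite3

# Hypothetical sketch: registering a SQL statement is enough to expose a
# parameter-driven "API" through one generic handler.
API_REGISTRY = {}

def register_api(name, sql):
    """Publish a SQL statement as a callable API."""
    API_REGISTRY[name] = sql

def call_api(name, conn, **params):
    """Generic handler: look up the SQL and run it with named parameters."""
    cursor = conn.execute(API_REGISTRY[name], params)
    return cursor.fetchall()

# Demo against an in-memory SQLite database standing in for the real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "north", 10.0), (2, "south", 20.0)])

register_api("sales_by_region",
             "SELECT SUM(amount) FROM orders WHERE region = :region")
print(call_api("sales_by_region", conn, region="north"))  # [(10.0,)]
```

Adding a new API is now a one-line `register_api` call rather than two weeks of bespoke development, which is the productivity gain the talk attributes to the framework.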

Interface Performance: The first version suffered from high latency because each request had to look up the SQL definition in the database. By caching SQL definitions in an in-memory routing table and switching to the high-performance Hikari connection pool, the platform cut the framework's own share of request latency from 97% to just 1%.
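The routing-table idea is straightforward: load all SQL definitions into process memory and refresh them at most once per TTL, so per-request lookups never touch the metadata database. The sketch below is illustrative; the class and TTL value are assumptions, not the platform's actual design.

```python
import time

# Hypothetical sketch: an in-memory routing table that refreshes lazily,
# replacing a per-request metadata-database lookup.
class RoutingTable:
    def __init__(self, load_definitions, ttl_seconds=60):
        self._load = load_definitions   # callable: () -> {api_name: sql}
        self._ttl = ttl_seconds
        self._table = {}
        self._loaded_at = None

    def lookup(self, api_name):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at > self._ttl:
            self._table = self._load()  # refresh at most once per TTL
            self._loaded_at = now
        return self._table[api_name]

calls = []
def load_from_metadata_db():
    calls.append(1)  # stands in for the expensive metadata-DB round trip
    return {"sales_by_region":
            "SELECT SUM(amount) FROM orders WHERE region = :region"}

table = RoutingTable(load_from_metadata_db)
for _ in range(1000):            # 1000 requests ...
    table.lookup("sales_by_region")
print(len(calls))                # ... but only 1 metadata-DB hit
```

Combined with a pooled connection (Hikari in the real system), the per-request work shrinks to a dictionary lookup plus query execution.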

Interface Flexibility: Dynamic parameters are injected into SQL using a colon syntax. Complex query conditions are handled with FreeMarker templates (IF, SWITCH, loops), allowing the number of APIs to shrink from 80 to 5 while supporting collections, IN clauses, and conditional logic.
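The talk's FreeMarker templates live in the Java ecosystem; the Python sketch below only mirrors the idea of conditional SQL fragments, showing how one template with an optional filter and an IN-clause expansion can replace several near-identical APIs. All names here are illustrative.

```python
# Illustrative stand-in for FreeMarker-style conditional SQL templating:
# one template replaces many near-identical APIs.
def render_sql(params):
    sql = "SELECT SUM(amount) FROM orders WHERE 1 = 1"
    bind = {}
    if "region" in params:                      # optional equality filter (IF)
        sql += " AND region = :region"
        bind["region"] = params["region"]
    if "categories" in params:                  # IN clause over a collection (loop)
        names = []
        for i, cat in enumerate(params["categories"]):
            key = f"cat{i}"
            names.append(f":{key}")
            bind[key] = cat
        sql += f" AND category IN ({', '.join(names)})"
    return sql, bind

sql, bind = render_sql({"region": "north", "categories": ["book", "toy"]})
print(sql)
# SELECT SUM(amount) FROM orders WHERE 1 = 1
#   AND region = :region AND category IN (:cat0, :cat1)
```

Because every fragment binds named parameters rather than interpolating values, the generated SQL stays safe against injection while remaining fully dynamic.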

Scaling from 1 to 10: During large‑scale events like JD’s 618 promotion, dozens of metrics (sales, PV, UV, coupons, etc.) are displayed via data APIs. The platform integrates NoSQL sources—Elasticsearch via elasticsearch‑sql, Redis with KV mapping, and HBase with get/scan—enabling rapid development of hundreds of indicators within weeks.
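One way to read this integration story is as an adapter pattern: each back end is wrapped so that every metric is served through the same call shape. The sketch below uses in-memory stand-ins for Redis-style and HBase-style sources; the class and method names are assumptions, not JD's interfaces.

```python
# Hedged sketch: heterogeneous back ends behind one query interface.
class KeyValueSource:
    """Stands in for a Redis-style source: metric key -> value."""
    def __init__(self, data):
        self._data = data
    def query(self, request):
        return self._data.get(request["key"])

class WideColumnSource:
    """Stands in for an HBase-style source: row-key get, then column."""
    def __init__(self, rows):
        self._rows = rows
    def query(self, request):
        return self._rows.get(request["row"], {}).get(request["column"])

SOURCES = {
    "redis": KeyValueSource({"618:pv": 120000}),
    "hbase": WideColumnSource({"618": {"uv": 45000}}),
}

def serve_metric(source, request):
    """One entry point regardless of which engine holds the metric."""
    return SOURCES[source].query(request)

print(serve_metric("redis", {"key": "618:pv"}))               # 120000
print(serve_metric("hbase", {"row": "618", "column": "uv"}))  # 45000
```

With such adapters in place, adding a new indicator is a matter of registering a source and a request shape, which is what makes "hundreds of indicators within weeks" plausible.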

Caching Mechanisms: Two caching strategies are provided. Passive caching creates a cache entry on the first miss, supporting dynamic parameters but causing QPS spikes. Active caching updates caches periodically on the platform side, eliminating spikes but requiring predefined parameter combinations. The system advises using memory databases for hot data rather than full‑dataset caching.
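The two strategies can be contrasted in a few lines of code. This is a minimal sketch under assumed names: passive caching computes on a miss (hence the spike risk when many cold keys arrive at once), while active caching refreshes a predefined key set on the platform's schedule, so reads never fall through to the database.

```python
import time

# Illustrative sketch of the two strategies described above.
class PassiveCache:
    def __init__(self, compute, ttl=300):
        self._compute, self._ttl, self._store = compute, ttl, {}
    def get(self, key):
        hit = self._store.get(key)
        if hit is None or time.monotonic() - hit[1] > self._ttl:
            # Miss or expiry: compute on the request path (QPS-spike risk).
            hit = (self._compute(key), time.monotonic())
            self._store[key] = hit
        return hit[0]

class ActiveCache:
    def __init__(self, compute, keys):
        self._compute = compute
        self._keys = keys        # parameter combinations known up front
        self._store = {}
        self.refresh()
    def refresh(self):           # invoked periodically by the platform
        for key in self._keys:
            self._store[key] = self._compute(key)
    def get(self, key):
        return self._store[key]  # always a hit; never touches the source

computed = []
def expensive(key):
    computed.append(key)         # stands in for the costly query
    return f"value-for-{key}"

passive = PassiveCache(expensive)
passive.get("pv"); passive.get("pv")   # second call is a pure cache hit
active = ActiveCache(expensive, ["uv"])
active.get("uv"); active.get("uv")
print(computed)                        # ['pv', 'uv']
```

The trade-off is visible in the code: `PassiveCache` accepts any key but pays on the miss path; `ActiveCache` never pays on reads but only serves keys declared in advance.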

Service Orchestration: Complex business requirements are addressed by composing multiple APIs through a workflow engine (different from human‑approval workflows). An example shows how varying statistical logic across a 28‑hour promotion window is encapsulated in APIs, simplifying downstream consumption.
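The 28-hour example can be pictured as a router that hides per-phase statistical logic behind one downstream-facing API. The phase boundaries and handler names below are hypothetical; only the composition idea comes from the talk.

```python
# Illustrative sketch: per-phase APIs composed behind one orchestrated endpoint.
def warmup_stats(hour):
    return {"phase": "warmup", "metric": hour * 10}

def peak_stats(hour):
    return {"phase": "peak", "metric": hour * 100}

STEPS = [
    (range(0, 4), warmup_stats),    # hours 0-3: pre-sale warm-up logic (assumed split)
    (range(4, 28), peak_stats),     # hours 4-27: main-event logic
]

def promotion_stats(hour):
    """Single downstream API; routing hides the differing statistical logic."""
    for hours, handler in STEPS:
        if hour in hours:
            return handler(hour)
    raise ValueError(f"hour {hour} outside the 28-hour window")

print(promotion_stats(2))   # {'phase': 'warmup', 'metric': 20}
print(promotion_stats(10))  # {'phase': 'peak', 'metric': 1000}
```

Downstream consumers call one stable endpoint and never need to know that the statistical definition changes partway through the promotion.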

Data Service Governance: Governance involves three roles: data producers, consumers, and the governing platform team. Services are layered from low-level entities (orders, products, users) to high-level scenarios (analysis, marketing). Policies, service deduplication, grading, evaluation, and quality control are applied before publishing to the service market.

Q&A: The platform secures data sources by authorizing API groups per data‑source owner and mitigates performance bottlenecks by delegating load to the underlying databases, employing distributed rate‑limiting and circuit‑breaker mechanisms. Switching a MySQL‑based API to ClickHouse is possible if the SQL is compatible.
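JD's rate limiting is distributed, but the core mechanism can be illustrated with a single-node token bucket; treat this sketch, including the injectable clock and all names, as an assumption for demonstration only.

```python
import time

# Minimal single-node token-bucket sketch of the rate-limiting idea from the
# Q&A; a production setup would coordinate buckets across nodes.
class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate              # tokens refilled per second
        self._capacity = capacity
        self._tokens = float(capacity)
        self._clock = clock
        self._last = clock()

    def allow(self):
        now = self._clock()
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False                   # shed load instead of hitting the DB

# Deterministic demo with a fake clock.
fake_now = [0.0]
bucket = TokenBucket(rate=10, capacity=5, clock=lambda: fake_now[0])
results = [bucket.allow() for _ in range(8)]   # no time passes between calls
print(results)          # [True, True, True, True, True, False, False, False]
fake_now[0] = 0.2       # 0.2 s later: two tokens have been refilled
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
```

Rejecting requests at the bucket keeps overload away from the underlying data sources, which is the point of pairing rate limiting with circuit breaking.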

Tags: Big Data, caching, service governance, API Generation, Data Service, JD.com
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
