
Data Serviceization at JD: From Zero to One and Beyond

This technical presentation describes JD's data service platform, covering its origin, performance optimizations, flexible API generation, scaling to massive metrics, caching strategies, service orchestration, governance, and a Q&A on security and data‑source flexibility.

DataFunSummit

The session introduces the theme of data service practice at JD and outlines three modules: the origin of data serviceization, its growth, and how to improve the system.

Origin: JD's Data Intelligence Department needed to rapidly expose data assets via open APIs. Typical development took two weeks per API, and the 618 promotion required up to 80 interfaces. Engineers proposed a solution, the EZD framework, in which filling in a SQL statement automatically generates a performant, parameter-driven API.
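The core idea can be sketched in a few lines: a registry maps an API name to a SQL statement with named parameters, and one generic handler serves every registered API. This is a minimal illustration only; the function names and the SQLite backing store are assumptions, not JD's actual EZD implementation.

```python
import sqlite3

# Hypothetical sketch: registering a SQL statement is enough to expose a
# parameter-driven "API" through one generic handler.
API_REGISTRY = {}

def register_api(name, sql):
    """Publish a SQL statement as a callable API."""
    API_REGISTRY[name] = sql

def call_api(name, conn, **params):
    """Generic handler: look up the SQL and run it with named parameters."""
    cursor = conn.execute(API_REGISTRY[name], params)
    return cursor.fetchall()

# Demo against an in-memory SQLite database standing in for the real source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "north", 10.0), (2, "south", 20.0)])

register_api("sales_by_region",
             "SELECT SUM(amount) FROM orders WHERE region = :region")
print(call_api("sales_by_region", conn, region="north"))  # [(10.0,)]
```

Adding a new API is now a one-line `register_api` call rather than two weeks of bespoke development, which is the productivity gain the talk attributes to the framework.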

Interface Performance: The first version suffered from high latency because each request had to look up the SQL definition in the database. By caching SQL definitions in an in-memory routing table and switching to the high-performance Hikari connection pool, the platform cut the framework's own share of request latency from 97% to just 1%.
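The routing-table idea is straightforward: load all SQL definitions into process memory and refresh them at most once per TTL, so per-request lookups never touch the metadata database. The sketch below is illustrative; the class and TTL value are assumptions, not the platform's actual design.

```python
import time

# Hypothetical sketch: an in-memory routing table that refreshes lazily,
# replacing a per-request metadata-database lookup.
class RoutingTable:
    def __init__(self, load_definitions, ttl_seconds=60):
        self._load = load_definitions   # callable: () -> {api_name: sql}
        self._ttl = ttl_seconds
        self._table = {}
        self._loaded_at = None

    def lookup(self, api_name):
        now = time.monotonic()
        if self._loaded_at is None or now - self._loaded_at > self._ttl:
            self._table = self._load()  # refresh at most once per TTL
            self._loaded_at = now
        return self._table[api_name]

calls = []
def load_from_metadata_db():
    calls.append(1)  # stands in for the expensive metadata-DB round trip
    return {"sales_by_region":
            "SELECT SUM(amount) FROM orders WHERE region = :region"}

table = RoutingTable(load_from_metadata_db)
for _ in range(1000):            # 1000 requests ...
    table.lookup("sales_by_region")
print(len(calls))                # ... but only 1 metadata-DB hit
```

Combined with a pooled connection (Hikari in the real system), the per-request work shrinks to a dictionary lookup plus query execution.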

Interface Flexibility: Dynamic parameters are injected into SQL using a colon syntax. Complex query conditions are handled with FreeMarker templates (IF, SWITCH, loops), allowing the number of APIs to shrink from 80 to 5 while supporting collections, IN clauses, and conditional logic.
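The talk's FreeMarker templates live in the Java ecosystem; the Python sketch below only mirrors the idea of conditional SQL fragments, showing how one template with an optional filter and an IN-clause expansion can replace several near-identical APIs. All names here are illustrative.

```python
# Illustrative stand-in for FreeMarker-style conditional SQL templating:
# one template replaces many near-identical APIs.
def render_sql(params):
    sql = "SELECT SUM(amount) FROM orders WHERE 1 = 1"
    bind = {}
    if "region" in params:                      # optional equality filter (IF)
        sql += " AND region = :region"
        bind["region"] = params["region"]
    if "categories" in params:                  # IN clause over a collection (loop)
        names = []
        for i, cat in enumerate(params["categories"]):
            key = f"cat{i}"
            names.append(f":{key}")
            bind[key] = cat
        sql += f" AND category IN ({', '.join(names)})"
    return sql, bind

sql, bind = render_sql({"region": "north", "categories": ["book", "toy"]})
print(sql)
# SELECT SUM(amount) FROM orders WHERE 1 = 1
#   AND region = :region AND category IN (:cat0, :cat1)
```

Because every fragment binds named parameters rather than interpolating values, the generated SQL stays safe against injection while remaining fully dynamic.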

Scaling from 1 to 10: During large‑scale events like JD’s 618 promotion, dozens of metrics (sales, PV, UV, coupons, etc.) are displayed via data APIs. The platform integrates NoSQL sources—Elasticsearch via elasticsearch‑sql, Redis with KV mapping, and HBase with get/scan—enabling rapid development of hundreds of indicators within weeks.
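One way to read this integration story is as an adapter pattern: each back end is wrapped so that every metric is served through the same call shape. The sketch below uses in-memory stand-ins for Redis-style and HBase-style sources; the class and method names are assumptions, not JD's interfaces.

```python
# Hedged sketch: heterogeneous back ends behind one query interface.
class KeyValueSource:
    """Stands in for a Redis-style source: metric key -> value."""
    def __init__(self, data):
        self._data = data
    def query(self, request):
        return self._data.get(request["key"])

class WideColumnSource:
    """Stands in for an HBase-style source: row-key get, then column."""
    def __init__(self, rows):
        self._rows = rows
    def query(self, request):
        return self._rows.get(request["row"], {}).get(request["column"])

SOURCES = {
    "redis": KeyValueSource({"618:pv": 120000}),
    "hbase": WideColumnSource({"618": {"uv": 45000}}),
}

def serve_metric(source, request):
    """One entry point regardless of which engine holds the metric."""
    return SOURCES[source].query(request)

print(serve_metric("redis", {"key": "618:pv"}))               # 120000
print(serve_metric("hbase", {"row": "618", "column": "uv"}))  # 45000
```

With such adapters in place, adding a new indicator is a matter of registering a source and a request shape, which is what makes "hundreds of indicators within weeks" plausible.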

Caching Mechanisms: Two caching strategies are provided. Passive caching creates a cache entry on the first miss, supporting dynamic parameters but causing QPS spikes. Active caching updates caches periodically on the platform side, eliminating spikes but requiring predefined parameter combinations. The system advises using memory databases for hot data rather than full‑dataset caching.
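The two strategies can be contrasted in a few lines of code. This is a minimal sketch under assumed names: passive caching computes on a miss (hence the spike risk when many cold keys arrive at once), while active caching refreshes a predefined key set on the platform's schedule, so reads never fall through to the database.

```python
import time

# Illustrative sketch of the two strategies described above.
class PassiveCache:
    def __init__(self, compute, ttl=300):
        self._compute, self._ttl, self._store = compute, ttl, {}
    def get(self, key):
        hit = self._store.get(key)
        if hit is None or time.monotonic() - hit[1] > self._ttl:
            # Miss or expiry: compute on the request path (QPS-spike risk).
            hit = (self._compute(key), time.monotonic())
            self._store[key] = hit
        return hit[0]

class ActiveCache:
    def __init__(self, compute, keys):
        self._compute = compute
        self._keys = keys        # parameter combinations known up front
        self._store = {}
        self.refresh()
    def refresh(self):           # invoked periodically by the platform
        for key in self._keys:
            self._store[key] = self._compute(key)
    def get(self, key):
        return self._store[key]  # always a hit; never touches the source

computed = []
def expensive(key):
    computed.append(key)         # stands in for the costly query
    return f"value-for-{key}"

passive = PassiveCache(expensive)
passive.get("pv"); passive.get("pv")   # second call is a pure cache hit
active = ActiveCache(expensive, ["uv"])
active.get("uv"); active.get("uv")
print(computed)                        # ['pv', 'uv']
```

The trade-off is visible in the code: `PassiveCache` accepts any key but pays on the miss path; `ActiveCache` never pays on reads but only serves keys declared in advance.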

Service Orchestration: Complex business requirements are addressed by composing multiple APIs through a workflow engine (different from human‑approval workflows). An example shows how varying statistical logic across a 28‑hour promotion window is encapsulated in APIs, simplifying downstream consumption.
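The 28-hour example can be pictured as a router that hides per-phase statistical logic behind one downstream-facing API. The phase boundaries and handler names below are hypothetical; only the composition idea comes from the talk.

```python
# Illustrative sketch: per-phase APIs composed behind one orchestrated endpoint.
def warmup_stats(hour):
    return {"phase": "warmup", "metric": hour * 10}

def peak_stats(hour):
    return {"phase": "peak", "metric": hour * 100}

STEPS = [
    (range(0, 4), warmup_stats),    # hours 0-3: pre-sale warm-up logic (assumed split)
    (range(4, 28), peak_stats),     # hours 4-27: main-event logic
]

def promotion_stats(hour):
    """Single downstream API; routing hides the differing statistical logic."""
    for hours, handler in STEPS:
        if hour in hours:
            return handler(hour)
    raise ValueError(f"hour {hour} outside the 28-hour window")

print(promotion_stats(2))   # {'phase': 'warmup', 'metric': 20}
print(promotion_stats(10))  # {'phase': 'peak', 'metric': 1000}
```

Downstream consumers call one stable endpoint and never need to know that the statistical definition changes partway through the promotion.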

Data Service Governance: Governance involves three roles: data producers, consumers, and the governing platform team. Services are layered from low-level entities (orders, products, users) to high-level scenarios (analysis, marketing). Policies, service deduplication, grading, evaluation, and quality control are applied before publishing to the service market.

Q&A: The platform secures data sources by authorizing API groups per data‑source owner and mitigates performance bottlenecks by delegating load to the underlying databases, employing distributed rate‑limiting and circuit‑breaker mechanisms. Switching a MySQL‑based API to ClickHouse is possible if the SQL is compatible.
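JD's rate limiting is distributed, but the core mechanism can be illustrated with a single-node token bucket; treat this sketch, including the injectable clock and all names, as an assumption for demonstration only.

```python
import time

# Minimal single-node token-bucket sketch of the rate-limiting idea from the
# Q&A; a production setup would coordinate buckets across nodes.
class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate              # tokens refilled per second
        self._capacity = capacity
        self._tokens = float(capacity)
        self._clock = clock
        self._last = clock()

    def allow(self):
        now = self._clock()
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False                   # shed load instead of hitting the DB

# Deterministic demo with a fake clock.
fake_now = [0.0]
bucket = TokenBucket(rate=10, capacity=5, clock=lambda: fake_now[0])
results = [bucket.allow() for _ in range(8)]   # no time passes between calls
print(results)          # [True, True, True, True, True, False, False, False]
fake_now[0] = 0.2       # 0.2 s later: two tokens have been refilled
print(bucket.allow(), bucket.allow(), bucket.allow())  # True True False
```

Rejecting requests at the bucket keeps overload away from the underlying data sources, which is the point of pairing rate limiting with circuit breaking.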

Tags: Big Data, caching, service governance, API Generation, Data Service, JD.com
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
