Backend Development

Design and Implementation of JD's Real-Time Browsing Record System

The article describes JD's real-time browsing record system architecture, detailing its four modules—storage, query, real-time reporting, and offline reporting—along with hot‑cold data separation, use of Jimdb, HBase, Kafka, and Flink to achieve millisecond‑level latency and high throughput for billions of user records.

JD Retail Technology

The browsing record system is used to capture JD users' real‑time browsing data and provide millisecond‑level query capabilities, storing up to the most recent 200 records per user.

System Architecture

The architecture consists of four modules: a browsing data storage module, a browsing data query module, a real‑time reporting module, and an offline reporting module.

Browsing Data Storage Module: Stores billions of user browsing records, requiring design for near‑trillion‑row capacity.

Browsing Data Query Module: Provides micro‑service APIs for total count, record list, and deletion of user browsing data.

Browsing Data Real‑Time Reporting Module: Handles real‑time page‑view (PV) data from online users and stores it in a real‑time database.

Browsing Data Offline Reporting Module: Cleans, deduplicates, and filters offline PV data before pushing it to an offline database.

Data Storage Design

Hot and cold data are separated: records from the most recent four days are stored in Jimdb (in-memory) for fast access, while records older than four days are stored in HBase on disk to reduce cost. Hot data uses Jimdb sorted sets, with the username as the key, SKUs as members, browse timestamps as scores, and a 4-day TTL.
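The hot-data layout above can be sketched in plain Python. Jimdb is Redis-compatible, so each user maps to a sorted set (SKU as member, timestamp as score), capped at 200 entries with a 4-day freshness window; the names `record_view`, `recent_views`, and `hot_store` are illustrative, not from the article.

```python
# In-memory model of the hot store: one "sorted set" per user,
# represented as {sku: timestamp}. In Jimdb this would be ZADD /
# ZREMRANGEBYRANK / ZREVRANGEBYSCORE with a 4-day key TTL.

MAX_RECORDS = 200            # per-user cap from the article
HOT_TTL_SECONDS = 4 * 86400  # hot data is kept for 4 days

hot_store: dict[str, dict[str, float]] = {}

def record_view(user: str, sku: str, ts: float) -> None:
    """Record a browse event; re-browsing an SKU refreshes its score."""
    zset = hot_store.setdefault(user, {})
    zset[sku] = ts
    if len(zset) > MAX_RECORDS:
        # Drop the oldest entries, like ZREMRANGEBYRANK 0 -(MAX+1).
        excess = len(zset) - MAX_RECORDS
        for old_sku, _ in sorted(zset.items(), key=lambda kv: kv[1])[:excess]:
            del zset[old_sku]

def recent_views(user: str, now: float) -> list[str]:
    """Newest-first SKUs still inside the 4-day hot window."""
    zset = hot_store.get(user, {})
    fresh = {s: t for s, t in zset.items() if now - t < HOT_TTL_SECONDS}
    return [s for s, _ in sorted(fresh.items(), key=lambda kv: -kv[1])]
```

Using the timestamp as the sorted-set score is what makes both "newest 200" trimming and newest-first reads cheap range operations on the store side.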

Cold data is stored as key-value pairs in HBase, with MD5-hashed, prefixed usernames as row keys and JSON strings of product SKUs and timestamps as values, preserving chronological order and using a 62-day TTL.
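A minimal sketch of that cold-store key/value scheme, under the assumption that the MD5-derived prefix exists to spread users evenly across HBase regions (the exact prefix length and JSON field names here are hypothetical):

```python
import hashlib
import json

def cold_row_key(username: str) -> str:
    # Hypothetical key scheme following the article's description: a
    # short MD5-derived prefix avoids region hot-spotting, followed by
    # the hashed username itself.
    digest = hashlib.md5(username.encode("utf-8")).hexdigest()
    return f"{digest[:4]}_{digest}"

def cold_row_value(views: list[tuple[str, int]]) -> str:
    # JSON string of (sku, timestamp) pairs, stored newest-first so
    # reads preserve browsing order without re-sorting.
    ordered = sorted(views, key=lambda v: -v[1])
    return json.dumps([{"sku": sku, "ts": ts} for sku, ts in ordered])
```

Hashing the username before using it as a row key trades human readability for an even key distribution, which matters more than readability at near-trillion-row scale.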

Query Service Design

The query service offers three APIs: total record count, record list, and record deletion. Rate limiting is implemented with Guava's RateLimiter and a Caffeine cache across global, caller, and user dimensions. The service first checks a cache, then queries Jimdb for real-time records, enriches them with product info, deduplicates by SPU, and conditionally merges offline data from HBase.
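The merge-and-deduplicate step of that read path can be sketched as follows. Hot records are taken first, several SKUs can map to one SPU (product), and cold records fill the remainder up to the 200-record cap; `merge_views` and the `sku_to_spu` lookup are stand-ins for the article's product-info enrichment, not real API names.

```python
def merge_views(hot: list[str], cold: list[str],
                sku_to_spu: dict[str, str], limit: int = 200) -> list[str]:
    """Merge hot (Jimdb) and cold (HBase) SKUs, deduplicated by SPU."""
    seen_spus: set[str] = set()
    merged: list[str] = []
    for sku in hot + cold:              # hot (newest) records win
        spu = sku_to_spu.get(sku, sku)  # fall back to SKU if unmapped
        if spu in seen_spus:
            continue                    # same product already listed
        seen_spus.add(spu)
        merged.append(sku)
        if len(merged) == limit:
            break                       # enforce the 200-record cap
    return merged
```

Deduplicating by SPU rather than SKU means a user who browsed three variants of the same product sees it once in their history, not three times.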

Real‑Time Reporting Module

Front-end services publish browsing events to Kafka (50 partitions) for load balancing. Flink consumes these events and writes them to Jimdb, using Lua scripts to batch commands, which provides exactly-once processing and fault tolerance.
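One detail worth making concrete: if events are keyed by user, every event for a given user lands in the same partition, so Flink sees that user's browses in order. A small sketch, assuming events are keyed this way (the article does not state the partitioner; Kafka's default key partitioner uses murmur2, and the CRC32 stand-in below just illustrates the idea):

```python
import zlib

NUM_PARTITIONS = 50  # partition count from the article

def partition_for(user_id: str) -> int:
    """Stable user -> partition mapping: same user, same partition,
    so per-user event ordering is preserved within a partition."""
    return zlib.crc32(user_id.encode("utf-8")) % NUM_PARTITIONS
```

With a stable keying scheme, the 50 partitions give horizontal read parallelism to Flink without sacrificing per-user ordering.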

Flink is chosen for its low latency, high throughput, and isolated snapshot mechanism, while Kafka provides high‑throughput, low‑latency, scalable, durable, and fault‑tolerant messaging.

Offline Reporting Module

The offline pipeline runs as a daily batch: it extracts PV data from MySQL, cleans and deduplicates it, filters out deleted records, and stores the result in HBase, supporting up to 60 days of retention and 200 records per user.
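A hedged sketch of that daily batch, under the assumption that each input row is a `(user, sku, timestamp)` triple and that deletions arrive as `(user, sku)` pairs (the article does not specify the row shape):

```python
from collections import defaultdict

def offline_batch(rows, deleted, limit=200):
    """Dedupe a day's PV rows per (user, sku), drop user-deleted
    records, and keep at most the newest `limit` views per user.

    rows:    iterable of (user, sku, timestamp)
    deleted: set of (user, sku) pairs the user has deleted
    """
    latest = {}
    for user, sku, ts in rows:
        if (user, sku) in deleted:
            continue                    # honour user deletions
        key = (user, sku)
        if key not in latest or ts > latest[key]:
            latest[key] = ts            # dedupe: keep the newest view
    per_user = defaultdict(list)
    for (user, sku), ts in latest.items():
        per_user[user].append((sku, ts))
    # Newest-first, capped at `limit` records per user.
    return {u: sorted(v, key=lambda x: -x[1])[:limit]
            for u, v in per_user.items()}
```

Doing deduplication and deletion filtering offline keeps the hot path simple: the real-time module only ever appends, and the nightly batch reconciles.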

Tags: backend, real-time, Flink, Kafka, HBase, browsing, jimdb
Written by JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.