Risk Insight Platform Architecture and ClickHouse Implementation for Real-Time Risk Monitoring
The article presents a comprehensive risk insight platform built on ClickHouse, Flink, and intelligent algorithms, detailing its architecture, technical challenges, solutions, real-time data modeling, practical applications in fraud detection and user behavior analysis, and future optimization directions.
The risk insight platform, presented by Li Danfeng of the JD Technology Risk Management Center, leverages ClickHouse, Flink, and intelligent algorithms to establish a multi-layered, real-time risk monitoring system. It supports more than ten core risk scenarios, including fraud, credit, and anti-money-laundering, and handles up to 34 million messages per minute during peak periods.
Technical Challenges
High‑throughput real‑time data ingestion, with peak traffic reaching 120 million events per minute.
High‑performance real‑time querying over massive datasets.
Complex aggregation capabilities beyond simple single‑table operations.
Maintaining computational performance at massive data scales.
Solution Overview
After evaluating major OLAP engines (Presto, Impala, Druid, Kylin), ClickHouse was selected for its MPP architecture, LSM‑based in‑memory sorting, high compression (up to 20:1), columnar storage, sparse indexes, and vectorized execution, delivering sub‑second query responses on billions of rows. Combined with Flink for pre‑aggregation, the architecture achieves high‑throughput ingestion, fast queries, complex aggregations, and reduced storage costs.
Overall Architecture
The platform consists of modular data sources (ClickHouse, MySQL, Presto, etc.), an event bus with source‑transform‑sink plugins supporting Groovy/Python scripts, a three‑tier ClickHouse storage model (RODS, RDWM, RDWS), unified data modeling via ANSI‑SQL, algorithm services for anomaly detection and attribution, and upper‑layer risk insight applications (alerts, reports, strategy analysis).
ClickHouse Real‑Time Data Model Design
Shortened hierarchy: direct Flink‑driven pipelines from raw (RODS) to summarized (RDWM) and aggregated (RDWS) layers.
Dimension flattening: frequently used dimension fields are denormalized into wide fact tables to avoid costly joins.
Flink pre‑aggregation: real‑time metric aggregation reduces downstream query load, especially during high‑traffic events.
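The pre-aggregation idea above can be sketched without a Flink cluster. The snippet below is a minimal stand-in, assuming tumbling one-minute windows keyed by risk scenario; the function name, event shape, and window size are illustrative, not taken from the article's actual pipeline.

```python
from collections import defaultdict

def preaggregate(events, window_seconds=60):
    """Roll raw events up into tumbling windows keyed by (window_start, scenario),
    summing their counts -- a toy analogue of the Flink pre-aggregation that
    feeds the summarized (RDWM) and aggregated (RDWS) layers."""
    buckets = defaultdict(int)
    for ts, scenario, count in events:
        window_start = ts - (ts % window_seconds)  # truncate to window boundary
        buckets[(window_start, scenario)] += count
    return dict(buckets)

events = [
    (0, "fraud", 1), (30, "fraud", 2), (59, "credit", 1),
    (60, "fraud", 5), (95, "credit", 3),
]
rolled_up = preaggregate(events)
# Five raw events collapse into four aggregated rows, so downstream
# queries scan the small rollup instead of the raw stream.
```

Downstream ClickHouse queries then read the minute-level rollup rather than the raw event stream, which is what makes the high-traffic windows tractable.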
Practical Applications
1. Marketing Anti‑Fraud during Large‑Scale Promotions
Challenges included massive, complex MQ messages (≈17 KB each) and traffic spikes up to 60 times normal volume. Solutions involved splitting consumption and ClickHouse clusters by business domain, implementing dynamic batch‑write strategies, and applying Flink‑based minute‑level pre‑aggregation, which reduced raw data size by 95 % and improved query efficiency.
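ClickHouse favors a small number of large inserts over many small ones, which is the rationale behind the dynamic batch-write strategy mentioned above. The sketch below buffers rows and flushes when either a row-count or an age threshold is hit; the class name and threshold values are assumptions for illustration, not the article's actual configuration.

```python
import time

class BatchWriter:
    """Buffer rows and flush them in large batches.

    Flushes when the buffer reaches max_rows or the oldest buffered row
    is older than max_age_s -- the knobs a dynamic strategy would tune
    up or down as traffic spikes.
    """
    def __init__(self, sink, max_rows=10000, max_age_s=5.0):
        self.sink = sink            # callable that receives a list of rows
        self.max_rows = max_rows
        self.max_age_s = max_age_s
        self.buffer = []
        self.first_ts = None        # arrival time of the oldest buffered row

    def write(self, row, now=None):
        now = time.monotonic() if now is None else now
        if self.first_ts is None:
            self.first_ts = now
        self.buffer.append(row)
        if len(self.buffer) >= self.max_rows or now - self.first_ts >= self.max_age_s:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sink(self.buffer)  # one large insert instead of many small ones
            self.buffer = []
            self.first_ts = None

# Example: with max_rows=3, seven writes produce two full batches and a remainder.
batches = []
writer = BatchWriter(batches.append, max_rows=3, max_age_s=999.0)
for i in range(7):
    writer.write(i, now=0.0)
writer.flush()
```

Raising max_rows and max_age_s during a promotion trades a little freshness for far fewer merge-heavy small inserts on the ClickHouse side.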
2. User Behavior Path Analysis
To handle massive event logs:
A common table is generated on the fly to narrow the query scope.
ClickHouse's sparse primary index (e.g., on the user PIN) is leveraged to avoid full table scans.
Bitmap functions (bitmapCardinality, bitmapAndCardinality, etc.) provide efficient deduplication.
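The bitmap deduplication above can be illustrated in miniature: encode each set of integer user IDs as a bitmap (one bit per ID), and distinct counts and intersections become cheap bit operations. This is a conceptual sketch of what bitmapCardinality and bitmapAndCardinality compute, using plain Python integers rather than ClickHouse's roaring-bitmap implementation.

```python
def to_bitmap(ids):
    """Encode a collection of integer IDs as a bitmap; duplicates collapse
    automatically because each ID maps to a single bit."""
    bm = 0
    for i in ids:
        bm |= 1 << i
    return bm

def bitmap_cardinality(bm):
    # Analogue of bitmapCardinality: count of distinct IDs in the set.
    return bin(bm).count("1")

def bitmap_and_cardinality(a, b):
    # Analogue of bitmapAndCardinality: distinct IDs present in BOTH sets,
    # e.g. users who performed step 1 AND step 2 of a behavior path.
    return bin(a & b).count("1")

step1_users = to_bitmap([1, 2, 3, 5, 2])   # duplicate 2 counted once
step2_users = to_bitmap([2, 3, 8])
distinct_step1 = bitmap_cardinality(step1_users)
converted = bitmap_and_cardinality(step1_users, step2_users)
```

Because the per-step ID sets are pre-materialized as bitmaps, funnel intersections reduce to bitwise ANDs instead of expensive DISTINCT joins over raw logs.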
3. ClickHouse Production Operations
Adjusted Zookeeper log and snapshot retention to mitigate disk alerts.
Used local tables with VIP writes to balance disk I/O and avoid merge bottlenecks.
Explored RaftKeeper as a replacement for Zookeeper to improve cluster stability.
Future Outlook
ClickHouse demonstrates significant performance advantages for large‑scale read/write workloads in real‑time risk analysis. Ongoing efforts focus on addressing high concurrency and Zookeeper instability through cluster segmentation, SQL optimizations, and adopting newer features such as RaftKeeper to further enhance reliability and query throughput.
JD Tech Talk
Official JD Tech public account delivering best practices and technology innovation.