Technical Evolution of Xianyu Real-Time Selection System for Double Eleven
To meet Double‑Eleven’s sub‑second, billion‑item feed demands, Alibaba’s Xianyu selection system evolved from a Solr‑based search pipeline through offline batch and PostgreSQL attempts to a Blink‑powered real‑time stream platform using Niagara’s low‑latency LSM storage, delivering high‑throughput, personalized product feeds.
Double Eleven is a massive shopping festival in China. For Alibaba engineers it is an annual stress test that drives a series of innovations across networking, infrastructure, capacity planning, performance optimization, personalized recommendation, intelligent search and complex marketing scenarios.
The Xianyu selection system ("选品投放系统", the product selection and delivery system) was built to address two key characteristics of Double‑Eleven event pages: channels grouped around similar product categories (e.g., sports goods) and personalized, per‑user product feeds. The system must select relevant items from billions of products in real time and deliver them to each user.
2. Xianyu Selection System Evolution
2.1 Business Problems
The e‑commerce flow consists of three core stages: product publishing, product display, and product transaction. The selection system focuses on the display stage, especially during large‑scale events, by combining operational rules with algorithmic personalization to build many event channels and present high‑quality items.
2.2 Search‑Based Implementation
Early Xianyu relied on a search engine (Solr) as the data source for feeds. The pipeline:
Publish → write to the product DB → send an async message via RocketMQ.
A consumer reads the message, enriches the data via RPC, writes it to a MySQL dump table, then updates the search index via HTTP.
Nightly full‑dump jobs rebuild the index.
Operators configure selection rules through a UI that translates them into search queries.
Front‑end feed requests are served by converting rule IDs into search queries.
Limitations include latency for real‑time updates (index refresh on the order of 100 ms at best, which falls short of the real‑time requirement) and scalability concerns for large‑scale events.
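To make the rule‑to‑query translation concrete, here is a hypothetical sketch of flattening an operator‑configured rule into a Solr‑style filter query. The field names and query syntax are illustrative assumptions, not the actual Xianyu schema:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: turn an operator-configured selection rule
// (a map of field -> required value) into a Solr-style filter query.
public class RuleToQuery {
    public static String toFilterQuery(Map<String, String> rule) {
        return rule.entrySet().stream()
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        Map<String, String> rule = new LinkedHashMap<>();
        rule.put("category", "sports");
        rule.put("status", "on_sale");
        System.out.println(toFilterQuery(rule));
        // category:sports AND status:on_sale
    }
}
```

A front‑end feed request would then resolve its rule ID to such a query and execute it against the index.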
2.3 Offline Computation Approach
A big‑data platform (MaxCompute, or a custom Kafka + Hadoop + Hive + Spark stack) was considered for running batch jobs that compute selection rules, but batch computation cannot meet Xianyu’s sub‑second real‑time requirements.
2.4 PostgreSQL Attempt
PostgreSQL was evaluated for its OLTP/OLAP capabilities, fast JSON handling, triggers, and NOTIFY mechanism. However, trigger overhead, notification reliability, and KV store performance issues prevented it from being a viable production solution.
2.5 Real‑Time Computing Platform (Blink)
Blink, an Alibaba‑enhanced Flink platform, was chosen for its real‑time stream processing, high‑performance state backend, and operational maturity. Key challenges addressed:
Incremental and full‑load data merging in a wide table.
High write/read throughput for billions of rows (≈2 × 10⁵ TPS).
Efficient rule evaluation and result set updates.
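The first challenge, merging incremental updates into the full‑load wide table, can be sketched as a last‑write‑wins merge keyed on timestamp. This is a hedged illustration; the field names (id, ts, payload) are assumptions, not Xianyu’s actual schema:

```java
// Sketch of merging full-load and incremental data in a wide table:
// keep whichever record for a product carries the newer timestamp.
public class WideTableMerge {
    public static final class Record {
        final String id;      // product id
        final long ts;        // event timestamp
        final String payload; // flattened wide-table fields
        public Record(String id, long ts, String payload) {
            this.id = id; this.ts = ts; this.payload = payload;
        }
    }

    // Returns the record that should survive in the wide table.
    public static Record merge(Record current, Record incoming) {
        if (current == null) return incoming;
        return incoming.ts >= current.ts ? incoming : current;
    }
}
```

In the real pipeline this merge runs inside Blink state at very high throughput, which is why the choice of state backend matters.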
Blink’s state backend options include:
Gemini – high‑performance in‑memory backend.
RocksDBStateBackend – Flink native, slower.
Niagara – Alibaba’s custom distributed backend based on Seastar, offering low‑latency LSM‑tree storage.
Custom UDAFs are used for aggregation. Example code:
```java
/**
 * @param <T>   UDAF output type
 * @param <ACC> UDAF accumulator type
 */
public abstract class AggregateFunction<T, ACC> extends UserDefinedFunction {
    public abstract ACC createAccumulator();
    public abstract T getValue(ACC accumulator);
    // By convention, implementations also provide:
    // public void accumulate(ACC accumulator, ...[user inputs]...);
}
```

The per‑product state stores fields such as _mergeRes (merged product info with timestamps), _ruleRes (rule hit flag), _changeFlag (change marker), and _ruleDiff (diff for incremental updates). Niagara’s LSM‑tree design, Masstree memtable, and adaptive compaction deliver the required throughput.
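As an illustration of how these fields might interact, here is a minimal accumulator sketch. Only the field names come from the text above; the update logic (emitting an ADD/DEL diff whenever the rule hit flag flips) is an assumption:

```java
// Minimal sketch of an accumulator holding the state fields named above.
// The diff-emission rule is assumed, not taken from the source.
public class RuleAccumulator {
    String mergeRes = "";       // _mergeRes: merged product info
    boolean ruleRes = false;    // _ruleRes: whether the rule currently hits
    boolean changeFlag = false; // _changeFlag: did the hit flag just change?
    String ruleDiff = "";       // _ruleDiff: incremental diff ("ADD"/"DEL")

    public void accumulate(String mergedRecord, boolean hit) {
        mergeRes = mergedRecord;
        changeFlag = (hit != ruleRes);
        if (changeFlag) {
            ruleDiff = hit ? "ADD" : "DEL"; // push incremental update downstream
        } else {
            ruleDiff = "";                  // nothing to emit
        }
        ruleRes = hit;
    }
}
```

Emitting only the diff keeps downstream result‑set updates proportional to what changed, rather than to the full selection.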
2.6 Real‑Time Full Selection and Preview
HybridDB for MySQL (an OLAP database) provides trillion‑scale, millisecond‑level multi‑dimensional analysis, integrated with MaxCompute and Blink for seamless rule evaluation and KV‑store version switching.
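The KV‑store version switching mentioned here can be pictured as staging a full selection result under a fresh version key and then flipping a "current" pointer, so readers never observe a half‑written result set. This is a hypothetical sketch; the key layout is assumed:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of KV-store version switching: a full selection
// result is written under a new version key, then the "current" pointer
// is flipped atomically for readers.
public class VersionedKv {
    private final Map<String, List<String>> store = new HashMap<>();
    private volatile String currentVersion = null;

    public void publish(String version, List<String> productIds) {
        store.put(version, productIds); // stage the new result set
        currentVersion = version;       // single-reference flip for readers
    }

    public List<String> read() {
        return currentVersion == null ? List.of() : store.get(currentVersion);
    }
}
```

Old versions can then be garbage‑collected lazily without blocking reads.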
2.7 Real‑Time Recommendation Layer
Details are omitted for brevity; the recommendation layer builds on the real‑time selection results.
2.8 Performance Metrics
The system, named “Mach” (after the Mach number, a measure of speed), now powers almost all Xianyu feed‑type services, delivering high QPS, low latency, and robust reliability.
References
[1] RocketMQ: https://rocketmq.apache.org/
[2] Real‑time search solution based on Lucene.
[3] MaxCompute: https://www.aliyun.com/product/odps
[4] PostgreSQL NOTIFY: https://www.postgresql.org/docs/current/static/sql-notify.html
[5] Seastar: http://www.seastar-project.org
[6] Masstree: https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf
[7] HybridDB for MySQL: https://www.aliyun.com/product/petadata
Xianyu Technology
Official account of the Xianyu technology team