
Accelerating Real‑Time Data Queries with Solr in Alibaba's Jushita Platform

This article explains how Alibaba's Jushita platform leverages Apache Solr with a wide‑table data model and a custom QParser plugin to achieve real‑time, multi‑dimensional buyer filtering that traditional relational databases cannot handle efficiently in big‑data scenarios.

Qunar Tech Salon

In the era of digital transformation, data is the most valuable asset for platforms and merchants, but traditional relational databases struggle to meet the massive volume, high dimensionality, and real‑time requirements of big‑data queries.

While Hadoop, Hive, and Spark are widely used for offline processing, they cannot provide the low‑latency responses needed for interactive filtering; Solr is introduced as a complementary solution to enable fast, multi‑dimensional searches.

The article presents a concrete use case from Alibaba's Jushita environment, where large sellers need to filter buyers in real time based on purchase quantity or amount within arbitrary time windows, a task that ordinary SQL queries cannot satisfy efficiently.

Adding more indexes or complex joins to the relational database would cause heavy I/O and maintenance overhead, so the solution shifts the query workload to Solr. Solr answers multi‑condition filters from inverted indexes, a structure better suited to this access pattern than the B‑tree indexes of a relational database.

Data preparation involves building a wide table that flattens one‑to‑many relationships (e.g., one buyer with many trade records) into a single document. The wide table is generated with Hive (or ODPS on Alibaba Cloud) for the full load, and incremental updates are re‑indexed periodically to avoid fragmenting the Solr index.
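The flattening step can be illustrated with a minimal in-memory sketch. It is not the Hive/ODPS job itself: the `Trade` class, its field names, the yyyyMMdd date format, and payments stored as integer minor units are all assumptions made for illustration. The output strings follow the sellerId_date_buyerId_payment_payCount layout the article describes for the aggregated structure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class WideTableSketch {

    /** One trade row; the field names are hypothetical. */
    static final class Trade {
        final String sellerId;
        final String date;    // assumed yyyyMMdd
        final String buyerId;
        final long payment;   // assumed integer minor units (e.g. cents)

        Trade(String sellerId, String date, String buyerId, long payment) {
            this.sellerId = sellerId;
            this.date = date;
            this.buyerId = buyerId;
            this.payment = payment;
        }
    }

    /**
     * Collapses many trade rows into one "document" per buyer.
     * Returns buyerId -> daily aggregate strings, sorted by seller and date.
     */
    static Map<String, List<String>> buildDocuments(List<Trade> trades) {
        // sellerId_date_buyerId -> {total payment, purchase count}
        Map<String, long[]> daily = new TreeMap<>();
        for (Trade t : trades) {
            String key = t.sellerId + "_" + t.date + "_" + t.buyerId;
            long[] acc = daily.computeIfAbsent(key, k -> new long[2]);
            acc[0] += t.payment;
            acc[1] += 1;
        }
        // Group the daily aggregates under their buyer (IDs assumed to contain no '_').
        Map<String, List<String>> docs = new TreeMap<>();
        for (Map.Entry<String, long[]> e : daily.entrySet()) {
            String buyerId = e.getKey().split("_")[2];
            docs.computeIfAbsent(buyerId, k -> new ArrayList<>())
                .add(e.getKey() + "_" + e.getValue()[0] + "_" + e.getValue()[1]);
        }
        return docs;
    }
}
```

Each buyer ends up with one entry per (seller, day), so a query never has to join buyers to trades at search time.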

The wide table has three main parts: a Buyer table, a Trade table, and an aggregated structure in which fields such as sellerId_date_buyerId_payment_payCount encode the seller ID, date, buyer ID, total payment, and purchase count for each day.
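As a rough illustration of that composite layout, the sketch below encodes and parses one daily aggregate. The class name `DailyTerm`, the yyyyMMdd date format, payment as an integer number of minor units, and IDs free of underscores are assumptions; the article only gives the order of the five parts.

```java
/** One day's aggregate for a (seller, buyer) pair; a hypothetical helper. */
final class DailyTerm {
    final String sellerId;
    final String date;     // assumed yyyyMMdd, so string order matches date order
    final String buyerId;
    final long payment;    // assumed integer minor units (e.g. cents)
    final int payCount;

    DailyTerm(String sellerId, String date, String buyerId, long payment, int payCount) {
        this.sellerId = sellerId;
        this.date = date;
        this.buyerId = buyerId;
        this.payment = payment;
        this.payCount = payCount;
    }

    /** Builds the sellerId_date_buyerId_payment_payCount string. */
    String encode() {
        return String.join("_", sellerId, date, buyerId,
                Long.toString(payment), Integer.toString(payCount));
    }

    /** Splits a term back into its five parts; rejects malformed input. */
    static DailyTerm parse(String term) {
        String[] p = term.split("_");
        if (p.length != 5) {
            throw new IllegalArgumentException("expected 5 '_'-separated parts: " + term);
        }
        return new DailyTerm(p[0], p[1], p[2], Long.parseLong(p[3]), Integer.parseInt(p[4]));
    }
}
```

Putting the seller ID and date first means all terms for one seller and one date range sit next to each other in sorted order, which is what makes a prefix range scan possible.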

On the Solr side, a custom QParser plugin is implemented to parse query parameters, iterate over the relevant term ranges, and collect matching document IDs. The core logic extracts the terms between prefixStart and prefixEnd, evaluates the filter conditions, and accumulates results in a buyerStatis collector.
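The range scan can be sketched without Solr by using a sorted set as a stand-in for the index's sorted term dictionary. This is not the plugin's actual code: the way prefixStart and prefixEnd are built here, the yyyyMMdd dates, and the threshold parameters are assumptions, and a real plugin would walk Lucene terms and collect document IDs rather than buyer IDs.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.Set;
import java.util.TreeSet;

final class BuyerRangeFilter {

    /**
     * Returns the buyers of one seller whose summed payment and purchase count
     * within [startDate, endDate] both reach the given minimums.
     * Terms are sellerId_date_buyerId_payment_payCount strings.
     */
    static Set<String> matchingBuyers(NavigableSet<String> terms, String sellerId,
                                      String startDate, String endDate,
                                      int minPayCount, long minPayment) {
        // Assumed construction: every term for the seller in the date window
        // sorts between these two bounds.
        String prefixStart = sellerId + "_" + startDate;
        String prefixEnd = sellerId + "_" + endDate + "_\uffff";

        // buyerId -> {total payment, total purchase count}
        Map<String, long[]> buyerStatis = new HashMap<>();
        for (String term : terms.subSet(prefixStart, true, prefixEnd, true)) {
            String[] p = term.split("_");
            long[] acc = buyerStatis.computeIfAbsent(p[2], k -> new long[2]);
            acc[0] += Long.parseLong(p[3]);
            acc[1] += Long.parseLong(p[4]);
        }

        // Apply the filter conditions to the accumulated totals.
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, long[]> e : buyerStatis.entrySet()) {
            if (e.getValue()[0] >= minPayment && e.getValue()[1] >= minPayCount) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

Only the terms inside the window are visited, so the cost depends on the seller's activity in that window rather than on the total size of the index.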

To deploy the plugin, a configuration entry is added to solrconfig.xml; the values it sets include buyer_id, dynamic_info, and emailSendCount.

With the plugin in place, Solr can filter buyers by purchase count or amount over specified time segments in real time, avoiding the join cost that makes the same query slow in a traditional relational database.

In summary, the case demonstrates how integrating Solr with a pre‑aggregated wide‑table and a custom query parser can overcome relational database bottlenecks, providing scalable, low‑latency, and flexible query capabilities for big‑data applications.
