
Accelerating Real‑Time Data Queries with Solr in Alibaba's Jushita Platform

This article explains how Alibaba's Jushita platform leverages Apache Solr with a wide‑table data model and a custom QParser plugin to achieve real‑time, multi‑dimensional buyer filtering that traditional relational databases cannot handle efficiently in big‑data scenarios.

Qunar Tech Salon

Feb 14, 2016


In the era of digital transformation, data is one of the most valuable assets for platforms and merchants, but traditional relational databases struggle with the volume, high dimensionality, and real‑time latency that big‑data queries demand.

While Hadoop, Hive, and Spark are widely used for offline processing, they cannot provide the low‑latency responses needed for interactive filtering; Solr is introduced as a complementary solution to enable fast, multi‑dimensional searches.

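The article presents a concrete use case from Alibaba's Jushita environment: large sellers need to filter buyers in real time by purchase quantity or amount within arbitrary time windows. Because the window is chosen at query time, the totals cannot be precomputed per window, so an ordinary SQL query has to scan, group, and sum the matching trade rows on every request, which is too slow at this data volume.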

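Adding indexes or complex joins in a relational database would cause heavy I/O and maintenance overhead, so the solution shifts the query workload to Solr. Solr is built on Lucene's inverted indexes, in which each indexed value points to a sorted list of matching document IDs. A filter across several dimensions then becomes an intersection of those lists, which for ad hoc multi‑dimensional filtering can be far faster than B‑tree lookups that need a suitable composite index for each combination of conditions.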
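A toy model makes the contrast concrete. The sketch below is plain Java with no Solr or Lucene dependency, and the class `ToyInvertedIndex` and the `field:value` terms (`city`, `level`) are hypothetical. Each term maps to a sorted list of document IDs, and a conjunction across dimensions becomes a merge of those lists, so no composite index has to exist for any particular combination of filters. Lucene's real postings are compressed and support skipping, which this sketch leaves out.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (hypothetical, no Lucene): each "field:value" term maps to sorted doc IDs.
final class ToyInvertedIndex {
    private final Map<String, TreeSet<Integer>> postings = new HashMap<>();

    // Index one document under each of its terms.
    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Posting list for a term as a sorted int array (empty if the term is unknown).
    int[] postingList(String term) {
        TreeSet<Integer> ids = postings.get(term);
        return ids == null ? new int[0] : ids.stream().mapToInt(Integer::intValue).toArray();
    }

    // Two-pointer intersection of two sorted doc ID arrays, linear in their combined length.
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out[n++] = a[i];
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;
            } else {
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }

    // AND over any number of terms: start from one posting list and intersect the rest into it.
    int[] and(String... terms) {
        if (terms.length == 0) {
            return new int[0];
        }
        int[] result = postingList(terms[0]);
        for (int k = 1; k < terms.length; k++) {
            result = intersect(result, postingList(terms[k]));
        }
        return result;
    }

    public static void main(String[] args) {
        ToyInvertedIndex idx = new ToyInvertedIndex();
        idx.add(1, "city:hangzhou", "level:vip");
        idx.add(2, "city:hangzhou", "level:normal");
        idx.add(3, "city:beijing", "level:vip");
        idx.add(4, "city:hangzhou", "level:vip");
        System.out.println(Arrays.toString(idx.and("city:hangzhou", "level:vip"))); // [1, 4]
    }
}
```

Adding a third condition costs one more list merge, not a new index.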

Data preparation involves building a wide‑table that aggregates one‑to‑many relationships (e.g., buyer and trade records) into a single document. The wide‑table is generated with Hive (or ODPS in Alibaba Cloud) for full‑load processing, and incremental updates are periodically re‑indexed to avoid fragmenting the Solr index.
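The full‑load step can be pictured as a group‑by over trade rows. This is a hypothetical Java sketch of what the Hive/ODPS job computes, not the platform's actual job; the `Trade` fields, the `yyyyMMdd` day format, and payments stored as integer cents are assumptions. Each buyer's many trades collapse into one document that holds per‑seller, per‑day totals.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the full-load denormalization: fold a buyer's trade rows into one document.
final class WideRowBuilder {
    static final class Trade {
        final String buyerId, sellerId, day; // day assumed to be yyyyMMdd
        final long paymentCents;             // payment assumed to be integer cents

        Trade(String buyerId, String sellerId, String day, long paymentCents) {
            this.buyerId = buyerId;
            this.sellerId = sellerId;
            this.day = day;
            this.paymentCents = paymentCents;
        }
    }

    // Running totals for one (seller, day) pair of a single buyer.
    static final class DailyTotal {
        long paymentCents;
        int payCount;
    }

    // buyerId -> ("sellerId|day" -> totals); each outer entry would become one Solr document.
    static Map<String, Map<String, DailyTotal>> build(List<Trade> trades) {
        Map<String, Map<String, DailyTotal>> docs = new TreeMap<>();
        for (Trade t : trades) {
            DailyTotal d = docs.computeIfAbsent(t.buyerId, k -> new TreeMap<>())
                    .computeIfAbsent(t.sellerId + "|" + t.day, k -> new DailyTotal());
            d.paymentCents += t.paymentCents; // sum of payments that day
            d.payCount++;                     // number of trades that day
        }
        return docs;
    }

    public static void main(String[] args) {
        List<Trade> trades = List.of(
                new Trade("b1", "s1", "20160201", 1500),
                new Trade("b1", "s1", "20160201", 500),
                new Trade("b1", "s1", "20160203", 800),
                new Trade("b2", "s1", "20160201", 999));
        Map<String, Map<String, DailyTotal>> docs = build(trades);
        DailyTotal d = docs.get("b1").get("s1|20160201");
        System.out.println(docs.size() + " docs; b1 on 20160201: "
                + d.payCount + " trades, " + d.paymentCents + " cents");
    }
}
```

Doing this once in the batch layer is what lets the query side avoid joins entirely.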

The wide‑table contains three main parts: a Buyer table, a Trade table, and an aggregated structure where fields such as sellerId_date_buyerId_payment_payCount encode seller ID, date, buyer ID, total payment, and purchase count for each day.
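A minimal codec for such a compound value might look like the following. The field order follows the `sellerId_date_buyerId_payment_payCount` layout described above; the fixed‑width `yyyyMMdd` date, the integer payment in cents, and the rule that IDs never contain an underscore are assumptions, chosen so that plain string ordering groups terms by seller and then by day.

```java
// Hypothetical codec for the sellerId_date_buyerId_payment_payCount term layout.
final class CompoundTerm {
    final String sellerId, date, buyerId;
    final long paymentCents; // assumed unit: integer cents
    final int payCount;

    CompoundTerm(String sellerId, String date, String buyerId, long paymentCents, int payCount) {
        // Fixed-width digits keep lexicographic order equal to chronological order.
        if (date.length() != 8 || !date.chars().allMatch(Character::isDigit)) {
            throw new IllegalArgumentException("date must be yyyyMMdd: " + date);
        }
        // '_' is the field separator, so it must not appear inside an ID.
        if (sellerId.indexOf('_') >= 0 || buyerId.indexOf('_') >= 0) {
            throw new IllegalArgumentException("IDs must not contain '_'");
        }
        this.sellerId = sellerId;
        this.date = date;
        this.buyerId = buyerId;
        this.paymentCents = paymentCents;
        this.payCount = payCount;
    }

    // Serialize to the single indexed term string.
    String encode() {
        return sellerId + "_" + date + "_" + buyerId + "_" + paymentCents + "_" + payCount;
    }

    // Parse a term back into its five fields.
    static CompoundTerm decode(String term) {
        String[] p = term.split("_");
        if (p.length != 5) {
            throw new IllegalArgumentException("bad term: " + term);
        }
        return new CompoundTerm(p[0], p[1], p[2], Long.parseLong(p[3]), Integer.parseInt(p[4]));
    }

    public static void main(String[] args) {
        String t = new CompoundTerm("s1", "20160201", "b1", 2000, 2).encode();
        System.out.println(t);                  // s1_20160201_b1_2000_2
        System.out.println(decode(t).payCount); // 2
    }
}
```

Putting seller and date first is what makes a time window for one seller a contiguous range of terms.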

On the Solr side, a custom QParser plugin is implemented to parse query parameters, iterate over the relevant term ranges, and collect matching document IDs. The core logic extracts terms between prefixStart and prefixEnd, evaluates the filter conditions, and accumulates results in a buyerStatis collector.
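The scan‑and‑aggregate loop can be sketched without Solr by treating a sorted set of strings as the term dictionary. This is a hypothetical stand‑in, not the plugin's source: in the real QParser the terms would come from the index for the compound field, while the class name, the exclusive end bound (the day after `endDay`), and the two threshold parameters here are assumptions. Because terms are sorted, all terms for one seller in the window are contiguous, so the work is proportional to the terms in that range rather than to the whole index.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for the QParser core loop: a sorted string set plays the term dictionary.
final class TimeSegScan {
    private static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // yyyyMMdd

    // Scan terms in [sellerId_startDay, sellerId_(endDay + 1)) and sum payment/count per buyer.
    // The returned map plays the role of the buyerStatis collector: buyerId -> {payment, payCount}.
    static Map<String, long[]> collect(NavigableSet<String> terms, String sellerId,
                                       String startDay, String endDay) {
        Map<String, long[]> buyerStatis = new TreeMap<>();
        if (startDay.compareTo(endDay) > 0) {
            return buyerStatis; // empty window
        }
        String prefixStart = sellerId + "_" + startDay;
        String prefixEnd = sellerId + "_" + LocalDate.parse(endDay, DAY).plusDays(1).format(DAY);
        for (String term : terms.subSet(prefixStart, true, prefixEnd, false)) {
            String[] p = term.split("_"); // sellerId, date, buyerId, payment, payCount
            long[] acc = buyerStatis.computeIfAbsent(p[2], k -> new long[2]);
            acc[0] += Long.parseLong(p[3]);
            acc[1] += Long.parseLong(p[4]);
        }
        return buyerStatis;
    }

    // Keep buyers whose window totals meet both thresholds (pass 0 to disable one).
    static TreeSet<String> filter(Map<String, long[]> buyerStatis, long minPayment, long minPayCount) {
        TreeSet<String> out = new TreeSet<>();
        for (Map.Entry<String, long[]> e : buyerStatis.entrySet()) {
            if (e.getValue()[0] >= minPayment && e.getValue()[1] >= minPayCount) {
                out.add(e.getKey());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
                "s1_20160201_b1_2000_2",
                "s1_20160203_b1_800_1",
                "s1_20160203_b2_999_1",
                "s1_20160210_b2_5000_4",
                "s2_20160202_b1_100_1"));
        Map<String, long[]> stats = collect(terms, "s1", "20160201", "20160207");
        System.out.println(filter(stats, 0, 2)); // [b1]
    }
}
```

In the example, buyer `b2` is excluded because its larger purchase on 20160210 falls outside the window, which is exactly the per‑window behavior a precomputed total could not provide.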

To deploy the plugin, the following configuration is added to solrconfig.xml:

```xml
<queryParser name="timesegstats" class="com.xxx.qp.TimeSegStatsQParserPlugin">
  <str name="buyerField">buyer_id</str>
  <str name="compoundField">dynamic_info</str>
  <str name="countField">emailSendCount</str>
  <str name="statsFields"></str>
</queryParser>
```

With the plugin in place, Solr can execute real‑time queries that filter buyers by purchase count or amount over specified time segments, returning results at interactive latency instead of waiting on joins and aggregations in a relational database.
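For illustration, a client could route a query to the registered parser with Solr's local‑params syntax, `{!timesegstats ...}`, since the `name` attribute in solrconfig.xml is what that syntax refers to. The parameter names inside the braces (`seller`, `start`, `end`, `minPayCount`), the collection name, and the base URL are hypothetical; the article does not show the plugin's actual request parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a Solr /select URL that hands q to the parser registered as "timesegstats".
final class TimeSegQuery {
    // Local-params prefix naming the parser; the key names inside the braces are hypothetical.
    static String localParams(String sellerId, String startDay, String endDay, int minPayCount) {
        return "{!timesegstats seller=" + sellerId + " start=" + startDay
                + " end=" + endDay + " minPayCount=" + minPayCount + "}";
    }

    // Return only buyer_id (the configured buyerField) for the matching documents.
    static String selectUrl(String solrBase, String collection, String q, int rows) {
        return solrBase + "/" + collection + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&fl=buyer_id&rows=" + rows;
    }

    public static void main(String[] args) {
        String q = localParams("s1", "20160201", "20160207", 2);
        System.out.println(selectUrl("http://localhost:8983/solr", "buyers", q, 100));
    }
}
```

Requesting only `buyer_id` keeps responses small when a seller's filter matches many buyers.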

In summary, the case demonstrates how integrating Solr with a pre‑aggregated wide‑table and a custom query parser can overcome relational database bottlenecks, providing scalable, low‑latency, and flexible query capabilities for big‑data applications.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
