
XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine

XSQL is an open‑source, low‑threshold, highly stable distributed query engine that supports federated queries across heterogeneous data sources, offering push‑down optimization, metadata decentralization, multi‑engine integration, and seamless deployment on Spark/YARN for real‑time big‑data analytics.

360 Tech Engineering

Sep 4, 2019


Background – The rapid growth of big‑data storage and compute frameworks has brought steep learning curves and poor maintainability, and distributed task execution is often unstable, especially with heterogeneous data sources and complex deployment environments.

XSQL Overview – Developed by the 360 Unified Computing team, XSQL is a low‑threshold, easy‑to‑deploy, and highly stable multi‑data‑source distributed query engine. It uses SQL as a universal interface, supports push‑down execution, and provides federated queries across nine data sources, including Hive, MySQL, Elasticsearch, MongoDB, Kafka, HBase, Redis, and Druid.

Key Features

Low Threshold: Users interact through standard SQL, which reduces the effort of learning each data source's own query interface.

Stability: Handles instability caused by the environment and by the data itself, with plugins for monitoring, adaptive broadcast, partition merging, and cost‑based optimization (CBO).

Efficiency: Eliminates redundant computation, saves resources, and benefits from Spark SQL‑compatible push‑down optimization.

Data Federation: Enables joins across sources (e.g., Elasticsearch ↔ MongoDB) without first migrating the data.

Delayed YARN Interaction: Registers the application with YARN and requests resources only when a query actually needs them, reducing wasted cluster capacity.

Single‑Pass SQL Parsing: Replaces the Spark SQL engine entirely while remaining fully compatible with it, without extra dependencies.

Push‑Down Optimization: Executes queries close to the data source, which can speed them up by orders of magnitude.

Metadata Decentralization: Fetches metadata from the data sources in real time and caches it, so no central metadata store becomes a single point of failure.

Cache Levels & Whitelists: A two‑tier metadata cache and a user‑defined whitelist minimize the load placed on data sources.

Multi‑Engine Exploration: Early, experimental support for Flink 1.9.0 and Presto 317.

Batch‑and‑Stream Unification: A single SQL statement can drive both batch and streaming jobs; Kafka is currently the only supported streaming source.
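The Data Federation and Push‑Down Optimization items can be illustrated with a small, self-contained Scala sketch. It is not XSQL code: `Row`, `Source`, and `FederatedJoin` are hypothetical names, and each "source" is just an in-memory collection that applies the pushed-down filter itself. Only matching rows leave each source, and the engine then performs a hash join on a shared key.

```scala
// Hypothetical sketch, not XSQL's implementation: two in-memory "sources"
// that accept a pushed-down filter, joined by the engine with a hash join.
final case class Row(values: Map[String, Any])

// A data source that evaluates the predicate itself ("push-down").
// `fetched` counts how many rows actually left the source.
final class Source(val name: String, data: Seq[Row]) {
  var fetched: Int = 0

  def scan(pushedFilter: Row => Boolean): Seq[Row] = {
    val out = data.filter(pushedFilter) // filtering happens "at the source"
    fetched += out.size
    out
  }
}

object FederatedJoin {
  def hashJoin(left: Seq[Row], right: Seq[Row], key: String): Seq[Row] = {
    // Build phase: index the right side by its join key.
    val table: Map[Any, Seq[Row]] = right.groupBy(_.values(key))
    // Probe phase: look up each left row and merge the matching columns.
    for {
      l <- left
      r <- table.getOrElse(l.values(key), Seq.empty)
    } yield Row(l.values ++ r.values)
  }
}
```

In this toy model, push-down pays off because the join only ever sees rows that already passed each source's filter; the `fetched` counters make that reduction visible. For example, joining an "Elasticsearch" source filtered on `city == "Beijing"` with a "MongoDB" source filtered on `score > 50` transfers only the qualifying rows from each side.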
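Delayed YARN interaction is, at its core, lazy initialization of cluster resources. The sketch below assumes a hypothetical `ClusterSession` whose `register` callback stands in for registering an application with YARN; the rule that `SHOW`/`DESC` statements are answered locally is likewise an illustrative assumption, not a description of XSQL's actual routing.

```scala
// Hypothetical sketch of "delayed YARN interaction": the costly cluster
// registration runs only when a query first needs distributed execution.
// ClusterSession and its statement classification are illustrative only.
final class ClusterSession(register: () => String) {
  private var registrations = 0

  // `lazy val` defers registration until first access and runs it at most once.
  private lazy val appId: String = {
    registrations += 1
    register()
  }

  def registrationCount: Int = registrations

  def run(sql: String): String = {
    val s = sql.trim.toLowerCase
    if (s.startsWith("show ") || s.startsWith("desc")) {
      s"local: $s" // metadata-only statement, answered without the cluster
    } else {
      s"$appId: $s" // the first data query triggers registration
    }
  }
}
```

With this design, a user who only browses metadata never consumes cluster resources, and every later data query reuses the single registered application.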

Technical Architecture

Deployment: Follows Hadoop conventions, reuses the Spark‑on‑YARN deployment model, and decentralizes metadata.

Implementation: Extends Spark's SQL framework, replaces its core, and provides APIs in multiple languages.

Future: Plans to integrate additional engines such as Flink and Presto.
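Metadata decentralization with a two‑tier cache and a whitelist can be sketched as follows. The article does not say what each tier holds, so this is an assumption: tier one caches table names per database, tier two caches per-table schemas loaded only on first use, and every lookup is checked against a whitelist of databases. `MetadataSource` and `MetadataCache` are hypothetical names standing in for real catalog calls.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a per-source catalog call
// (e.g. a JDBC metadata query or an Elasticsearch mapping lookup).
trait MetadataSource {
  def listTables(db: String): Seq[String]
  def schema(db: String, table: String): Seq[String]
}

// Assumed layout: tier 1 = tables per database, tier 2 = schema per table.
final class MetadataCache(source: MetadataSource, whitelist: Set[String]) {
  private val tables = mutable.Map.empty[String, Seq[String]]             // tier 1
  private val schemas = mutable.Map.empty[(String, String), Seq[String]]  // tier 2

  // Only whitelisted databases may be queried, which limits source load.
  private def checkAllowed(db: String): Unit =
    require(whitelist.contains(db), s"database '$db' is not whitelisted")

  def tablesOf(db: String): Seq[String] = {
    checkAllowed(db)
    tables.getOrElseUpdate(db, source.listTables(db)) // fetched once, then cached
  }

  def schemaOf(db: String, table: String): Seq[String] = {
    checkAllowed(db)
    schemas.getOrElseUpdate((db, table), source.schema(db, table))
  }

  // Drop cached entries so the next lookup fetches fresh metadata from the source.
  def refresh(db: String): Unit = {
    tables.remove(db)
    schemas.keys.filter(_._1 == db).toList.foreach(k => schemas.remove(k))
  }
}
```

Because each data source is asked directly and the results are cached locally, there is no shared metadata service whose outage would block every query.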

Typical Scenarios – From newcomers needing simple Elasticsearch queries to seasoned engineers seeking to replace repetitive Spark jobs, XSQL simplifies heterogeneous data access, reduces client installations, and improves resource utilization.

Compilation & Deployment Guide

Compilation environment: JDK 1.8+, Hadoop 2.7.2+, Spark 2.4.x.

Steps:
1. Clone the repository: git clone https://github.com/Qihoo360/XSQL
2. Build XSQL as a Spark plugin: ./XSQL/build-plugin.sh
   Alternatively, build a full binary distribution with Spark embedded: ./XSQL/build.sh
3. Extract the resulting tarball and configure the data sources, e.g.:

spark.xsql.datasource.default.type      mysql
spark.xsql.datasource.default.url       jdbc:mysql://127.0.0.1:2336
spark.xsql.datasource.default.user      real_username
spark.xsql.datasource.default.password  real_password
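The configuration keys follow the pattern `spark.xsql.datasource.<name>.<property>`, where `default` is the data source name. A minimal sketch of how such flat entries can be grouped into one settings map per data source is shown below; `DataSourceConfig` is a hypothetical helper written for illustration, not XSQL's own configuration loader.

```scala
// Hypothetical sketch: group "spark.xsql.datasource.<name>.<property>  value"
// lines into Map(name -> Map(property -> value)). Not XSQL's real loader.
object DataSourceConfig {
  private val Prefix = "spark.xsql.datasource."

  def parse(lines: Seq[String]): Map[String, Map[String, String]] = {
    val triples = lines
      .map(_.trim)
      .filter(l => l.nonEmpty && !l.startsWith("#")) // skip blanks and comments
      .flatMap { line =>
        // Split into key and value on the first run of whitespace.
        line.split("\\s+", 2) match {
          case Array(key, value) if key.startsWith(Prefix) =>
            val rest = key.stripPrefix(Prefix) // e.g. "default.url"
            val dot = rest.indexOf('.')
            if (dot > 0) Some((rest.take(dot), rest.drop(dot + 1), value.trim)) else None
          case _ => None // not a per-data-source entry
        }
      }
    // Group by data source name, then build each source's property map.
    triples.groupBy(_._1).map { case (name, props) =>
      name -> props.map(t => t._2 -> t._3).toMap
    }
  }
}
```

Applied to the four lines above, this yields a single `default` entry with the `type`, `url`, `user`, and `password` properties; adding another prefix such as `spark.xsql.datasource.es.` would simply produce a second entry.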

Run examples via the command‑line tool:

$SPARK_HOME/bin/spark-xsql
spark-xsql> show datasources;

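Or via the Scala API:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableXSQLSupport().getOrCreate()
// spark.sql returns a DataFrame; show() prints the result
spark.sql("show datasources").show()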

Conclusion – XSQL aims to provide a robust middleware for the open‑source community, inviting users and developers to contribute.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

