XSQL: A Low‑Barrier, Stable Multi‑Data‑Source Distributed Query Engine
XSQL is an open‑source, low‑barrier, stable distributed query engine that runs federated queries across heterogeneous data sources. It offers push‑down optimization, decentralized metadata, multi‑engine integration, and deployment on Spark on YARN for real‑time big‑data analytics.
Background – The rapid growth of big‑data storage and compute frameworks has brought steep learning curves, poor maintainability, and unstable distributed task execution, especially when teams work with heterogeneous data sources and complex deployment environments.
XSQL Overview – Developed by the 360 Unified Computing team, XSQL is a low‑barrier, easy‑to‑deploy, and more stable multi‑data‑source distributed query engine. It uses SQL as a universal interface, supports push‑down execution, and provides federated queries across nine data sources, including Hive, MySQL, Elasticsearch, MongoDB, Kafka, HBase, Redis, and Druid.
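As a rough illustration of what a federated query means, the Scala sketch below joins records from two unrelated sources behind one common interface. It is a toy model, not XSQL code: `DataSource`, `InMemorySource`, `join`, and the sample tables are all hypothetical names invented for this example, and the in-memory maps stand in for real Elasticsearch and MongoDB connections.

```scala
// Hypothetical model of a cross-source join; none of these names come from XSQL.
object FederationSketch {
  type Record = Map[String, Any]

  // Every source, whatever its native protocol, exposes named tables of records.
  trait DataSource {
    def name: String
    def table(tableName: String): Seq[Record]
  }

  final class InMemorySource(val name: String, tables: Map[String, Seq[Record]])
      extends DataSource {
    def table(tableName: String): Seq[Record] =
      tables.getOrElse(tableName, Seq.empty)
  }

  // Inner equi-join performed in the engine after both sides have been read.
  def join(left: Seq[Record], right: Seq[Record], key: String): Seq[Record] = {
    val rightByKey: Map[Any, Seq[Record]] =
      right.filter(_.contains(key)).groupBy(r => r(key))
    for {
      l <- left
      k <- l.get(key).toList
      r <- rightByKey.getOrElse(k, Seq.empty)
    } yield l ++ r
  }

  // Sample data standing in for an Elasticsearch index and a MongoDB collection.
  val logs: Seq[Record] = Seq(
    Map("user_id" -> 1, "url" -> "/home"),
    Map("user_id" -> 2, "url" -> "/cart"))
  val users: Seq[Record] = Seq(Map("user_id" -> 1, "name" -> "alice"))

  val es: DataSource = new InMemorySource("es", Map("logs" -> logs))
  val mongo: DataSource = new InMemorySource("mongo", Map("users" -> users))

  // Only user 1 exists on both sides, so the join yields a single merged record.
  val joined: Seq[Record] = join(es.table("logs"), mongo.table("users"), "user_id")
}
```

In a real engine the user would express this join in SQL and the engine would plan the reads; the sketch only shows that no data has to be copied from one store into the other beforehand.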
Key Features
Low Barrier: Users interact through standard SQL, which reduces the effort of learning each data source's own query interface.
Stability: Handles environment‑related and data‑related instability with plugins for monitoring, adaptive broadcast, partition merging, and cost‑based optimization (CBO).
Efficiency: Eliminates redundant computation, saves resources, and benefits from Spark‑SQL‑compatible push‑down optimization.
Data Federation: Enables cross‑source joins (e.g., Elasticsearch ↔ MongoDB) without data migration.
Delayed YARN Interaction: Registers the application and requests resources only when necessary, reducing cluster waste.
Single‑Pass SQL Parsing: Replaces the Spark SQL engine entirely, offering full compatibility without extra dependencies.
Push‑Down Optimization: Executes queries close to the data source, which can yield orders‑of‑magnitude speedups.
Metadata Decentralization: Caches and fetches metadata in real time, avoiding a single point of failure.
Cache Levels & Whitelists: A two‑tier metadata cache and a user‑defined whitelist minimize the load on data sources.
Multi‑Engine Exploration: Early, experimental support for Flink 1.9.0 and Presto 317.
Batch‑and‑Stream Unification: A single SQL statement can drive both batch and streaming jobs; Kafka is currently the supported streaming source.
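To make the push‑down idea from the list above concrete, here is a minimal Scala sketch. It is not XSQL's implementation: `ToySource`, `scanAll`, `scanWhere`, and the sample rows are hypothetical, and the in‑memory sequence stands in for a remote data source. It only shows why evaluating a filter at the source ships fewer rows to the engine.

```scala
// Hypothetical model of predicate push-down; names are illustrative only.
object PushDownSketch {
  final case class Row(id: Int, city: String)

  // A toy source that counts how many rows it ships to the engine.
  final class ToySource(rows: Seq[Row]) {
    var rowsShipped: Int = 0

    // Full scan: every row crosses the wire and is filtered later.
    def scanAll(): Seq[Row] = {
      rowsShipped += rows.size
      rows
    }

    // Push-down scan: the source applies the predicate before shipping.
    def scanWhere(pred: Row => Boolean): Seq[Row] = {
      val matching = rows.filter(pred)
      rowsShipped += matching.size
      matching
    }
  }

  val data: Seq[Row] = Seq(
    Row(1, "Beijing"), Row(2, "Shanghai"), Row(3, "Beijing"), Row(4, "Shenzhen"))
  val inBeijing: Row => Boolean = _.city == "Beijing"

  // Returns the query result and the number of rows transferred.
  def withoutPushDown(): (Seq[Row], Int) = {
    val src = new ToySource(data)
    val result = src.scanAll().filter(inBeijing)
    (result, src.rowsShipped)
  }

  def withPushDown(): (Seq[Row], Int) = {
    val src = new ToySource(data)
    val result = src.scanWhere(inBeijing)
    (result, src.rowsShipped)
  }
}
```

Both paths return the same rows; the only difference is how many rows leave the source. On real data sources the gap depends on how selective the filter is and on what the source can evaluate natively.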
Technical Architecture
Deployment: Follows Hadoop standards, reuses Spark‑on‑YARN deployment, and adopts metadata decentralization.
Implementation: Extends Spark’s SQL framework, replaces its core, and provides multi‑language APIs.
Future: Plans to integrate additional engines such as Flink and Presto.
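The decentralized metadata handling mentioned under Deployment can be pictured with a small Scala sketch: metadata is fetched from the data source on demand, cached locally, and limited to a whitelist. This is an assumption‑laden illustration, not XSQL's design: `MetadataCache`, `schemaOf`, `invalidate`, and the `fetchSchema` callback are hypothetical, and the sketch does not attempt to model the specific cache tiers XSQL uses.

```scala
import scala.collection.mutable

// Hypothetical on-demand metadata cache with a whitelist; not XSQL's classes.
final class MetadataCache(
    whitelist: Set[String],                 // tables the user allows to be loaded
    fetchSchema: String => Seq[String]) {   // stands in for a remote metadata call

  private val schemas = mutable.Map.empty[String, Seq[String]]

  // Counts how often the data source was actually contacted.
  var remoteCalls: Int = 0

  // Returns the column names, or None when the table is not whitelisted.
  def schemaOf(table: String): Option[Seq[String]] =
    if (!whitelist.contains(table)) None
    else Some(schemas.getOrElseUpdate(table, {
      remoteCalls += 1
      fetchSchema(table)
    }))

  // Drops a cached entry so the next lookup re-reads the source.
  def invalidate(table: String): Unit = {
    schemas.remove(table)
    ()
  }
}

object MetadataCacheDemo {
  // The fetch function fabricates a schema; a real one would query the source.
  def newCache(): MetadataCache =
    new MetadataCache(
      whitelist = Set("orders"),
      fetchSchema = t => Seq("id", s"${t}_payload"))
}
```

Repeated lookups of a whitelisted table hit the source once, and tables outside the whitelist never reach the source at all, which is the load‑reduction effect the feature list describes.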
Typical Scenarios – From newcomers needing simple Elasticsearch queries to seasoned engineers seeking to replace repetitive Spark jobs, XSQL simplifies heterogeneous data access, reduces client installations, and improves resource utilization.
Compilation & Deployment Guide
Compilation environment: JDK 1.8+, Hadoop 2.7.2+, Spark 2.4.x.
Steps:
git clone https://github.com/Qihoo360/XSQL
Build as a Spark plugin:
./XSQL/build-plugin.sh
Or build a full binary (with embedded Spark):
./XSQL/build.sh
Deploy by extracting the tarball and configuring data sources, e.g.:
spark.xsql.datasource.default.type mysql
spark.xsql.datasource.default.url jdbc:mysql://127.0.0.1:2336
spark.xsql.datasource.default.user real_username
spark.xsql.datasource.default.password real_password
Run examples via the command‑line tool:
$SPARK_HOME/bin/spark-xsql
spark-xsql> show datasources;
Or via the Scala API:
import org.apache.spark.sql.SparkSession
// enableXSQLSupport() is provided by the XSQL build of Spark
val spark = SparkSession.builder().enableXSQLSupport().getOrCreate()
spark.sql("show datasources")
Conclusion – XSQL aims to be robust middleware for the open‑source community, and the team invites users and developers to contribute.
360 Tech Engineering
Official tech channel of 360, building the most professional technology aggregation platform for the brand.