Sharding-JDBC Introduction and Practical Guide to Database Sharding with ShardingSphere
This article introduces Sharding-JDBC (now ShardingSphere), explains core sharding concepts such as shards, data nodes, logical and physical tables, sharding keys, algorithms and strategies, shows how it extends JDBC, and provides a step‑by‑step Spring Boot + MyBatis‑Plus example for building a sharded database application.
As the opening article of the Sharding-JDBC sharding practice series, this piece first reviews basic sharding knowledge, then introduces the Sharding-JDBC framework, and finally sets up a quick sharding example to prepare the environment for later feature demonstrations.
1. Sharding-JDBC Overview
Sharding-JDBC originated as an internal sharding framework at Dangdang and was open‑sourced in 2017. It later evolved into ShardingSphere, which became a top‑level Apache project in April 2020. Over successive versions, ShardingSphere added database governance and distributed transactions (Atomikos, Narayana, Bitronix, Seata), and it has now reached version 4.0.
ShardingSphere is an ecosystem consisting of three open‑source distributed database middleware components: Sharding-JDBC, Sharding-Proxy, and Sharding‑Sidecar. Sharding-JDBC is the classic, mature component and serves as the entry point for learning sharding.
2. Core Concepts
Before implementing sharding with Sharding-JDBC, it is essential to understand several core concepts.
Shard
Horizontal sharding splits a large table (e.g., t_order) into multiple smaller tables with identical structure (t_order_0, t_order_1, …, t_order_n). When a SQL statement is executed, a sharding strategy determines which database and table the data is routed to.
Data Node
A data node is the smallest indivisible unit in sharding, consisting of a data source name and a physical table (e.g., order_db_1.t_order_0).
Logical Table
A logical table represents a group of tables with the same structure. The logical name t_order maps to many physical tables (t_order_0 … t_order_n), while application code still uses t_order in SQL.
Physical Table
A physical table is a table that actually exists in a database, such as t_order_n.
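To make the relationship between a logical table, its physical tables, and data nodes concrete, the sketch below enumerates every data node behind one logical table. The class name DataNodes, the method expand, and the naming scheme (a data source prefix plus an index, a table name plus an underscore and an index) are assumptions for illustration only, not part of Sharding-JDBC.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Sharding-JDBC class.
final class DataNodes {

    /** Lists every "dataSource.physicalTable" pair behind one logical table. */
    static List<String> expand(String dataSourcePrefix, int dataSourceCount,
                               String logicalTable, int tablesPerDataSource) {
        List<String> nodes = new ArrayList<>();
        for (int d = 0; d < dataSourceCount; d++) {
            for (int t = 0; t < tablesPerDataSource; t++) {
                // e.g. order_db_1.t_order_0
                nodes.add(dataSourcePrefix + d + "." + logicalTable + "_" + t);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Two data sources with three physical tables each give six data nodes.
        expand("order_db_", 2, "t_order", 3).forEach(System.out::println);
    }
}
```

Application SQL keeps referring to t_order; only the routing layer needs to know about the expanded list.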
Sharding Key
The sharding key is the column used for sharding, e.g., order_id in t_order. The value of the sharding key determines the target database and table.
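As a minimal sketch of how a sharding key value can select a data node, the example below applies simple modulo routing to order_id. The class ModuloRouter, the ds- prefix, and the choice of one modulo for the database and another for the table are assumptions made for illustration; real routing is driven by the configured sharding strategy.

```java
// Hypothetical router, not a Sharding-JDBC class.
final class ModuloRouter {

    /** Maps an order_id to "ds-<db>.t_order_<table>" using modulo on the sharding key. */
    static String route(long orderId, int databaseCount, int tablesPerDatabase) {
        // floorMod keeps the index non-negative even for negative keys.
        int db = (int) Math.floorMod(orderId, (long) databaseCount);
        int table = (int) Math.floorMod(orderId, (long) tablesPerDatabase);
        return "ds-" + db + ".t_order_" + table;
    }

    public static void main(String[] args) {
        for (long id = 0; id < 6; id++) {
            System.out.println(id + " -> " + route(id, 2, 3));
        }
    }
}
```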
Sharding Algorithm
Simple modulo sharding is just one option. In practice, conditions such as >=, <=, BETWEEN, and IN may also need to drive routing, and Sharding‑JDBC handles these through a sharding strategy combined with a sharding algorithm.
A sharding strategy is the abstraction: it combines a sharding key with a sharding algorithm, and the algorithm implements the concrete routing logic.
The database-level and table-level sharding strategies are configured independently, so each can use a different strategy and algorithm, and a single strategy may combine multiple algorithms.
Sharding‑JDBC provides four built‑in sharding algorithms:
1. Precise Sharding Algorithm
Used for equality (=) and IN conditions with a single sharding key, typically under a StandardShardingStrategy.
2. Range Sharding Algorithm
Handles range conditions such as BETWEEN, >, <, etc., also under StandardShardingStrategy.
3. Complex Keys Sharding Algorithm
Supports multiple sharding keys and is used with ComplexShardingStrategy .
4. Hint Sharding Algorithm
Allows manual routing without extracting a sharding key from the SQL, useful for forced routing scenarios.
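The difference between precise and range sharding is easiest to see side by side. The sketch below is a plain-Java illustration that uses only the JDK and does not implement the framework's own algorithm interfaces. The class ModuloAlgorithmsSketch and its method names are hypothetical, and it assumes the target list is ordered so that index i is shard i.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of precise vs. range routing with a modulo rule.
final class ModuloAlgorithmsSketch {

    /** Precise: one sharding value (from "=") maps to exactly one target. */
    static String precise(List<String> targets, long value) {
        if (targets.isEmpty()) {
            throw new IllegalArgumentException("no targets configured");
        }
        return targets.get((int) Math.floorMod(value, (long) targets.size()));
    }

    /** IN: route each listed value separately and keep the distinct targets. */
    static Set<String> in(List<String> targets, Collection<Long> values) {
        Set<String> result = new LinkedHashSet<>();
        for (long v : values) {
            result.add(precise(targets, v));
        }
        return result;
    }

    /** Range: every value in [lower, upper] may hit a different target. */
    static Set<String> range(List<String> targets, long lower, long upper) {
        Set<String> result = new LinkedHashSet<>();
        if (lower > upper) {
            return result; // empty range, nothing to route
        }
        long width = upper - lower; // a negative result means overflow, i.e. a huge range
        if (width < 0 || width >= targets.size() - 1) {
            result.addAll(targets); // wide enough to touch every shard
            return result;
        }
        for (long i = 0; i <= width; i++) {
            result.add(precise(targets, lower + i));
        }
        return result;
    }
}
```

With modulo routing a range of any meaningful width fans out to all shards, which is one reason range queries on a hashed key tend to be expensive.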
Sharding Strategy
Four strategies are available:
Standard Sharding Strategy
Recommended for most cases; works when the SQL contains the sharding key (equality or range).
Complex Sharding Strategy
Supports multiple sharding keys and custom logic.
Inline (Row‑Expression) Sharding Strategy
Defines sharding directly with an expression, e.g., ds-${order_id % 2}.
Hint Sharding Strategy
Routes based on hints supplied by the application rather than parsing the SQL.
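Hint-based routing can be pictured as a value the application places in thread-local context, which the router consults before looking at the SQL. The sketch below is a hypothetical stand-in written only against the JDK; HintContext is not ShardingSphere's API (the framework exposes its own HintManager for this purpose), and the fallback to modulo routing is an assumption for illustration.

```java
// Hypothetical thread-local hint holder, not ShardingSphere's HintManager.
final class HintContext implements AutoCloseable {

    private static final ThreadLocal<Integer> FORCED_SHARD = new ThreadLocal<>();

    private HintContext() {
    }

    /** Forces all routing on the current thread to the given shard index. */
    static HintContext force(int shardIndex) {
        FORCED_SHARD.set(shardIndex);
        return new HintContext();
    }

    /** Uses the hint when present, otherwise falls back to the key found in the SQL. */
    static String route(String logicalTable, long shardingKeyFromSql, int shardCount) {
        Integer forced = FORCED_SHARD.get();
        int index = forced != null
                ? forced
                : (int) Math.floorMod(shardingKeyFromSql, (long) shardCount);
        return logicalTable + "_" + index;
    }

    /** Clears the hint so later statements on this thread route normally. */
    @Override
    public void close() {
        FORCED_SHARD.remove();
    }
}
```

Wrapping the hint in try-with-resources, for example try (HintContext hint = HintContext.force(2)) { ... }, keeps it from leaking into unrelated statements on a pooled thread.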
Distributed Primary Key
After sharding, auto‑increment keys can collide across physical tables. ShardingSphere provides built‑in UUID and Snowflake generators (default is Snowflake) and allows custom key generators.
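For readers unfamiliar with Snowflake-style IDs, here is a simplified generator using the commonly described layout of a 41-bit timestamp, a 10-bit worker id, and a 12-bit per-millisecond sequence. It is a sketch under stated assumptions, not ShardingSphere's generator: the class name, the epoch constant, and the handling of clock rollback are all choices made for this example.

```java
// Simplified Snowflake-style generator; not ShardingSphere's implementation.
final class SnowflakeSketch {

    // 2016-11-01T00:00:00Z, an arbitrary epoch chosen for this sketch.
    private static final long EPOCH_MILLIS = 1477958400000L;
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long SEQUENCE_MASK = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMillis = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId out of range: " + workerId);
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMillis) {
            // Clock moved backwards: this sketch keeps using the last timestamp.
            now = lastMillis;
        }
        if (now == lastMillis) {
            sequence = (sequence + 1) & SEQUENCE_MASK;
            if (sequence == 0) {
                // Sequence exhausted for this millisecond: wait for the next one.
                while ((now = System.currentTimeMillis()) <= lastMillis) {
                    // busy-wait
                }
            }
        } else {
            sequence = 0L;
        }
        lastMillis = now;
        return ((now - EPOCH_MILLIS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Because the timestamp occupies the high bits, IDs from one generator increase over time, and distinct worker ids keep generators on different nodes from colliding.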
Broadcast Table
A broadcast table exists in every shard with the same structure and data (e.g., dictionary tables). A write to it is applied to every copy, so the change reaches all shards.
Binding Table
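Binding tables are tables that share the same sharding rule (e.g., t_order and t_order_item). Because rows with the same sharding key land in matching shards, a join between bound tables is routed only to matching pairs of physical tables, avoiding Cartesian‑product queries: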
SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id;
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id;
3. The JDBC Connection "Trick"
Sharding‑JDBC extends the standard JDBC API while remaining fully compatible with it. It re-implements DataSource, Connection, Statement, and ResultSet to add sharding capabilities.
The extension follows the Adapter pattern: Sharding‑JDBC's adapter classes implement the standard JDBC interfaces, including java.sql.Wrapper, around the vendor‑specific driver objects, and use reflection to invoke methods that fall outside the standard interfaces.
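The role of java.sql.Wrapper in this arrangement can be shown with a small adapter that hides a vendor object behind a standard interface yet still lets callers reach it. VendorConnection and AdaptedConnection are hypothetical classes invented for this sketch; only java.sql.Wrapper and its two methods come from the JDK, and the code does not reproduce Sharding-JDBC's own adapter.

```java
import java.sql.SQLException;
import java.sql.Wrapper;

/** Stand-in for a vendor driver object with a non-standard method. */
final class VendorConnection {
    String vendorOnlyFeature() {
        return "vendor-specific";
    }
}

/** Hypothetical adapter that wraps the vendor object; not Sharding-JDBC's class. */
final class AdaptedConnection implements Wrapper {

    private final VendorConnection delegate;

    AdaptedConnection(VendorConnection delegate) {
        this.delegate = delegate;
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this) || iface.isInstance(delegate);
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        if (iface.isInstance(delegate)) {
            return iface.cast(delegate);
        }
        throw new SQLException("Not a wrapper for " + iface.getName());
    }

    /** Reaches the vendor-only method through the standard unwrap call. */
    static String demo() {
        AdaptedConnection connection = new AdaptedConnection(new VendorConnection());
        try {
            return connection.unwrap(VendorConnection.class).vendorOnlyFeature();
        } catch (SQLException e) {
            throw new IllegalStateException(e);
        }
    }
}
```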
Execution Flow
The six steps of a sharded SQL execution are:
1. SQL parsing (lexical + syntactic)
2. Executor optimization (e.g., handling OR)
3. SQL routing (determine target data nodes)
4. SQL rewriting (replace logical table names with physical ones)
5. SQL execution on the routed data sources
6. Result merging (order, group, pagination, aggregation)
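Two of these steps, rewriting and merging, lend themselves to a compact illustration. The sketch below rewrites a logical table name with a word-boundary regex and merges already-sorted per-shard results with a priority queue. It is a deliberately naive approximation: the real engine works on a parsed syntax tree rather than regexes, and RewriteAndMerge is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive illustration of SQL rewriting and ORDER BY merging; not the real engine.
final class RewriteAndMerge {

    /** Replaces whole-word occurrences of the logical table with a physical one. */
    static String rewrite(String logicalSql, String logicalTable, String physicalTable) {
        // \b keeps "t_order" from matching inside "t_order_item".
        return logicalSql.replaceAll(
                "\\b" + Pattern.quote(logicalTable) + "\\b",
                Matcher.quoteReplacement(physicalTable));
    }

    /** K-way merge of per-shard results that are each sorted ascending. */
    static List<Long> mergeSorted(List<List<Long>> shardResults) {
        // Each queue entry is {value, shardIndex, positionInShard}.
        PriorityQueue<long[]> heads =
                new PriorityQueue<>((a, b) -> Long.compare(a[0], b[0]));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heads.add(new long[] {shardResults.get(s).get(0), s, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            long[] head = heads.poll();
            merged.add(head[0]);
            List<Long> shard = shardResults.get((int) head[1]);
            int next = (int) head[2] + 1;
            if (next < shard.size()) {
                heads.add(new long[] {shard.get(next), head[1], next});
            }
        }
        return merged;
    }
}
```

Merging sorted streams this way avoids loading every shard's full result into memory before ordering it.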
Each step is illustrated with diagrams and examples in the original article.
4. Quick Practice
A Spring Boot + MyBatis‑Plus project is set up to demonstrate sharding:
Two physical databases, ds-0 and ds-1, each contain tables t_order_0 through t_order_2 and t_order_item_0 through t_order_item_2.
Configuration is done via application.properties using ShardingSphere properties (data source definitions, actual data nodes, inline sharding strategies, key generators, binding tables, broadcast tables, and SQL logging).
Insert operations automatically generate Snowflake IDs and distribute rows across the six physical tables.
Broadcast table t_config is replicated to both databases.
Binding tables enable efficient join queries without Cartesian products.
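The configuration described above might look roughly like the properties assembled below. To keep all examples in one language, the keys are built as a Java map and printed in application.properties form. The key names follow the 4.x Spring Boot starter conventions as best understood here and should be checked against the version in use; the modulo expressions, the omission of JDBC URLs and credentials, and the omission of the t_order_item rules are assumptions and simplifications, and ShardingPropsSketch is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that prints a partial sharding configuration.
final class ShardingPropsSketch {

    static Map<String, String> build() {
        String p = "spring.shardingsphere.";
        String order = p + "sharding.tables.t_order.";
        Map<String, String> props = new LinkedHashMap<>();
        // Data source names only; per-source url/username/password are omitted here.
        props.put(p + "datasource.names", "ds-0,ds-1");
        // Data nodes: 2 databases x 3 tables.
        props.put(order + "actual-data-nodes", "ds-$->{0..1}.t_order_$->{0..2}");
        // Inline strategies (assumed modulo rules on order_id).
        props.put(order + "database-strategy.inline.sharding-column", "order_id");
        props.put(order + "database-strategy.inline.algorithm-expression", "ds-$->{order_id % 2}");
        props.put(order + "table-strategy.inline.sharding-column", "order_id");
        props.put(order + "table-strategy.inline.algorithm-expression", "t_order_$->{order_id % 3}");
        // Distributed primary key.
        props.put(order + "key-generator.column", "order_id");
        props.put(order + "key-generator.type", "SNOWFLAKE");
        // Binding and broadcast tables, plus SQL logging.
        props.put(p + "sharding.binding-tables[0]", "t_order,t_order_item");
        props.put(p + "sharding.broadcast-tables", "t_config");
        props.put(p + "props.sql.show", "true");
        return props;
    }

    public static void main(String[] args) {
        build().forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

The $->{...} form is used instead of ${...} because Spring would otherwise try to resolve the expression as its own property placeholder.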
Running the example shows data correctly sharded, broadcast updates applied to all shards, and join queries routed to the appropriate physical tables.
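Why binding tables matter for joins can be shown by counting routes. Without a binding relationship, a join of two sharded tables has to consider every combination of physical tables; with one, only matching indexes are paired. The sketch below is a hypothetical illustration of that counting argument, not the framework's routing code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of join routing with and without binding tables.
final class BindingRouteSketch {

    /** Lists the physical joins produced for t_order JOIN t_order_item. */
    static List<String> joinRoutes(int shardCount, boolean bound) {
        List<String> routes = new ArrayList<>();
        for (int order = 0; order < shardCount; order++) {
            for (int item = 0; item < shardCount; item++) {
                if (bound && order != item) {
                    continue; // bound tables only pair identical shard indexes
                }
                routes.add("t_order_" + order + " JOIN t_order_item_" + item);
            }
        }
        return routes;
    }

    public static void main(String[] args) {
        System.out.println("bound:   " + joinRoutes(2, true));
        System.out.println("unbound: " + joinRoutes(2, false));
    }
}
```

With two shards the bound case yields two joins and the unbound case four, and the gap grows quadratically with the number of shards.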
5. Summary
The article provides a concise overview of Sharding‑JDBC (ShardingSphere) fundamentals, demonstrates how to configure sharding rules, key generation, broadcast and binding tables, and validates the setup with a Spring Boot demo. Future articles will dive deeper into the four sharding strategies, custom distributed IDs, distributed transactions, service governance, and data masking.
Project source: GitHub
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn