Sharding-JDBC Introduction and Practical Guide to Database Sharding with ShardingSphere

This article introduces Sharding-JDBC (now ShardingSphere), explains core sharding concepts such as shards, data nodes, logical and physical tables, sharding keys, algorithms and strategies, shows how it extends JDBC, and provides a step‑by‑step Spring Boot + MyBatis‑Plus example for building a sharded database application.

Code Ape Tech Column

Sep 9, 2021

This is the opening article of the Sharding-JDBC sharding practice series. It first reviews basic sharding concepts, then introduces the Sharding-JDBC framework, and finally builds a quick sharding example that prepares the environment for later feature demonstrations.

1. Sharding-JDBC Overview

Sharding-JDBC originated as an internal sharding framework at Dangdang and was open‑sourced in 2017. It later evolved into ShardingSphere, which became a top‑level Apache project in April 2020. Over successive versions, ShardingSphere added database governance and distributed transactions (Atomikos, Narayana, Bitronix, Seata); at the time of writing it had reached version 4.0.

ShardingSphere is an ecosystem consisting of three open‑source distributed database middleware components: Sharding-JDBC, Sharding-Proxy, and Sharding‑Sidecar. Sharding-JDBC is the classic, mature component and serves as the entry point for learning sharding.

2. Core Concepts

Before implementing sharding with Sharding-JDBC, it is essential to understand several core concepts.

Shard

Horizontal sharding splits a large table (e.g., t_order) into multiple identical smaller tables (t_order_0, t_order_1, …, t_order_n). When a SQL statement is executed, a sharding strategy determines which database and table the data should be routed to.

Data Node

A data node is the smallest indivisible unit in sharding, consisting of a data source name and a physical table (e.g., order_db_1.t_order_0).
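
As a rough illustration, the sketch below enumerates every data node for a layout of two data sources with three t_order tables each (the layout used in the quick practice later in this article). The class and method names are hypothetical and are not part of Sharding-JDBC.

```java
import java.util.ArrayList;
import java.util.List;

class DataNodeLister {
    /** Builds "dataSource.table" names for every data source / table index combination. */
    static List<String> dataNodes(String dsPrefix, int dsCount, String tablePrefix, int tableCount) {
        List<String> nodes = new ArrayList<>();
        for (int d = 0; d < dsCount; d++) {
            for (int t = 0; t < tableCount; t++) {
                nodes.add(dsPrefix + d + "." + tablePrefix + t);
            }
        }
        return nodes;
    }

    public static void main(String[] args) {
        // Two data sources with three physical tables each give six data nodes,
        // from ds-0.t_order_0 up to ds-1.t_order_2.
        dataNodes("ds-", 2, "t_order_", 3).forEach(System.out::println);
    }
}
```

Each printed name is one data node: the smallest unit a SQL statement can finally be routed to.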

Logical Table

A logical table represents a group of tables with the same structure. The logical name t_order maps to many physical tables (t_order_0 … t_order_n), while application code still uses t_order in SQL.

Physical Table

The actual tables that exist in the database, such as t_order_n.

Sharding Key

The column used for sharding, e.g., order_id in t_order. The value of the sharding key determines the target database and table.
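
A minimal sketch of how a sharding key value can pick a data node, assuming plain modulo routing with order_id % 2 for the database and order_id % 3 for the table. ModuloRouter and the ds-/t_order_ naming are illustrative assumptions, not the framework's API.

```java
class ModuloRouter {
    /** Maps a sharding key value to a "dataSource.table" data node by modulo. */
    static String route(long orderId, int dbCount, int tableCount) {
        // floorMod keeps the result non-negative even if the key is negative.
        long db = Math.floorMod(orderId, (long) dbCount);
        long table = Math.floorMod(orderId, (long) tableCount);
        return "ds-" + db + ".t_order_" + table;
    }

    public static void main(String[] args) {
        System.out.println(route(7L, 2, 3));  // ds-1.t_order_1
        System.out.println(route(10L, 2, 3)); // ds-0.t_order_1
    }
}
```

Because 2 and 3 share no common factor, consecutive order_id values eventually reach all six database/table combinations in this layout.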

Sharding Algorithm

Modulo on the sharding key is only the simplest way to split data. Sharding‑JDBC can also route on conditions that use =, >=, <=, BETWEEN, and IN. The routing logic for these operators lives in a sharding algorithm, which is always used through a sharding strategy.

A sharding strategy is the higher‑level abstraction: it pairs one or more sharding keys with a sharding algorithm. The algorithm implements the concrete routing logic.

Strategies are configured independently of one another, so different strategies and algorithms can be mixed, and a single strategy may contain more than one algorithm.

Sharding‑JDBC provides four types of sharding algorithm:

1. Precise Sharding Algorithm

Used for equality (=) and IN conditions with a single sharding key, typically under a StandardShardingStrategy.

2. Range Sharding Algorithm

Handles range conditions such as BETWEEN, >, <, etc., also under StandardShardingStrategy.

3. Complex Keys Sharding Algorithm

Supports multiple sharding keys and is used with ComplexShardingStrategy.

4. Hint Sharding Algorithm

Allows manual routing without extracting a sharding key from the SQL, useful for forced routing scenarios.
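
The first two algorithm types can be sketched in plain Java. The class below is a hypothetical stand‑in that mirrors the idea only: a precise method maps one value (or each value of an IN list) to a single target, and a range method maps a closed interval to every target it may touch. It assumes modulo routing and a target list ordered by numeric suffix; it does not implement Sharding‑JDBC's own interfaces.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class StandardShardingSketch {
    /** Precise routing: one sharding value maps to exactly one target (used for =). */
    static String precise(List<String> targets, long value) {
        // Assumption: index i of the list holds the table with suffix _i.
        return targets.get((int) Math.floorMod(value, (long) targets.size()));
    }

    /** IN routing: route each listed value and keep the distinct targets. */
    static Set<String> inList(List<String> targets, List<Long> values) {
        Set<String> result = new LinkedHashSet<>();
        for (long value : values) {
            result.add(precise(targets, value));
        }
        return result;
    }

    /** Range routing for BETWEEN lower AND upper (inclusive; overflow for extreme bounds is not handled). */
    static Set<String> range(List<String> targets, long lower, long upper) {
        Set<String> result = new LinkedHashSet<>();
        if (upper - lower >= targets.size() - 1) {
            // The interval covers every remainder, so all targets must be queried.
            result.addAll(targets);
            return result;
        }
        for (long value = lower; value <= upper; value++) {
            result.add(precise(targets, value));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tables = Arrays.asList("t_order_0", "t_order_1", "t_order_2");
        System.out.println(precise(tables, 5L));                   // t_order_2
        System.out.println(inList(tables, Arrays.asList(3L, 6L, 7L))); // [t_order_0, t_order_1]
        System.out.println(range(tables, 4L, 5L));                 // [t_order_1, t_order_2]
    }
}
```

The range case shows why range conditions on a modulo‑sharded key are expensive: any interval at least as wide as the number of tables has to hit all of them.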

Sharding Strategy

There are several strategies:

Standard Sharding Strategy

Recommended for most cases; works when the SQL contains the sharding key (equality or range).

Complex Sharding Strategy

Supports multiple sharding keys and custom logic.

Inline (Row‑Expression) Sharding Strategy

Defines sharding directly with an expression, e.g., ds-${order_id % 2}.

Hint Sharding Strategy

Routes based on hints supplied by the application rather than parsing the SQL.
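
To make the idea of hint routing concrete, here is a small stand‑alone sketch in which the application sets the target shard in a thread‑local value and the router reads it instead of inspecting the SQL. HintRoutingSketch and its methods are hypothetical names chosen for this example; the real framework has its own hint API, which is not shown here.

```java
class HintRoutingSketch {
    // The forced shard is kept per thread so concurrent requests do not interfere.
    private static final ThreadLocal<Integer> FORCED_SHARD = new ThreadLocal<>();

    static void setShard(int shard) {
        FORCED_SHARD.set(shard);
    }

    static void clear() {
        FORCED_SHARD.remove();
    }

    /** Resolves the physical table purely from the hint; the SQL text is never parsed for a key. */
    static String route(String logicTable) {
        Integer shard = FORCED_SHARD.get();
        if (shard == null) {
            throw new IllegalStateException("no routing hint set for this thread");
        }
        return logicTable + "_" + shard;
    }

    public static void main(String[] args) {
        setShard(2);
        try {
            System.out.println(route("t_order")); // t_order_2
        } finally {
            clear(); // always clear the hint so it does not leak to the next request on this thread
        }
    }
}
```

Clearing the value in a finally block matters in pooled‑thread environments, where a stale hint would silently misroute a later request.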

Distributed Primary Key

After sharding, auto‑increment keys can collide across physical tables. ShardingSphere provides built‑in UUID and Snowflake generators (default is Snowflake) and allows custom key generators.
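
For readers unfamiliar with Snowflake IDs, the following is a simplified sketch of the general scheme: a 64‑bit value built from a millisecond timestamp, a 10‑bit worker ID, and a 12‑bit per‑millisecond sequence. The epoch constant and the class name are assumptions made for this example, and the code is not ShardingSphere's generator.

```java
class SnowflakeSketch {
    private static final long EPOCH_MS = 1_600_000_000_000L; // assumed custom epoch (September 2020)
    private static final int WORKER_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;

    private final long workerId;
    private long lastMs = -1L;
    private long sequence = 0L;

    SnowflakeSketch(long workerId) {
        if (workerId < 0 || workerId >= (1L << WORKER_BITS)) {
            throw new IllegalArgumentException("workerId must be in [0, 1023]");
        }
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastMs) {
            throw new IllegalStateException("clock moved backwards");
        }
        if (now == lastMs) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // 4096 IDs were already issued in this millisecond: wait for the next one.
                while (now <= lastMs) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0;
        }
        lastMs = now;
        // Layout: [timestamp since epoch][10-bit worker id][12-bit sequence]
        return ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (workerId << SEQUENCE_BITS)
                | sequence;
    }
}
```

Giving each application instance a distinct worker ID is what keeps IDs unique across shards; IDs from one generator also increase over time, which suits primary‑key indexes.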

Broadcast Table

A table that exists with the same structure and data in every data source (e.g., dictionary tables). A write to a broadcast table is applied to every data source, so all copies stay identical.

Binding Table

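Tables that share the same sharding rule (e.g., t_order and t_order_item) can be declared as binding tables. A join between them is then routed only to the matching shard pairs instead of every combination, which avoids Cartesian‑product queries. With two shards, for example, the join is rewritten into: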

SELECT * FROM t_order_0 o JOIN t_order_item_0 i ON o.order_id=i.order_id;
SELECT * FROM t_order_1 o JOIN t_order_item_1 i ON o.order_id=i.order_id;
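
The difference can be made visible with a short sketch that generates the physical join statements with and without a binding relationship. BindingRouteSketch is a hypothetical helper written for this article, not part of the framework.

```java
import java.util.ArrayList;
import java.util.List;

class BindingRouteSketch {
    /** Lists the physical joins for t_order JOIN t_order_item over shardCount shards. */
    static List<String> joinRoutes(int shardCount, boolean bound) {
        List<String> sqls = new ArrayList<>();
        for (int o = 0; o < shardCount; o++) {
            for (int i = 0; i < shardCount; i++) {
                if (bound && o != i) {
                    continue; // bound tables only ever join shards with the same suffix
                }
                sqls.add("SELECT * FROM t_order_" + o + " o JOIN t_order_item_" + i
                        + " i ON o.order_id=i.order_id");
            }
        }
        return sqls;
    }

    public static void main(String[] args) {
        System.out.println(joinRoutes(2, true).size());  // 2
        System.out.println(joinRoutes(2, false).size()); // 4
    }
}
```

Without binding, the number of statements grows with the square of the shard count; with binding it grows linearly.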

3. The JDBC Connection "Trick"

Sharding‑JDBC extends the standard JDBC API while remaining fully compatible with it. It supplies its own implementations of DataSource, Connection, Statement, and ResultSet that add sharding capabilities.

The extension follows the Adapter pattern: the original vendor‑specific interfaces are wrapped by Sharding‑JDBC's Wrapper implementation, allowing non‑standard methods to be invoked via reflection.
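
The standard java.sql.Wrapper interface is the JDBC hook for reaching a vendor object behind an adapter. The sketch below shows a generic implementation of its two methods around a made‑up VendorConnection class; both class names are hypothetical, and this is not Sharding‑JDBC's own adapter code.

```java
import java.sql.SQLException;
import java.sql.Wrapper;

/** Stand-in for a driver class that offers a method outside the JDBC standard. */
class VendorConnection {
    String vendorOnlyFeature() {
        return "vendor feature";
    }
}

class UnwrapSketch implements Wrapper {
    private final Object delegate;

    UnwrapSketch(Object delegate) {
        this.delegate = delegate;
    }

    @Override
    public <T> T unwrap(Class<T> iface) throws SQLException {
        if (iface.isInstance(this)) {
            return iface.cast(this);
        }
        if (iface.isInstance(delegate)) {
            return iface.cast(delegate); // hand back the wrapped vendor object
        }
        throw new SQLException("not a wrapper for " + iface.getName());
    }

    @Override
    public boolean isWrapperFor(Class<?> iface) {
        return iface.isInstance(this) || iface.isInstance(delegate);
    }

    public static void main(String[] args) throws SQLException {
        UnwrapSketch wrapper = new UnwrapSketch(new VendorConnection());
        if (wrapper.isWrapperFor(VendorConnection.class)) {
            System.out.println(wrapper.unwrap(VendorConnection.class).vendorOnlyFeature());
        }
    }
}
```

Callers stay on the standard interface for everyday work and only unwrap when they need a vendor‑specific method.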

Execution Flow

The six steps of a sharded SQL execution are:

SQL parsing (lexical + syntactic)

Executor optimization (e.g., handling OR)

SQL routing (determine target data nodes)

SQL rewriting (replace logical table names with physical ones)

SQL execution on the routed data sources

Result merging (order, group, pagination, aggregation)

Each step is illustrated with diagrams and examples in the original article.
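
Two of these steps, rewriting and result merging, are easy to sketch in isolation. The code below is a simplified illustration under stated assumptions: the rewrite uses a regular expression (a real engine works on the parsed SQL, not on text), and the merge handles only an ascending ORDER BY over per‑shard results that are already sorted. ExecutionFlowSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExecutionFlowSketch {
    /** SQL rewriting: replace the logical table name with one physical table name. */
    static String rewrite(String sql, String logicTable, String actualTable) {
        // \b keeps t_order from matching inside t_order_item ('_' is a word character).
        return sql.replaceAll("\\b" + Pattern.quote(logicTable) + "\\b",
                Matcher.quoteReplacement(actualTable));
    }

    /** Result merging for ORDER BY ... ASC: k-way merge of sorted per-shard result lists. */
    static List<Long> mergeSorted(List<List<Long>> shardResults) {
        // Each heap entry is {shardIndex, positionInThatShard}, ordered by the row it points to.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
                Comparator.comparingLong((int[] e) -> shardResults.get(e[0]).get(e[1])));
        for (int s = 0; s < shardResults.size(); s++) {
            if (!shardResults.get(s).isEmpty()) {
                heap.add(new int[] {s, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] entry = heap.poll();
            List<Long> rows = shardResults.get(entry[0]);
            merged.add(rows.get(entry[1]));
            if (entry[1] + 1 < rows.size()) {
                heap.add(new int[] {entry[0], entry[1] + 1}); // advance the cursor of that shard
            }
        }
        return merged;
    }
}
```

For example, rewrite("SELECT * FROM t_order WHERE order_id = 1", "t_order", "t_order_1") yields SELECT * FROM t_order_1 WHERE order_id = 1, and merging [1, 4, 9] with [2, 3, 10] yields [1, 2, 3, 4, 9, 10] without loading and re‑sorting everything at once.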

4. Quick Practice

A Spring Boot + MyBatis‑Plus project is set up to demonstrate sharding:

Two physical databases, ds-0 and ds-1, each contain tables t_order_0 through t_order_2 and t_order_item_0 through t_order_item_2.

Configuration is done via application.properties using ShardingSphere properties (data source definitions, actual data nodes, inline sharding strategies, key generators, binding tables, broadcast tables, and SQL logging).

Insert operations automatically generate Snowflake IDs and distribute rows across the six physical tables.

Broadcast table t_config is replicated to both databases.

Binding tables enable efficient join queries without Cartesian products.

Running the example shows data correctly sharded, broadcast updates applied to all shards, and join queries routed to the appropriate physical tables.

5. Summary

The article provides a concise overview of Sharding‑JDBC (ShardingSphere) fundamentals, demonstrates how to configure sharding rules, key generation, broadcast and binding tables, and validates the setup with a Spring Boot demo. Future articles will dive deeper into the four sharding strategies, custom distributed IDs, distributed transactions, service governance, and data masking.

Project source: GitHub

