
Choosing Between NewSQL Databases and Middleware‑Based Sharding: A Comparative Analysis

This article objectively compares NewSQL distributed databases with traditional middleware‑based sharding solutions, examining their architectures, distributed transaction support, high availability, scaling, storage engines, SQL capabilities, maturity, and suitability for various workloads, to help architects decide which approach best fits their needs.

Java Architect Essentials

Recently I have often been asked whether to choose middleware-based sharding on top of traditional relational databases or to adopt a NewSQL distributed database. This article aims to compare the two approaches objectively by analyzing their key characteristics, implementation principles, advantages, disadvantages, and suitable scenarios.

What makes NewSQL databases advanced?

According to the classification in Andrew Pavlo and Matthew Aslett's SIGMOD Record paper "What's Really New with NewSQL?", Spanner, TiDB, and OceanBase belong to the first, new-architecture type, while middleware solutions such as Apache ShardingSphere, Mycat, and DRDS belong to the second, sharding-middleware type.

Middleware + relational-DB sharding can be considered a distributed architecture, since storage is distributed and horizontal scaling is possible. However, SQL parsing and execution-plan generation happen twice (once in the middleware and again in each underlying database), making it less efficient.

NewSQL databases improve on this by eliminating redundant parsing, using more efficient storage designs, and providing built‑in distributed transaction support.

Below is a simple architectural comparison (image omitted).

- Traditional databases are disk-oriented; NewSQL databases manage memory more efficiently.
- Middleware repeats SQL parsing and optimization, leading to lower efficiency.
- NewSQL distributed transactions are optimized compared to XA, offering higher performance.
- NewSQL stores data using Paxos or Raft multi-replica protocols, achieving true high availability (RTO < 30 s, RPO = 0).
- NewSQL automatically handles data sharding, migration, and scaling without application changes.

These points are often highlighted by NewSQL vendors, but are they truly as beneficial as claimed? The following sections discuss each aspect in detail.

Distributed Transactions

This is a double‑edged sword.

CAP limitation

NoSQL databases historically avoided distributed transactions because of the CAP theorem. NewSQL databases such as Google Spanner claim to be "effectively CA" by operating on a private global network where partitions are rare enough to be treated like any other failure.

Recommended reading: Pat Helland's "Standing on Distributed Shoulders of Giants", which observes that in a distributed system you can know where the work is done or when it is done, but not both, and that two-phase commit is essentially an anti-availability protocol.

Completeness

Two‑phase commit (2PC) does not guarantee strict ACID under all failure scenarios; recovery mechanisms are needed to ensure eventual consistency.
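The blocking window that makes 2PC fragile can be sketched in a few lines. This is a minimal illustration with hypothetical types, not any product's implementation: every participant that votes YES sits holding locks between the two phases, and if the coordinator dies at that point they stay blocked until recovery.

```java
import java.util.List;

// Minimal two-phase commit coordinator sketch (hypothetical types, illustration only).
public class TwoPhaseCommit {
    public static class Participant {
        public final boolean vote;     // phase-one answer: can this branch commit?
        public String state = "INIT";  // "PREPARED" means locks are held
        public Participant(boolean vote) { this.vote = vote; }
        public boolean prepare() { state = vote ? "PREPARED" : "ABORTED"; return vote; }
        public void finish(boolean commit) { state = commit ? "COMMITTED" : "ABORTED"; }
    }

    // Phase 1: collect votes; phase 2: broadcast the decision.
    // Every YES voter waits in PREPARED holding locks between the phases;
    // a coordinator crash there blocks them all, which is why 2PC is
    // called an anti-availability protocol.
    public static boolean run(List<Participant> parts) {
        boolean allYes = true;
        for (Participant p : parts) if (!p.prepare()) { allYes = false; break; }
        for (Participant p : parts) p.finish(allYes);
        return allYes;
    }
}
```

A single NO vote (or a lost prepare response, in a real system) aborts every branch, and recovery logic must replay the coordinator's logged decision to release stragglers.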

Many NewSQL products still have incomplete distributed-transaction support, and failures have been observed in real-world deployments.

Performance

Traditional relational databases support XA, but its high network overhead and blocking make it unsuitable for high‑throughput OLTP. NewSQL implementations often use Percolator‑style transactions with a Timestamp Oracle, MVCC, and snapshot isolation, reducing lock contention and improving performance compared to XA.

Percolator-style SI relies on optimistic concurrency control: in hotspot scenarios it can cause many aborts and retries, and snapshot isolation is not the same isolation level as REPEATABLE READ (it permits write skew, for example).
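The first-committer-wins behavior behind those aborts can be sketched with a toy MVCC store. This is an assumption-laden simplification (a single in-memory map standing in for Percolator's timestamp oracle and per-key write columns), but it shows why concurrent writers to a hot key cannot all succeed:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of snapshot-isolation commit with first-committer-wins conflict
// detection (hypothetical in-memory store, illustration only).
public class SnapshotStore {
    private final Map<String, Long> lastCommitTs = new HashMap<>(); // key -> last commit timestamp
    private long clock = 0;

    // A transaction's snapshot is the logical time at which it begins.
    public synchronized long beginTs() { return clock; }

    // Returns true if the write set commits; false means abort: some other
    // transaction committed to one of our keys after our snapshot was taken.
    public synchronized boolean commit(long startTs, Iterable<String> writeSet) {
        for (String key : writeSet) {
            Long ts = lastCommitTs.get(key);
            if (ts != null && ts > startTs) return false; // write-write conflict
        }
        long commitTs = ++clock;
        for (String key : writeSet) lastCommitTs.put(key, commitTs);
        return true;
    }
}
```

Under a hotspot, many concurrent transactions carry the same key in their write sets, so all but the first committer abort and must retry, which is exactly the contention penalty described above.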

Nevertheless, 2PC still incurs extra network round-trips, global transaction ID (timestamp) acquisition, and log persistence, which can become a bottleneck in large-scale scenarios such as batch banking transactions.

Given the performance cost, many NewSQL vendors recommend minimizing distributed transactions at the application level.

HA and Multi‑Region Active‑Active

Traditional master‑slave replication (even semi‑synchronous) can lose data under failure. Modern solutions use Paxos or Raft multi‑replica protocols (e.g., Spanner, TiDB, OceanBase) to achieve high reliability and fast failover.
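The durability claim (RPO = 0) comes down to simple quorum arithmetic, sketched below for illustration: a write is acknowledged only after a majority of replicas has it, and any two majorities intersect, so no committed entry can be lost while a quorum survives.

```java
// Majority-quorum arithmetic behind Paxos/Raft durability (illustration only).
public class Quorum {
    // With n replicas, a write is durable once a majority acknowledges it.
    public static int majority(int n) { return n / 2 + 1; }

    // A cluster of n replicas keeps serving, with no committed data lost,
    // as long as at most f = (n - 1) / 2 replicas fail; any future quorum
    // must overlap the majority that stored the committed entry.
    public static int tolerableFailures(int n) { return (n - 1) / 2; }
}
```

So a 3-replica group commits on 2 acknowledgments and tolerates 1 failure; a 5-replica group commits on 3 and tolerates 2. This is the contrast with semi-synchronous master-slave replication, where a single ill-timed failure can still lose acknowledged writes.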

These protocols can also be applied to traditional databases; MySQL Group Replication is an example.

Implementing production‑grade consensus algorithms requires careful engineering, batching, and asynchronous optimizations.

While Paxos/Raft enable multi‑region active‑active setups, they demand low network latency; high‑latency links (tens of ms) make true active‑active OLTP impractical.
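A back-of-the-envelope calculation makes the latency point concrete. The round counts below are assumptions for illustration (real systems pipeline and batch aggressively), but the shape of the math holds: each consensus round costs at least one leader-to-quorum round trip, and a 2PC commit over Raft groups needs at least two such rounds (prewrite plus commit).

```java
// Back-of-the-envelope commit latency for geo-replicated consensus
// (illustration; round counts are assumptions, not measurements).
public class GeoLatency {
    // Minimum commit latency: one quorum round trip per consensus round.
    public static double minCommitMs(double rttMs, int consensusRounds) {
        return rttMs * consensusRounds;
    }

    // Serial commits per second against a single hot row: each commit must
    // finish before the next conflicting one can start.
    public static double maxSerialCommitsPerSec(double rttMs, int consensusRounds) {
        return 1000.0 / minCommitMs(rttMs, consensusRounds);
    }
}
```

With a 1 ms intra-region RTT a commit costs about 2 ms, but with a 30 ms cross-region RTT it costs at least 60 ms, capping a single hot row at roughly 16 serial commits per second, which is why true multi-region active-active OLTP is impractical over high-latency links.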

Some teams (e.g., Ant Group) use application‑level dual‑write with distributed caches to achieve eventual consistency across regions.

Horizontal Scaling and Sharding

NewSQL databases embed automatic sharding, hotspot detection, and region splitting (e.g., TiDB splits a region at 64 MiB). Middleware‑based sharding requires explicit design of split keys, routing rules, and manual scaling, increasing complexity.
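Automatic region splitting can be sketched as follows. The names and the split-at-midpoint policy here are hypothetical simplifications (real systems pick split keys from sampled key distributions and then rebalance the halves across nodes), using the 64 MiB threshold mentioned above as the example value:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of automatic range splitting in a region-based NewSQL store
// (hypothetical names; policy simplified for illustration).
public class RangeSplitter {
    public static final long SPLIT_BYTES = 64L << 20; // example threshold: 64 MiB

    public record Region(String startKey, String endKey, long bytes) {}

    // When a region outgrows the threshold, split it at a midpoint key so the
    // two halves can be scheduled onto different nodes, with no application change.
    public static List<Region> maybeSplit(Region r, String midKey) {
        if (r.bytes() <= SPLIT_BYTES) return List.of(r);
        List<Region> out = new ArrayList<>();
        out.add(new Region(r.startKey(), midKey, r.bytes() / 2));
        out.add(new Region(midKey, r.endKey(), r.bytes() - r.bytes() / 2));
        return out;
    }
}
```

With middleware-based sharding, by contrast, the equivalent of this split is a manual resharding project: choosing new split keys, migrating data, and updating routing rules.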

However, built‑in sharding strategies may not align with domain models, leading to distributed transactions for certain business patterns (e.g., banking customers).

Distributed SQL Support

Both approaches support single‑shard SQL, but NewSQL offers richer cross‑shard capabilities (joins, aggregations) thanks to global statistics and cost‑based optimization (CBO). Middleware often relies on rule‑based optimization (RBO) and may lack cross‑shard support.
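The cross-shard pattern both a NewSQL executor and capable middleware use for an aggregate like SELECT SUM(amount) is scatter-gather: push a partial aggregate down to each shard, then merge at the coordinator. The sketch below uses in-memory maps as stand-in shards, purely for illustration:

```java
import java.util.List;
import java.util.Map;

// Scatter-gather sketch of a cross-shard SUM (hypothetical in-memory shards).
public class ScatterGather {
    public static long sumAcrossShards(List<Map<String, Long>> shards) {
        long total = 0;
        for (Map<String, Long> shard : shards) {
            // "Pushed down": each shard computes its own partial sum locally,
            // so only one number per shard crosses the network.
            long partial = shard.values().stream().mapToLong(Long::longValue).sum();
            total += partial; // merge step at the coordinator
        }
        return total;
    }
}
```

The difference in practice is who plans this: a cost-based optimizer with global statistics can decide how much of a join or aggregate to push down, while rule-based middleware either handles only the simple cases or pulls raw rows back and computes centrally.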

NewSQL typically supports MySQL or PostgreSQL protocols, limiting the SQL dialects, while middleware can route to multiple underlying databases.

Storage Engine

Traditional engines use B‑Tree structures optimized for disk reads; NewSQL often adopts LSM trees, converting random writes into sequential writes for higher write throughput, at the cost of slightly slower reads.

Additional optimizations (SSD, bloom filters, caching) mitigate read penalties.
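The LSM write and read paths described above can be sketched in miniature. This toy (an in-memory memtable flushed to immutable sorted runs, with no compaction, WAL, or bloom filters) is an illustration of the idea, not a real engine:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Minimal LSM-tree sketch: writes land in a sorted in-memory memtable; when it
// fills, it is frozen as an immutable sorted run (on disk this is a sequential
// write, which is where the write-throughput win comes from). Reads must check
// the memtable and then every run newest-first: the read penalty noted above.
public class MiniLsm {
    private final int memtableLimit;
    private TreeMap<String, String> memtable = new TreeMap<>();
    private final List<TreeMap<String, String>> runs = new ArrayList<>(); // newest last

    public MiniLsm(int memtableLimit) { this.memtableLimit = memtableLimit; }

    public void put(String key, String value) {
        memtable.put(key, value);
        if (memtable.size() >= memtableLimit) { // flush: sequential, not random, I/O
            runs.add(memtable);
            memtable = new TreeMap<>();
        }
    }

    public String get(String key) {
        String v = memtable.get(key);
        if (v != null) return v;
        for (int i = runs.size() - 1; i >= 0; i--) { // newest run wins
            v = runs.get(i).get(key);
            if (v != null) return v;
        }
        return null;
    }
}
```

Bloom filters fit into the read loop: before probing a run, a per-run filter answers "definitely absent" cheaply, so most runs are skipped without any I/O.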

Maturity and Ecosystem

NewSQL is still evolving, with strong adoption in internet companies but cautious use in regulated industries. Traditional relational databases have decades of stability, extensive tooling, and a larger talent pool.

For fast‑growing internet services, NewSQL’s automatic scaling and reduced operational overhead are attractive; for conservative enterprises, middleware‑based sharding remains a lower‑risk choice.

Conclusion

When deciding, consider questions such as the necessity of strong‑consistent transactions, data growth predictability, scaling frequency, throughput vs latency priorities, application transparency, and DBA expertise.

If strong consistency, rapid scaling, and high throughput are critical, NewSQL may be worth the learning curve.

If you prefer a proven, lower‑risk solution with existing DBA skills, middleware‑based sharding is often sufficient.

Both paths have trade‑offs; there is no perfect solution.

To the reader: feel free to discuss these viewpoints; they reflect personal experience and industry observations.


Tags: scalability, sharding, database architecture, NewSQL, distributed transactions
Written by Java Architect Essentials

Committed to sharing quality articles and tutorials to help Java programmers progress from junior to mid-level to senior architect. We curate high-quality learning resources, interview questions, videos, and projects from across the internet to help you systematically improve your Java architecture skills. Follow and reply '1024' to get Java programming resources. Learn together, grow together.
