
Design and Implementation of the NewSQL Distributed Database TiDB

This article presents a comprehensive technical overview of TiDB, a NewSQL distributed database, covering its architecture, SQL layer, KV engine, distributed transaction mechanisms, code implementation in Go, open‑source practices, and future roadmap.

Architect

The speaker, Liu Qi (goroutine), is the founder and CEO of PingCAP. He introduces TiDB, an open-source NewSQL distributed database, along with the related distributed cache Codis, and describes his infrastructure background at JD.com and Wandoujia.

He explains the motivation behind NewSQL: combining the scalability of NoSQL with the strong consistency and transactional guarantees of traditional relational databases, citing examples such as Google Spanner, F1, FoundationDB, CockroachDB, and TiDB itself.

The article outlines the common approaches to scaling relational databases, namely master-slave replication and sharding middleware (Cobar, TDDL, Vitess, MyCat, etc.), and their limitations, especially regarding dynamic scaling and transaction support.
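To make the dynamic-scaling limitation concrete, here is a minimal sketch assuming a middleware that routes rows with a plain `id mod shardCount` rule. The rule and the function names are illustrative assumptions, not taken from any of the products named above. It counts how many keys would have to move when a cluster grows from 4 to 5 shards.

```go
package main

import "fmt"

// shardFor is a hypothetical routing rule of the kind a sharding
// middleware might apply: a row lives on shard (id mod shardCount).
func shardFor(id, shardCount int) int {
	return id % shardCount
}

// movedKeys counts how many ids in [0, n) land on a different shard
// when the cluster grows from oldCount to newCount shards.
func movedKeys(n, oldCount, newCount int) int {
	moved := 0
	for id := 0; id < n; id++ {
		if shardFor(id, oldCount) != shardFor(id, newCount) {
			moved++
		}
	}
	return moved
}

func main() {
	// Only ids whose remainders agree under both moduli stay in place.
	fmt.Println(movedKeys(1000, 4, 5)) // 800
}
```

Under this assumed rule, adding a single shard relocates 80% of the keys, which is one reason static sharding is awkward to scale online.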

It then contrasts these approaches with NoSQL solutions (HBase, Cassandra, MongoDB) and discusses why pure NoSQL systems often lack an expressive SQL interface and robust transaction semantics.

Moving to TiDB's architecture, the author presents a layered view: a SQL layer on top of a distributed KV layer. The SQL layer performs lexical analysis and parsing with Go tools such as cznic/goyacc and cznic/ebnf2y, producing an abstract syntax tree (AST) and then a plan tree for query execution.

Example code snippets illustrate the execution flow:

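func (s *session) Execute(sql string) ([]rset.Recordset, error) {
    // Parse the SQL text into one or more statements.
    statements, err := Compile(sql)
    if err != nil {
        return nil, errors.Trace(err)
    }
    // Run the statements in order and collect their record sets.
    var rs []rset.Recordset
    for _, st := range statements {
        r := runStmt(s, st)
        rs = append(rs, r)
    }
    return rs, nil
}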

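// Compile runs the lexer and the yacc-generated parser over src and
// returns the parsed statements.
func Compile(src string) ([]stmt.Statement, error) {
    l := parser.NewLexer(src)
    // A non-zero result means the parser reported at least one error.
    if parser.YYParse(l) != 0 {
        return nil, errors.Trace(l.Errors()[0])
    }
    return l.Stmts(), nil
}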

The plan generation process is described step‑by‑step (FROM → WHERE → LOCK → GROUP BY → HAVING → SELECT → DISTINCT → ORDER BY → LIMIT → FINAL), with examples of how a simple SELECT statement is transformed into a series of plan nodes.
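The idea of turning a SELECT into a chain of plan nodes can be sketched as follows. This is a simplified illustration, not TiDB's actual plan code: the `Plan` interface, the node types, and the `buildPlan` helper are hypothetical, and only the FROM, WHERE, SELECT, and LIMIT steps are modeled.

```go
package main

import "fmt"

// Row is a simplified record mapping column name to value.
type Row map[string]int

// Plan is a hypothetical iterator-style plan node: Next returns the
// next row, or false once the node is exhausted.
type Plan interface {
	Next() (Row, bool)
}

// TableScan stands in for the FROM step and yields stored rows.
type TableScan struct {
	rows []Row
	pos  int
}

func (p *TableScan) Next() (Row, bool) {
	if p.pos >= len(p.rows) {
		return nil, false
	}
	p.pos++
	return p.rows[p.pos-1], true
}

// Filter stands in for the WHERE step.
type Filter struct {
	src  Plan
	pred func(Row) bool
}

func (p *Filter) Next() (Row, bool) {
	for {
		r, ok := p.src.Next()
		if !ok || p.pred(r) {
			return r, ok
		}
	}
}

// Project stands in for the SELECT field list.
type Project struct {
	src  Plan
	cols []string
}

func (p *Project) Next() (Row, bool) {
	r, ok := p.src.Next()
	if !ok {
		return nil, false
	}
	out := Row{}
	for _, c := range p.cols {
		out[c] = r[c]
	}
	return out, true
}

// Limit stands in for the LIMIT step.
type Limit struct {
	src  Plan
	left int
}

func (p *Limit) Next() (Row, bool) {
	if p.left <= 0 {
		return nil, false
	}
	p.left--
	return p.src.Next()
}

// buildPlan wraps nodes in the order FROM, WHERE, SELECT, LIMIT for:
// SELECT c1 FROM t WHERE c2 > 10 LIMIT 2
func buildPlan(rows []Row) Plan {
	var p Plan = &TableScan{rows: rows}
	p = &Filter{src: p, pred: func(r Row) bool { return r["c2"] > 10 }}
	p = &Project{src: p, cols: []string{"c1"}}
	return &Limit{src: p, left: 2}
}

// collect drains a plan and returns the c1 values it produced.
func collect(p Plan) []int {
	var out []int
	for r, ok := p.Next(); ok; r, ok = p.Next() {
		out = append(out, r["c1"])
	}
	return out
}

func main() {
	rows := []Row{{"c1": 1, "c2": 5}, {"c1": 2, "c2": 20}, {"c1": 3, "c2": 30}, {"c1": 4, "c2": 40}}
	fmt.Println(collect(buildPlan(rows))) // [2 3]
}
```

Each node only pulls rows from the node beneath it, so steps such as GROUP BY or ORDER BY could be added as further wrappers without changing the existing ones.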

TiDB’s KV mapping is explained using a table‑row‑column key format (TableID:RowID:ColumnID). Sample key‑value pairs illustrate how a row is stored and how indexes are represented, including unique and non‑unique index layouts.
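A rough sketch of such a key layout is shown below. The one-byte `r` and `i` tags, the fixed-width big-endian IDs, and the rule that a unique index stores the row ID as its value while a non-unique index appends the row ID to the key are assumptions made for illustration. They are not a description of TiDB's exact encoding.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// appendUint64 appends v in big-endian form so that byte-wise key
// order matches numeric order for unsigned IDs.
func appendUint64(b []byte, v uint64) []byte {
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], v)
	return append(b, buf[:]...)
}

// rowKey builds a TableID:RowID:ColumnID key. The 'r' tag is a
// hypothetical marker separating row data from index data.
func rowKey(tableID, rowID, colID uint64) []byte {
	k := []byte{'r'}
	k = appendUint64(k, tableID)
	k = appendUint64(k, rowID)
	return appendUint64(k, colID)
}

// uniqueIndexKey assumes the indexed value alone identifies the entry;
// the row ID would then be stored as the KV value.
func uniqueIndexKey(tableID, indexID uint64, val []byte) []byte {
	k := []byte{'i'}
	k = appendUint64(k, tableID)
	k = appendUint64(k, indexID)
	return append(k, val...)
}

// nonUniqueIndexKey appends the row ID so duplicate values still get
// distinct keys. A real encoding would also need a self-delimiting
// value format so the row ID can be split off again.
func nonUniqueIndexKey(tableID, indexID uint64, val []byte, rowID uint64) []byte {
	return appendUint64(uniqueIndexKey(tableID, indexID, val), rowID)
}

// keyLess reports whether a sorts before b in byte order.
func keyLess(a, b []byte) bool {
	return bytes.Compare(a, b) < 0
}

func main() {
	fmt.Printf("%x\n", rowKey(1, 2, 3))
	// Row 2 sorts before row 10, so a range scan visits rows in ID order.
	fmt.Println(keyLess(rowKey(1, 2, 1), rowKey(1, 10, 1))) // true
	// Two rows sharing the value "bob" map to two different keys.
	a := nonUniqueIndexKey(1, 1, []byte("bob"), 7)
	b := nonUniqueIndexKey(1, 1, []byte("bob"), 8)
	fmt.Println(bytes.Equal(a, b)) // false
}
```

Because all columns of a row share the same TableID:RowID prefix, they sit next to each other in an ordered store, and a whole row can be read with one short range scan.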

The transaction interface is shown, highlighting required operations (Get, Set, Seek, Delete, Commit, Rollback) and the need for ordered KV stores to support scans. Distributed transaction handling follows a two‑phase commit (2PC) model, with discussion of coordinator selection, transaction status tables, MVCC, and conflict resolution strategies.
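The operations listed above could be expressed as an interface along the following lines. The signatures are assumptions made for this sketch, and the in-memory `memTxn` is a single-node toy: it buffers writes until commit but implements neither 2PC nor MVCC. It is included mainly to show why `Seek` depends on ordered keys.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// ErrNotExist is returned by Get when a key is absent.
var ErrNotExist = errors.New("key not exist")

// Transaction lists the operations named in the text. The signatures
// are assumptions made for this sketch.
type Transaction interface {
	Get(key string) (string, error)
	Set(key, value string)
	Delete(key string)
	Seek(start string) (key, value string, ok bool)
	Commit() error
	Rollback()
}

// memStore is a toy single-node store standing in for the KV layer.
type memStore struct{ data map[string]string }

// memTxn buffers writes until Commit; a nil entry marks a pending delete.
type memTxn struct {
	store *memStore
	buf   map[string]*string
}

var _ Transaction = (*memTxn)(nil) // compile-time interface check

func (s *memStore) Begin() *memTxn {
	return &memTxn{store: s, buf: map[string]*string{}}
}

func (t *memTxn) Get(key string) (string, error) {
	if v, ok := t.buf[key]; ok {
		if v == nil {
			return "", ErrNotExist
		}
		return *v, nil
	}
	if v, ok := t.store.data[key]; ok {
		return v, nil
	}
	return "", ErrNotExist
}

func (t *memTxn) Set(key, value string) { t.buf[key] = &value }
func (t *memTxn) Delete(key string)     { t.buf[key] = nil }
func (t *memTxn) Rollback()             { t.buf = map[string]*string{} }

// Seek returns the first visible key >= start in byte order, which is
// why the underlying store must keep keys sorted.
func (t *memTxn) Seek(start string) (string, string, bool) {
	var keys []string
	for k := range t.store.data {
		if _, buffered := t.buf[k]; !buffered {
			keys = append(keys, k)
		}
	}
	for k, v := range t.buf {
		if v != nil {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	i := sort.SearchStrings(keys, start)
	if i == len(keys) {
		return "", "", false
	}
	v, _ := t.Get(keys[i])
	return keys[i], v, true
}

// Commit applies the buffered writes to the store.
func (t *memTxn) Commit() error {
	for k, v := range t.buf {
		if v == nil {
			delete(t.store.data, k)
		} else {
			t.store.data[k] = *v
		}
	}
	t.buf = map[string]*string{}
	return nil
}

func main() {
	s := &memStore{data: map[string]string{}}
	t1 := s.Begin()
	t1.Set("t1:r1", "a")
	t1.Set("t1:r2", "b")
	if err := t1.Commit(); err != nil {
		fmt.Println("commit failed:", err)
		return
	}
	t2 := s.Begin()
	t2.Delete("t1:r1")
	k, v, _ := t2.Seek("t1:")
	fmt.Println(k, v) // t1:r2 b
	t2.Rollback()
}
```

A distributed implementation would replace the direct map update in `Commit` with the prewrite and commit phases of 2PC, and would version each key so that readers see a consistent snapshot.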

Implementation details of the storage engine abstraction are provided, noting support for LevelDB, RocksDB, LMDB, BoltDB, and a planned HBase‑based engine. An example of a lightweight LMDB engine implementation (~200 lines) is referenced.
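One common way to structure such an abstraction is a driver registry, sketched below. The `Store` and `Driver` interfaces, the `Register` and `Open` functions, and the `engine://path` address format are hypothetical names chosen for illustration, and the in-memory engine stands in for a real LevelDB or LMDB binding.

```go
package main

import (
	"fmt"
	"strings"
)

// Store is a hypothetical minimal contract an engine must satisfy.
type Store interface {
	Get(key string) (string, bool)
	Put(key, value string)
}

// Driver opens a Store; each engine would supply one.
type Driver interface {
	Open(path string) (Store, error)
}

// drivers maps an engine name to its Driver.
var drivers = map[string]Driver{}

// Register makes an engine available under name.
func Register(name string, d Driver) error {
	if _, dup := drivers[name]; dup {
		return fmt.Errorf("engine %q already registered", name)
	}
	drivers[name] = d
	return nil
}

// Open parses "engine://path" and dispatches to the matching driver.
func Open(addr string) (Store, error) {
	parts := strings.SplitN(addr, "://", 2)
	if len(parts) != 2 {
		return nil, fmt.Errorf("invalid store address %q", addr)
	}
	d, ok := drivers[parts[0]]
	if !ok {
		return nil, fmt.Errorf("unknown engine %q", parts[0])
	}
	return d.Open(parts[1])
}

// memStore and memDriver form a toy in-memory engine.
type memStore map[string]string

func (m memStore) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

func (m memStore) Put(key, value string) { m[key] = value }

type memDriver struct{}

func (memDriver) Open(path string) (Store, error) { return memStore{}, nil }

func main() {
	if err := Register("memory", memDriver{}); err != nil {
		fmt.Println(err)
		return
	}
	st, err := Open("memory://demo")
	if err != nil {
		fmt.Println(err)
		return
	}
	st.Put("k", "v")
	v, ok := st.Get("k")
	fmt.Println(v, ok) // v true

	_, err = Open("rocksdb://demo")
	fmt.Println(err) // unknown engine "rocksdb"
}
```

With this shape, supporting another engine means writing one more `Driver` and registering it, which is consistent with the small size quoted for the LMDB engine.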

The author shares practical insights on open‑source project management: community building, contribution guidelines, PR handling, and the importance of English documentation for global collaboration.

Future roadmap items include improving SQL compatibility, asynchronous schema changes (referencing Google’s research), developing an HBase engine, eventually building a custom KV layer, multi‑tenant support, and containerization.

The article concludes with a Q&A covering transaction status tables, differences between TiDB and MySQL, roadmap details, distributed transaction support, language choice (Go), and comparisons with other distributed databases.
