Databases 10 min read

Primary Key and Index Design for Distributed Databases

The article explains why auto‑increment primary keys are unsuitable for sharded databases, recommends using globally unique ordered identifiers such as UUIDs or Snowflake IDs, and discusses index design strategies—including index tables, embedding shard keys, and handling global tables—to ensure efficient single‑shard queries.

Top Architect
Top Architect
Top Architect
Primary Key and Index Design for Distributed Databases

In a distributed database each primary key must be globally unique across all shards. Using an auto‑increment column fails this requirement because the value is generated only after insertion and can collide on different shards.

For example, an orders table with O_ORDERKEY defined as INT NOT NULL AUTO_INCREMENT may produce duplicate keys in separate shards, leading to incorrect data routing.

CREATE TABLE `orders` (
  `O_ORDERKEY` int NOT NULL auto_increment,
  `O_CUSTKEY` int NOT NULL,
  `O_ORDERSTATUS` char(1) NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) NOT NULL,
  `O_CLERK` char(15) NOT NULL,
  `O_SHIPPRIORITY` int NOT NULL,
  `O_COMMENT` varchar(79) NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`),
  KEY (`O_CUSTKEY`)
) ENGINE=InnoDB;

Therefore, the article advises avoiding auto‑increment keys in sharded environments and instead using ordered globally unique identifiers such as MySQL's sequential UUID, business‑generated IDs, or algorithms like Snowflake (with caution about time‑rollback).

When queries need to locate a row by a non‑shard key, two main designs are proposed:

Make the shard key the same as the query key (e.g., use o_orderkey as the shard key).

Create an auxiliary index table that stores the primary key together with the shard key, enabling a two‑step lookup that routes the second query to a single shard.

CREATE TABLE idx_orderkey_custkey (
  o_orderkey INT,
  o_custkey INT,
  PRIMARY KEY (o_orderkey)
);

A two‑step query first retrieves the shard key from the index table and then fetches the row from the appropriate shard, guaranteeing that only one shard is accessed regardless of the total number of shards.

Another efficient approach is to embed the shard key directly into the primary key value (e.g., concatenate o_orderkey and o_custkey ), allowing the system to infer the target shard from the primary key alone.

o_orderkey = string(o_orderkey + o_custkey)

For small, rarely‑updated tables that lack a natural shard key (global tables), the article suggests replicating the table on every shard so that queries never need to cross shards.

Unique indexes face the same issue as primary keys: a locally unique constraint does not guarantee global uniqueness. The solution is to use globally unique values (UUIDs) for unique indexes as well.

In summary, the key recommendations are:

Use ordered, globally unique identifiers for primary keys.

Design unique indexes with the same global uniqueness principle.

If a unique index is not a shard key, store shard information alongside it to enable single‑shard routing.

Replicate tiny global tables on every shard to avoid cross‑shard scans.

index designShardingUUIDdistributed databasesprimary keyglobal tables
Top Architect
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, plus architecture adjustments using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.