
Cassandra Multi‑Data‑Center Fault Tolerance Experiment and Analysis

This article presents a step‑by‑step experiment on a Cassandra cluster spanning two data centers, demonstrating how token ownership, data distribution, and fault‑tolerance behave when nodes fail or are removed, and explains the observed owns percentages and replication effects.

Aikesheng Open Source Community

1. Background

This article investigates Cassandra's partition fault tolerance and strong consistency by examining what happens to data replicas when a host machine fails, especially in a multi‑data‑center deployment.
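For context on the consistency side, Cassandra's standard QUORUM arithmetic determines how many replica failures a multi‑DC keyspace can absorb while still serving strongly consistent reads and writes. A minimal sketch of that formula (the replica counts are the ones this experiment uses later; this is illustrative arithmetic, not Cassandra's code):

```python
# Standard Cassandra quorum formula: a majority of all replicas
# across every data center must respond.
# quorum = (sum of per-DC replication factors) // 2 + 1
def quorum(rf_per_dc):
    total = sum(rf_per_dc.values())
    return total // 2 + 1

# Experiment 1 below: four replicas in each of the two DCs.
rf = {"dc1": 4, "dc2": 4}
q = quorum(rf)
print(q)                     # 5 replicas must respond
print(sum(rf.values()) - q)  # up to 3 replicas may be down
```

With eight total replicas, QUORUM operations survive up to three unavailable replicas, which is why stopping and removing a single node in the experiments below leaves the cluster serviceable.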

2. Experiment Environment

Cluster topology (cross‑data‑center):

| Data Center | Node IPs | Seed Nodes |
| --- | --- | --- |
| DC1 | 10.186.60.61, 10.186.60.7, 10.186.60.118, 10.186.60.67 | 10.186.60.61, 10.186.60.7 |
| DC2 | 10.186.60.53, 10.186.60.65, 10.186.60.94, 10.186.60.68 | 10.186.60.53, 10.186.60.65 |

The initial nodetool status output shows the owns values across the whole cluster summing to 200 %, while no single data center's nodes sum to exactly 100 %, because token ranges are shared across both DCs.

3. Specific Experiments

3.1 Experiment 1 – Four Replicas per DC

Create a keyspace with NetworkTopologyStrategy (4 replicas in each DC) and a simple table:

[cassandra@data01 ~]$ cqlsh 10.186.60.61 -u cassandra -p cassandra
CREATE KEYSPACE "dcdatabase" WITH REPLICATION = {'class': 'NetworkTopologyStrategy', 'dc1': 4, 'dc2': 4};
USE dcdatabase;
CREATE TABLE test (id int, user_name varchar, PRIMARY KEY (id));
INSERT INTO test (id, user_name) VALUES (1, 'test1');
INSERT INTO test (id, user_name) VALUES (2, 'test2');
INSERT INTO test (id, user_name) VALUES (3, 'test3');
INSERT INTO test (id, user_name) VALUES (4, 'test4');
INSERT INTO test (id, user_name) VALUES (5, 'test5');

After inserting the data, nodetool status shows each DC's owns summing to 400 % (four replicas per DC), and the rows are distributed across all nodes in both DCs:

[cassandra@data01 ~]$ nodetool getendpoints dcdatabase test 1
10.186.60.7
10.186.60.94
10.186.60.65
10.186.60.118
10.186.60.67
10.186.60.61
10.186.60.53
10.186.60.68
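
The getendpoints output lists all eight IPs because the replication factor in each DC equals the node count in that DC, so every node holds every row. A simplified sketch of how NetworkTopologyStrategy-style placement produces this (not Cassandra's actual implementation; it ignores racks and vnodes):

```python
# Simplified replica placement: walk the ring clockwise from the row's
# position and take the first rf distinct nodes in each DC.
dc1 = ["10.186.60.61", "10.186.60.7", "10.186.60.118", "10.186.60.67"]
dc2 = ["10.186.60.53", "10.186.60.65", "10.186.60.94", "10.186.60.68"]

def endpoints(dc_nodes, rf, ring_start=0):
    picked = []
    n = len(dc_nodes)
    for i in range(n):
        node = dc_nodes[(ring_start + i) % n]
        if node not in picked:
            picked.append(node)
        if len(picked) == rf:
            break
    return picked

# With rf = 4 in a 4-node DC, every node in the DC is selected,
# so the row lands on all eight nodes of the cluster.
replicas = endpoints(dc1, 4) + endpoints(dc2, 4)
print(len(replicas))  # 8
```

This also explains why the owns figure is 400 % per DC: four full copies of the data exist in each data center.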

When node 94 is stopped (systemctl stop cassandra), its owns value remains unchanged, and nodetool getendpoints still lists the downed node as a replica. After removing the dead node with nodetool removenode, getendpoints no longer returns node 94, confirming that the cluster continues operating without it.

[cassandra@data02 ~]$ nodetool removenode c8fa86e4-ee9a-4c62-b00b-d15edc967b9f

Post‑removal nodetool status shows only the remaining nodes, each still reporting 100 % owns within its DC.

3.2 Experiment 2 – Three Replicas per DC

Drop and re‑create the keyspace with three replicas per DC:

CREATE KEYSPACE "dcdatabase" WITH REPLICATION = {'class':'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 3};

Now nodetool status reports ~300 % owns for each DC, matching the three‑replica configuration. After node 94 is stopped and removed, the data is redistributed: DC1 still holds replicas on three of its nodes, while DC2's replicas are confined to its three surviving nodes, showing that fault tolerance is more limited when the replication factor is lower.

Finally, restarting all hosts brings the previously failed node back into the cluster, and nodetool status shows the owns values returning to the expected percentages (≈73‑78 % per node) as the cluster re‑balances.
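The ≈73‑78 % figures are close to what simple arithmetic predicts: with three replicas spread over four nodes, each node should own about three quarters of the data, and vnode randomization accounts for the small spread around that ideal. A quick check of the expectation (illustrative arithmetic, not nodetool's calculation):

```python
# Expected per-node ownership for a keyspace is roughly RF / nodes_in_dc.
# With vnodes the actual token split is randomized, so observed values
# hover around the ideal rather than matching it exactly.
def expected_owns(rf, nodes_in_dc):
    return rf / nodes_in_dc * 100

print(expected_owns(3, 4))  # 75.0 — observed values were ~73-78 %
print(expected_owns(4, 4))  # 100.0 — Experiment 1: every node held everything
```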

Conclusion

The experiments confirm that Cassandra continues to serve reads and writes after a node is stopped or removed, but data placement and ownership percentages depend on the replication factor and the number of remaining nodes in each data center.

Tags: distributed systems, fault tolerance, data replication, NoSQL, Cassandra
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
