
Deployment, Optimization, and Management of TiDB Service in 360 Zhihui Cloud

This article details the product models, usage scenarios, and a series of performance and operational optimizations—including query plan health checks, space reclamation, resource isolation, cloud‑native deployment, cross‑region high availability, and unified monitoring—implemented for the TiDB service operated by 360 Zhihui Cloud since its launch in April 2023.


Since the service's official launch in April 2023, the 360 Zhihui Cloud middleware team has continuously explored and optimized TiDB, taking it from an open‑source project to a stable production service. This article introduces the product forms, application scenarios, and optimization measures of the Zhihui Cloud TiDB service.

1. Product Forms

The TiDB service provided by the Zhihui Cloud middleware team aims to offer a high‑availability, massive‑storage, strongly consistent, easy‑maintenance, and analytically powerful MySQL‑compatible database for the entire group. It solves sharding difficulties caused by large data volumes, avoids massive architectural changes, and meets the urgent need for analytical data warehouses, while leveraging TiDB’s built‑in HA to compensate for the lack of HA in local disks used by K8s cloud‑native deployments.

Three product forms are offered:

Dedicated: for large‑scale exclusive workloads, supporting automatic scaling.

Shared: for small workloads, ensuring resource isolation.

Cloud‑Native: built on K8s and TiDB‑Operator for rapid delivery.

2. Dedicated Type

The dedicated TiDB cluster serves large businesses with exclusive resources and auto‑scaling capabilities.

Key components include:

LoadBalancer for even traffic distribution and node failure tolerance.

DM for MySQL‑to‑TiDB migration.

TiCDC for cluster‑level HA and TiDB‑to‑MySQL failover.

BR for full and incremental backups via S3.

V‑Metrics and Grafana for unified monitoring and alert convergence.

Deployment policies that avoid placing replicas on the same physical host.

(Figure: physical machines hosting the TiDB clusters.)

2.1 TiDB Query Optimization

To keep query plans stable, a health‑check mechanism was built: tables whose statistics health score falls below 95% are re‑analyzed, preventing plan regressions that cause large latency spikes, OOM, or killed tasks, while the number of concurrent analyze threads is capped to protect service stability.

#### Collect table health for online tables and store it in a metadata table ####
SHOW STATS_HEALTHY WHERE db_name = '$dbname' AND table_name = '$table_name';

#### Analyze tables with health < 95% ####
ANALYZE TABLE `$table_name`;
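The article does not specify how the analyze-thread cap is enforced; one way to implement it, as an assumption, is via TiDB's `tidb_build_stats_concurrency` system variable (the value below is illustrative):

```sql
-- Limit how many worker threads a single ANALYZE statement may use,
-- so statistics rebuilding does not starve foreground queries
SET GLOBAL tidb_build_stats_concurrency = 2;

-- Then re-analyze only the tables flagged by the health check
ANALYZE TABLE `$table_name`;
```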

2.2 TiDB Space Reclamation Optimization

Frequent large‑scale deletes left space unreclaimed, because RocksDB only marks records as deleted and waits for compaction. Increasing max-merge-region-keys from 200,000 to 500,000 and max-merge-region-size from 20 MiB to 100 MiB lets adjacent regions merge earlier, freeing space.
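Assuming the parameters were changed online, the adjustment can be expressed with TiDB's SET CONFIG statement against PD (equivalently, via pd-ctl); the parameter paths follow PD's schedule section:

```sql
-- Raise the region-merge thresholds so undersized neighboring regions merge sooner
SET CONFIG pd `schedule.max-merge-region-keys` = 500000;
SET CONFIG pd `schedule.max-merge-region-size` = 100;  -- in MiB
```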

After the adjustment, the primary and standby clusters have comparable data sizes, and query/delete latency dropped dramatically.

3. Shared Type

The shared model uses resource control to let multiple small‑scale users share a TiDB cluster while preserving isolation. All shared clusters run TiDB 7.5 or later.

Workload testing in mixed read/write mode establishes a stable RU‑to‑QPS ratio (approximately 4:1), enabling reliable RU budgeting.

| Test Group | Allocated RU | RU Threshold Met | Average QPS |
|------------|--------------|------------------|-------------|
| rg1        | 10000        | Yes              | 2484.51     |
| rg2        | 7000         | Yes              | 1745.59     |
| rg3        | 4000         | Yes              | 994.96      |
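With the roughly 4:1 RU‑to‑QPS ratio established, per‑tenant quotas like those in the table can be provisioned through TiDB's resource‑control DDL (group and user names below are illustrative):

```sql
-- Create resource groups with the tested RU budgets
CREATE RESOURCE GROUP IF NOT EXISTS rg1 RU_PER_SEC = 10000;
CREATE RESOURCE GROUP IF NOT EXISTS rg3 RU_PER_SEC = 4000;

-- Bind a tenant's account to its group; its sessions are then throttled to the quota
ALTER USER 'tenant_a'@'%' RESOURCE GROUP rg1;
```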

4. Cloud‑Native

The cloud‑native shape builds TiDB on Kubernetes using TiDB‑Operator, providing flexible, on‑demand database services and abstracting underlying infrastructure management.

4.1 TiDB Persistent Volumes

Four PV options were evaluated; OpenEBS was chosen for its dynamic local‑PV capability, simplicity, and cost‑effectiveness.

---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath
  annotations:
    openebs.io/cas-type: local
    cas.openebs.io/config: |
      #hostpath type will create a PV by creating a sub-directory under the BASEPATH
      - name: StorageType
        value: "hostpath"
      - name: BasePath
        value: "/data1/"
provisioner: openebs.io/local
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete

Example TiDBCluster manifest (basic configuration):

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  version: v7.1.1
  timezone: UTC
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  pd:
    baseImage: uhub.service.ucloud.cn/pingcap/pd
    maxFailoverCount: 0
    replicas: 2
    storageClassName: openebs-hostpath
    requests:
      storage: "5Gi"
  tikv:
    baseImage: uhub.service.ucloud.cn/pingcap/tikv
    maxFailoverCount: 0
    replicas: 3
    storageClassName: openebs-hostpath
    requests:
      storage: "5Gi"
    config:
      storage:
        reserve-space: "0MB"
      rocksdb:
        max-open-files: 256
      raftdb:
        max-open-files: 256

Running PVC list (example):

# kubectl get pvc | grep openebs
pd-basic-pd-0   Bound   pvc-c11078de-...   5Gi   RWO   openebs-hostpath   91d
pd-basic-pd-1   Bound   pvc-24aa77a6-...   5Gi   RWO   openebs-hostpath   91d
tikv-basic-tikv-0   Bound   pvc-2e150942-...   5Gi   RWO   openebs-hostpath   91d
...

5. Cross‑Region High Availability

Multiple TiDB clusters are deployed in northern and southern regions and synchronized via TiCDC.

5.1 Latency Optimization

To reduce inter‑region latency, the team applied measures such as controlling large upstream transactions, scaling TiCDC, increasing per-table-memory-quota, enabling cross‑node table sync, colocating TiCDC with downstream clusters, and configuring early‑warning alerts.

6. Multi‑Active Across Data Centers

Three‑center same‑city deployment ensures data‑center‑level HA. Location labels zone and host are set, and placement policies direct leaders to the preferred zone.

## PD service configuration ##
pd:
  enable-tcp4-only: true
  replication.location-labels:
    - zone
    - host
## TiKV service configuration ##
tikv_servers:
  - host: xxxxxxx
    ssh_port: 22
    port: xxxxx
    status_port: xxxx
    deploy_dir: xxxxxx
    data_dir: xxxxxx
    log_dir: xxxxx
    config:
      server.labels:
        host: xxxx.xx.xxx.xxxx
        zone: pdc

Placement policy example:

CREATE PLACEMENT POLICY pdc_leader_policy LEADER_CONSTRAINTS="[+zone=pdc]";

These rules reduce average query latency from ~20 ms to ~1 ms.
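A placement policy only takes effect once it is attached to a schema object; a minimal sketch (the table name is illustrative):

```sql
-- Pin the region leaders of this table to the pdc zone
ALTER TABLE orders PLACEMENT POLICY = pdc_leader_policy;
```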

7. Monitoring and Alerting

Instead of using TiDB’s built‑in monitoring, all metrics are integrated into the Zhihui Cloud DBA monitoring platform, providing a unified dashboard and flexible alert configuration, which reduces resource waste and improves operational efficiency.

8. Summary

The 360 Zhihui Cloud middleware team has successfully deployed and optimized the TiDB service across dedicated, shared, and cloud‑native forms, meeting diverse business needs. Custom optimizations such as query‑plan health checks, space‑reclamation tuning, RU‑QPS management, cross‑region HA, multi‑active data‑center deployment, and unified monitoring have markedly improved resource efficiency, query speed, and operational stability. The service now runs 20 clusters, manages over 135 TiB of data, and continues to evolve with TiDB's ongoing enhancements.

Tags: monitoring, cloud-native, performance optimization, database, Kubernetes, TiDB
Written by 360 Smart Cloud

Official service account of 360 Smart Cloud, dedicated to building a high-quality, secure, highly available, convenient, and stable one‑stop cloud service platform.
