
Database Performance Governance and Optimization for a High‑Load MySQL Application

This technical report describes the current high‑resource MySQL deployment, analyzes disk, table‑space, QPS and slow‑SQL issues, and presents a set of governance goals and concrete solutions—including data migration, query interception, and read‑from‑slave strategies—backed by code samples, tables and scripts to reduce load and improve stability before a major sales event.

JD Tech

1 Background

The core application relies on a MySQL cluster that has been operating at high resource utilization (CPU, disk, QPS) for a long time, and urgent governance is required before a major promotion to ensure stable operation.

2 Current Status

2.1 Database

The database consists of one master and two slaves, all running on aging physical machines. One slave has half the memory and a mechanical disk, leading to high failure risk and low I/O performance.

| IP | Domain | Master/Slave | CPU | Memory | Capacity | Disk Usage (%) | Memory Usage (%) | Version |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1x.x.x.36 | xxx_m.mysql.jddb.com | Master | 64 | 256G | 16T | 66.3 | 87.7 | 5.5.14 |
| 1x.x.x.73 | xxx_sb.mysql.jddb.com | Slave | 64 | 256G | 16T | 66.6 | 85.2 | 5.5.14 |
| 1x.x.x.135 | xxx_sa.mysql.jddb.com | Slave | 64 | 128G | 128T (≈7T usable) | 76.5 | 57.2 | 5.5.14 |

2.2 Disk Space

Disk usage as of the end of February shows all three nodes are near capacity, requiring immediate cleanup or data migration.

| IP | Master/Slave | Used (GB) | Usage (%) | Free (GB) | Weekly Growth (GB) | Estimated Days to Alert | Estimated Days Available | Binlog (GB) | Log (GB) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1x.x.x.36 | M | 5017 | 69 | 2151 | 9 | 617.1 | 1735.8 | 159.45543 | 6 |
| 1x.x.x.73 | S | 5017 | 71 | 2151 | 14.8 | 333.2 | 1012.7 | 158.52228 | 1 |
| 1x.x.x.135 | S | 5017 | 4 | 129000 | 14.4 | 2986 | 8958 | 158.13548 | 0 |

All databases are already at a high water‑mark; data cleanup or migration is needed to lower disk usage.

2.3 Table Space

Several tables exceed 100 GB and contain billions of rows, especially the status table with 30 billion rows, indicating a classic large‑table problem.

2.4 QPS Situation

The master node handles far more queries than the slaves; only a small portion of traffic currently goes to the slaves, so moving high‑frequency queries to the slaves is essential.

2.5 Slow SQL

Both master and slave occasionally encounter slow queries that increase disk I/O and affect overall system stability.
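These slow statements are surfaced through MySQL's slow query log. Enabling it on MySQL 5.5 looks roughly like the following my.cnf fragment (the file path and 1-second threshold are illustrative, not the cluster's actual settings):

```ini
[mysqld]
# Log statements that run longer than long_query_time seconds
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time     = 1
```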

3 Governance Goals

1. Migrate historical data to bring disk usage down to a safer water-mark.

2. Reduce master QPS by 30% and keep peak QPS below 18,000 during the promotion.

3. Eliminate slow SQL: 100% of queries slower than 10 s, 80% of those slower than 5 s, and 60% of those slower than 1 s.

4 Governance Solutions

4.1 Large‑Table Data Migration

Seven tables larger than 100 GB are handled according to business relevance: tables whose history is no longer needed are deleted directly; pure-history tables are first archived to the big-data platform and then deleted from MySQL; core business tables are migrated to a dedicated historical database and retained there.

| Table | Table Space (GB) | Index Space (GB) | Big Data | Migration Type | Start | End |
| --- | --- | --- | --- | --- | --- | --- |
| xxx_status | 991.65 | 265.29 | Yes | Delete | 2020‑04‑30 01:00:00 | 2022‑01‑01 |
| xxx_main | 611.80 | 149.91 | Yes | Migrate | 2021‑09‑30 | 2022‑01‑01 |
| xxx_exception | 382.80 | 24.65 | No | Delete | 2018‑05‑16 20:30:04 | 2022‑01‑01 |
| xxx_product_code | 244.18 | 61.54 | Yes | Delete | 23 billion | |
| xxx_item | 208.66 | 85.46 | Yes | Migrate | 2016‑12‑29 13:20:33 | 2022‑01‑01 |
| xxx_freights_info | 128.78 | 109.03 | Yes | Migrate | 2018‑11‑29 13:26:00 | |
| xxx_extend | 127.36 | 26.07 | Yes | Migrate | 2019‑03‑29 14:30:00 | 2022‑01‑01 |

After migration, about 470 GB (≈10 %) of disk space was reclaimed while keeping one year of data.
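On tables of this size, deletions are commonly run in small batches so that each transaction stays short and replication lag stays bounded. A sketch of that pattern (table, column names, and cutoff date are illustrative, not the exact statements used here):

```sql
-- Illustrative batched cleanup; names and dates are placeholders.
-- Each pass removes at most 5000 old rows, keeping transactions short.
DELETE FROM xxx_status
 WHERE create_time < '2021-01-01'
 ORDER BY id
 LIMIT 5000;
-- Repeat (with a short sleep between batches) until ROW_COUNT() returns 0.
```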

4.2 Intercept Parameter‑less Queries

Occasional queries without parameters cause full‑table scans and severe disk I/O. A MyBatis plugin intercepts such queries at the DAO layer and rejects them, eliminating the problem.

mybatis‑config.xml plugin configuration:
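The configuration itself was not preserved in this copy. A minimal sketch of what it would contain (the package name and mapped statement ID are illustrative placeholders): the interceptor is registered under `<plugins>`, and each property key names a mapped statement ID that must carry parameters.

```xml
<plugins>
    <!-- Package and statement IDs below are illustrative placeholders -->
    <plugin interceptor="com.example.interceptor.ParameterInterceptor">
        <!-- Each property key is a mapped statement ID to guard -->
        <property name="com.example.dao.OrderDao.selectByCondition" value="1"/>
    </plugin>
</plugins>
```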

ParameterInterceptor source code:

import com.jd.ump.profiler.proxy.Profiler;
import org.apache.ibatis.executor.Executor;
import org.apache.ibatis.mapping.BoundSql;
import org.apache.ibatis.mapping.MappedStatement;
import org.apache.ibatis.mapping.ParameterMapping;
import org.apache.ibatis.plugin.*;
import org.apache.ibatis.session.ResultHandler;
import org.apache.ibatis.session.RowBounds;

import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

/**
 * MyBatis interceptor that blocks queries without parameters, preventing full-table scans.
 */
@Intercepts({@Signature(type = Executor.class, method = "query",
        args = {MappedStatement.class, Object.class, RowBounds.class, ResultHandler.class})})
public class ParameterInterceptor implements Interceptor {

    /** Mapped statement IDs to guard, populated from the plugin properties. */
    private final Map<String, Integer> mappedStatementIdMap = new HashMap<>();

    @Override
    public Object intercept(Invocation invocation) throws InvocationTargetException, IllegalAccessException {
        Object[] queryArgs = invocation.getArgs();
        MappedStatement mappedStatement = null;
        if (queryArgs[0] instanceof MappedStatement) {
            mappedStatement = (MappedStatement) queryArgs[0];
        }
        if (mappedStatement != null && mappedStatementIdMap.containsKey(mappedStatement.getId())) {
            BoundSql boundSql = mappedStatement.getBoundSql(queryArgs[1]);
            List<ParameterMapping> parameterMappingList = boundSql.getParameterMappings();
            if (parameterMappingList.isEmpty()) {
                // Raise a monitoring alarm, then reject the query before it reaches the database.
                Profiler.businessAlarm(mappedStatement.getId(), "Query parameters are empty");
                throw new BadArgumentException("Query parameters are empty; please verify the input arguments");
            }
        }
        return invocation.proceed();
    }

    @Override
    public Object plugin(Object target) {
        return Plugin.wrap(target, this);
    }

    @Override
    public void setProperties(Properties properties) {
        // Each property key is a mapped statement ID that must be queried with parameters.
        for (String key : properties.stringPropertyNames()) {
            mappedStatementIdMap.put(key, 1);
        }
    }
}

4.3 Switch High‑Frequency Queries to Slave

The master QPS peaks at 30 k/s. By identifying the top‑10 API calls and routing them to the slave using JSF filters and UMP monitoring, the master peak was reduced to 17.5 k/s.

JSF filter configuration:

<jsf:filter id="callFilter" ref="jsfInvokeFilter"/>

JsfInvokeFilter source code:

import com.jd.jsf.gd.filter.AbstractFilter;
import com.jd.jsf.gd.msg.RequestMessage;
import com.jd.jsf.gd.msg.ResponseMessage;
import com.jd.jsf.gd.util.RpcContext;
import com.jd.ump.profiler.proxy.Profiler;
import org.springframework.stereotype.Component;

import java.util.HashMap;
import java.util.Map;

/**
 * JSF filter that counts API invocations per method via UMP.
 */
@Component
public class JsfInvokeFilter extends AbstractFilter {

    private static final String API_PROVIDER_METHOD_COUNT_KEY = "api.jsf.provider.method.count.key";
    private static final String API_CONSUMER_METHOD_COUNT_KEY = "api.jsf.consumer.method.count.key";

    @Override
    public ResponseMessage invoke(RequestMessage requestMessage) {
        // Pick the provider- or consumer-side UMP key depending on which side this node is.
        String key = RpcContext.getContext().isProviderSide() ? API_PROVIDER_METHOD_COUNT_KEY : API_CONSUMER_METHOD_COUNT_KEY;
        String method = requestMessage.getClassName() + "." + requestMessage.getMethodName();
        Map<String, String> tags = new HashMap<>(2);
        tags.put("bMark", method);  // which method was called
        tags.put("bCount", "1");    // one invocation
        Profiler.sourceDataByStr(key, tags);
        return getNext().invoke(requestMessage);
    }
}

Python script to fetch JSF call statistics and generate an Excel report:

import json
import time

import openpyxl
import requests

from cookies import Cookie

headers = {
    'Cookie': Cookie,
    'Content-Type': 'application/json',
    'token': '******',
    'erp': '******'
}


def change_time(time_str):
    """Convert 'YYYY-MM-DD HH:MM:SS' to a millisecond timestamp."""
    return int(time.mktime(time.strptime(time_str, '%Y-%m-%d %H:%M:%S')) * 1000)


def get_jsf(start_time, end_time):
    url = 'http://xxx.taishan.jd.com/api/xxx/xxx/xxx/'
    body = {}
    params = {
        'startTime': start_time,
        'endTime': end_time,
        'endPointKey': 'api.jsf.provider.method.count.key',
        'quickTime': int((end_time - start_time) / 1000),
        'markFlag': 'true',
        'markLimit': 500
    }
    res = requests.post(url=url, data=json.dumps(body), params=params, headers=headers)
    res_json = json.loads(res.text)
    # processing logic omitted for brevity
    # ... generate Excel file with openpyxl ...


if __name__ == '__main__':
    start_time = '2024-03-06 12:20:00'
    end_time = '2024-03-07 12:20:00'
    get_jsf(change_time(start_time), change_time(end_time))

After applying the read‑from‑slave strategy, the master QPS peak dropped from 25 k to 17.5 k, achieving the target reduction.
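The read-from-slave switch can be sketched as a thread-local datasource router, in the spirit of Spring's AbstractRoutingDataSource (the class and key names here are invented for illustration, not JD's actual code): high-frequency read paths mark the current thread as slave-eligible, and the datasource layer picks the connection pool from that key.

```java
// Minimal read/write routing sketch; names are illustrative, not JD's code.
class ReadWriteRouter {

    enum DataSourceKey { MASTER, SLAVE }

    // Default every thread to the master so writes are never misrouted.
    private static final ThreadLocal<DataSourceKey> CONTEXT =
            ThreadLocal.withInitial(() -> DataSourceKey.MASTER);

    /** Route subsequent queries on this thread to a slave (read-only paths). */
    static void useSlave() { CONTEXT.set(DataSourceKey.SLAVE); }

    /** Restore master routing, e.g. in a finally block after the read. */
    static void reset() { CONTEXT.remove(); }

    /** Called by the datasource layer to choose the connection pool. */
    static DataSourceKey currentKey() { return CONTEXT.get(); }
}
```

A whitelisted high-frequency query would call useSlave() before its DAO call and reset() in a finally block, so only explicitly chosen reads leave the master.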

4.4 Slow‑SQL Governance

Targeted actions eliminated all queries longer than 10 s, reduced >5 s queries by 80 %, and cut >1 s queries by 60 % while ensuring thorough testing before full rollout.
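A typical remediation followed the usual pattern: confirm the access path with EXPLAIN, then add or adjust an index. A sketch with hypothetical table and column names (the actual fixes varied per query):

```sql
-- Hypothetical example: verify the plan, then index the filter column.
EXPLAIN SELECT id, status FROM xxx_main WHERE waybill_code = 'JD0001';
-- If the plan shows a full scan (type = ALL), add a secondary index:
ALTER TABLE xxx_main ADD INDEX idx_waybill_code (waybill_code);
```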

5 Conclusion

The report outlines a low‑risk, stability‑first governance plan that combines data migration, read‑from‑slave routing, query interception, and slow‑SQL remediation to bring the MySQL cluster back to a healthy operating level before the upcoming promotion.

Tags: data migration, Big Data, performance tuning, MySQL, database optimization, slow query, QPS reduction