Database Performance Governance and Optimization for a High‑Load MySQL Application
This technical report describes the current high‑resource MySQL deployment, analyzes disk, table‑space, QPS and slow‑SQL issues, and presents a set of governance goals and concrete solutions—including data migration, query interception, and read‑from‑slave strategies—backed by code samples, tables and scripts to reduce load and improve stability before a major sales event.
Background
The core application relies on a MySQL cluster that has been operating at high resource utilization (CPU, disk, QPS) for a long time, and urgent governance is required before a major promotion to ensure stable operation.
2.1 Database
The database consists of one master and two slaves, all running on aging physical machines. One slave has half the memory and a mechanical disk, leading to high failure risk and low I/O performance.
IP
Domain
Master/Slave
CPU
Memory
Capacity
DISK Usage (%)
Memory Usage (%)
Version
1x.x.x.36
xxx_m.mysql.jddb.com
Master
64
256G
16T
66.3%
87.7%
5.5.14
1x.x.x.73
xxx_sb.mysql.jddb.com
Slave
64
256G
16T
66.6%
85.2%
5.5.14
1x.x.x.135
xxx_sa.mysql.jddb.com
Slave
64
128G
128T (≈7T usable)
76.5%
57.2%
5.5.14
2.2 Disk Space
Disk usage as of the end of February shows all three nodes are near capacity, requiring immediate cleanup or data migration.
IP
Master/Slave
Used (GB)
Usage (%)
Free (GB)
Weekly Growth (GB)
Estimated Alert (days)
Estimated Available (days)
Binlog (GB)
Log (GB)
1x.x.x.36
M
5017
69
2151
9
617.1
1735.8
159.45543
6
1x.x.x.73
S
5017
71
2151
14.8
333.2
1012.7
158.52228
1
1x.x.x.135
S
5017
4
129000
14.4
2986
8958
158.13548
0
All databases are already at a high water‑mark; data cleanup or migration is needed to lower disk usage.
2.3 Table Space
Several tables exceed 100 GB and contain billions of rows, especially the status table with 30 billion rows, indicating a classic large‑table problem.
2.4 QPS Situation
The master node handles far more queries than the slaves; only a small portion of traffic currently goes to the slaves, so moving high‑frequency queries to the slaves is essential.
2.5 Slow SQL
Both master and slave occasionally encounter slow queries that increase disk I/O and affect overall system stability.
3 Governance Goals
Data migration to reduce disk usage to a lower water‑mark.
Reduce master QPS by 30 % and keep peak QPS below 18 k during the promotion.
Eliminate slow SQL: 100 % of queries >10 s, 80 % of >5 s, and 60 % of >1 s.
4 Governance Solutions
4.1 Large‑Table Data Migration
Seven tables larger than 100 GB are handled differently based on business relevance: historical tables are deleted, pure history tables are migrated to a big‑data platform and then deleted, and core business tables are migrated and kept in a historical database.
Table
Table Space (GB)
Index Space (GB)
Big Data
Migration Type
Start
End
xxx_status
991.65
265.29
Yes
Delete
2020‑04‑30 01:00:00
2022‑01‑01
xxx_main
611.80
149.91
Yes
Migrate
2021‑09‑30
2022‑01‑01
xxx_exception
382.80
24.65
No
Delete
2018‑05‑16 20:30:04
2022‑01‑01
xxx_product_code
244.18
61.54
Yes
Delete
23 billion
xxx_item
208.66
85.46
Yes
Migrate
2016‑12‑29 13:20:33
2022‑01‑01
xxx_freights_info
128.78
109.03
Yes
Migrate
2018‑11‑29 13:26:00
xxx_extend
127.36
26.07
Yes
Migrate
2019‑03‑29 14:30:00
2022‑01‑01
After migration, about 470 GB (≈10 %) of disk space was reclaimed while keeping one year of data.
4.2 Intercept Parameter‑less Queries
Occasional queries without parameters cause full‑table scans and severe disk I/O. A MyBatis plugin intercepts such queries at the DAO layer and rejects them, eliminating the problem.
mybatis‑config.xml plugin configuration:
ParameterInterceptor source code:
import org.apache.ibatis.executor.Executor;
import org.apache.ibatis.mapping.BoundSql;
import org.apache.ibatis.mapping.MappedStatement;
import org.apache.ibatis.mapping.ParameterMapping;
import org.apache.ibatis.plugin.*;
import org.apache.ibatis.session.ResultHandler;
import org.apache.ibatis.session.RowBounds;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
* MyBatis interceptor to block queries without parameters, preventing full‑table scans
*/
@Intercepts({@Signature(type = Executor.class, method = "query", args = {MappedStatement.class, Object.class, RowBounds.class, ResultHandler.class})})
public class ParameterInterceptor implements Interceptor {
private final Map
mappedStatementIdMap = new HashMap<>();
@Override
public Object intercept(Invocation invocation) throws InvocationTargetException, IllegalAccessException {
Object[] queryArgs = invocation.getArgs();
MappedStatement mappedStatement = null;
if (queryArgs[0] instanceof MappedStatement) {
mappedStatement = (MappedStatement) queryArgs[0];
}
if (mappedStatement != null && mappedStatementIdMap.containsKey(mappedStatement.getId())) {
BoundSql boundSql = mappedStatement.getBoundSql(queryArgs[1]);
List
parameterMappingList = boundSql.getParameterMappings();
if (parameterMappingList.isEmpty()) {
Profiler.businessAlarm(mappedStatement.getId(), "查询参数为空");
throw new BadArgumentException("查询参数为空,请确认入参是否有值");
}
}
return invocation.proceed();
}
@Override
public Object plugin(Object target) {
return Plugin.wrap(target, this);
}
@Override
public void setProperties(Properties properties) {
for (Object key : properties.keySet()) {
mappedStatementIdMap.put(key, 1);
}
}
}4.3 Switch High‑Frequency Queries to Slave
The master QPS peaks at 30 k/s. By identifying the top‑10 API calls and routing them to the slave using JSF filters and UMP monitoring, the master peak was reduced to 17.5 k/s.
JSF filter configuration:
<jsf:filter id="callFilter" ref="jsfInvokeFilter"/>JsfInvokeFilter source code:
import com.jd.jsf.gd.filter.AbstractFilter;
import com.jd.jsf.gd.msg.RequestMessage;
import com.jd.jsf.gd.msg.ResponseMessage;
import com.jd.jsf.gd.util.RpcContext;
import com.jd.ump.profiler.proxy.Profiler;
import org.springframework.stereotype.Component;
import java.util.HashMap;
import java.util.Map;
/**
* JSF filter for counting API invocation times
*/
@Component
public class JsfInvokeFilter extends AbstractFilter {
private static final String API_PROVIDER_METHOD_COUNT_KEY = "api.jsf.provider.method.count.key";
private static final String API_CONSUMER_METHOD_COUNT_KEY = "api.jsf.consumer.method.count.key";
@Override
public ResponseMessage invoke(RequestMessage requestMessage) {
String key = RpcContext.getContext().isProviderSide() ? API_PROVIDER_METHOD_COUNT_KEY : API_CONSUMER_METHOD_COUNT_KEY;
String method = requestMessage.getClassName() + "." + requestMessage.getMethodName();
Map
tags = new HashMap<>(2);
tags.put("bMark", method);
tags.put("bCount", "1");
Profiler.sourceDataByStr(key, tags);
return getNext().invoke(requestMessage);
}
}Python script to fetch JSF call statistics and generate an Excel report:
import os
import openpyxl
import json
import requests
from cookies import Cookie
import time
headers = {
'Cookie': Cookie,
'Content-Type': 'application/json',
'token': '******',
'erp': '******'
}
def get_jsf(start_time, end_time):
url = 'http://xxx.taishan.jd.com/api/xxx/xxx/xxx/'
body = {}
params = {
'startTime': start_time,
'endTime': end_time,
'endPointKey': 'api.jsf.provider.method.count.key',
'quickTime': int((end_time - start_time) / 1000),
'markFlag': 'true',
'markLimit': 500
}
res = requests.post(url=url, data=json.dumps(body), params=params, headers=headers)
res_json = json.loads(res.text)
# processing logic omitted for brevity
# ... generate Excel file ...
if __name__ == '__main__':
start_time = '2024-03-06 12:20:00'
end_time = '2024-03-07 12:20:00'
get_jsf(change_time(start_time), change_time(end_time))After applying the read‑from‑slave strategy, the master QPS peak dropped from 25 k to 17.5 k, achieving the target reduction.
4.4 Slow‑SQL Governance
Targeted actions eliminated all queries longer than 10 s, reduced >5 s queries by 80 %, and cut >1 s queries by 60 % while ensuring thorough testing before full rollout.
Conclusion
The report outlines a low‑risk, stability‑first governance plan that combines data migration, read‑from‑slave routing, query interception, and slow‑SQL remediation to bring the MySQL cluster back to a healthy operating level before the upcoming promotion.
JD Tech
Official JD technology sharing platform. All the cutting‑edge JD tech, innovative insights, and open‑source solutions you’re looking for, all in one place.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.