Databases 18 min read

Database Performance Optimization and Governance for a High‑Load Application

This technical report analyzes a high‑traffic MySQL deployment with three servers, detailing disk usage, table space, QPS and slow‑SQL issues, and presents a comprehensive governance plan that includes data migration, read‑traffic offloading to replicas, MyBatis interceptor implementation, JSF monitoring, and Python automation to reduce disk pressure and improve query performance.

JD Tech Talk
JD Tech Talk
JD Tech Talk
Database Performance Optimization and Governance for a High‑Load Application

Background

The core application relies on a MySQL cluster (one master and two slaves) that has been operating at high resource utilization for years, with CPU, disk, and QPS all near capacity, requiring urgent pre‑promotion optimization.

Database Overview

The cluster consists of three physical machines (master 1x.x.x.36, slaves 1x.x.x.73 and 1x.x.x.135) each with 64 CPU cores, 256 GB RAM, and 16 TB of RAID‑0 storage, running MySQL 5.5.14.

Disk Space

By the end of February, disk usage on all nodes exceeded 65 % with limited remaining space, prompting a need to free up storage through data cleanup and archiving.

Table Space

Several tables exceed 100 GB and contain billions of rows, notably the xxx_status table with 30 billion rows.

QPS Situation

The master node handles the majority of queries, reaching peak QPS of 30 k/s, while the slaves are under‑utilized.

Slow SQL

Occasional slow queries (>1 s) on both master and slaves cause disk I/O spikes and affect overall system stability.

Governance Goals

Archive or delete table space >100 GB to reduce disk usage by ~10 % (≈470 GB).

Shift high‑frequency read traffic from master to slaves to lower master QPS by ~30 %.

Eliminate slow SQL: fully remove >10 s queries, reduce >5 s queries by 80 %, and >1 s queries by 60 %.

Governance Plan

1. Large Table Data Migration

Different tables require different handling: historical tables are deleted, pure history tables are archived to a big‑data platform before deletion, and core business tables are migrated to a historical database after archiving.

import org.apache.ibatis.executor.Executor;
import org.apache.ibatis.mapping.BoundSql;
import org.apache.ibatis.mapping.MappedStatement;
import org.apache.ibatis.plugin.*;

@Intercepts({@Signature(type = Executor.class, method = "query", args = {MappedStatement.class, Object.class, RowBounds.class, ResultHandler.class})})
public class ParameterInterceptor implements Interceptor {
    private final Map
mappedStatementIdMap = new HashMap<>();
    @Override
    public Object intercept(Invocation invocation) throws InvocationTargetException, IllegalAccessException {
        Object[] queryArgs = invocation.getArgs();
        MappedStatement mappedStatement = null;
        if (queryArgs[0] instanceof MappedStatement) {
            mappedStatement = (MappedStatement) queryArgs[0];
        }
        if (mappedStatement != null && mappedStatementIdMap.containsKey(mappedStatement.getId())) {
            BoundSql boundSql = mappedStatement.getBoundSql(queryArgs[1]);
            List
parameterMappingList = boundSql.getParameterMappings();
            if (parameterMappingList.isEmpty()) {
                Profiler.businessAlarm(mappedStatement.getId(), "查询参数为空");
                throw new BadArgumentException("查询参数为空,请确认入参是否有值");
            }
        }
        return invocation.proceed();
    }
    @Override
    public Object plugin(Object target) { return Plugin.wrap(target, this); }
    @Override
    public void setProperties(Properties properties) {
        for (Object key : properties.keySet()) {
            mappedStatementIdMap.put(key, 1);
        }
    }
}

2. Intercept Empty‑Parameter Queries

A MyBatis plugin rejects any query without parameters, preventing full‑table scans.

3. Read‑Traffic Offloading to Slaves

JSF filters record API call counts; a Python script retrieves the top‑10 most‑called methods and guides the addition of annotations to route those reads to slave replicas.

def get_jsf(start_time, end_time):
    url = 'http://xxx.taishan.jd.com/api/xxx/xxx/xxx/'
    params = {
        'startTime': start_time,
        'endTime': end_time,
        'endPointKey': 'api.jsf.provider.method.count.key',
        'quickTime': int((end_time - start_time) / 1000),
        'markFlag': 'true',
        'markLimit': 500
    }
    res = requests.post(url=url, data=json.dumps({}), params=params, headers=headers)
    # Process response and generate Excel report
    ...

4. Monitoring and Reporting

JSF filter implementation logs each API invocation to a monitoring system, enabling statistical analysis of call frequencies.

@Component
public class JsfInvokeFilter extends AbstractFilter {
    private static final String API_PROVIDER_METHOD_COUNT_KEY = "api.jsf.provider.method.count.key";
    private static final String API_CONSUMER_METHOD_COUNT_KEY = "api.jsf.consumer.method.count.key";
    @Override
    public ResponseMessage invoke(RequestMessage requestMessage) {
        String key = RpcContext.getContext().isProviderSide() ? API_PROVIDER_METHOD_COUNT_KEY : API_CONSUMER_METHOD_COUNT_KEY;
        String method = requestMessage.getClassName() + "." + requestMessage.getMethodName();
        Map
tags = new HashMap<>(2);
        tags.put("bMark", method);
        tags.put("bCount", "1");
        Profiler.sourceDataByStr(key, tags);
        return getNext().invoke(requestMessage);
    }
}

Results

After implementing the plan, disk usage decreased by ~470 GB (10 %), master QPS peak dropped from 25 k to 17.5 k (≈30 % reduction), and slow‑SQL incidents were largely eliminated according to the defined thresholds.

Conclusion

The adopted governance strategy focused on safe, incremental improvements—data archiving, read‑traffic redistribution, and query validation—while avoiding high‑risk actions such as immediate sharding or cloud migration. Future scaling may involve cloud migration combined with sharding when traffic grows beyond current capacity.

data migrationmonitoringPerformanceMySQLDatabase Optimizationslow queryread replication
JD Tech Talk
Written by

JD Tech Talk

Official JD Tech public account delivering best practices and technology innovation.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.