Databases 20 min read

Cutting MySQL Disk Usage by 10% and Reducing QPS by 30%: A Real‑World Case Study

The article describes a high‑traffic core application with a MySQL master‑slave setup facing high resource usage; it details analysis of hardware, disk, table space, QPS, slow SQL, and presents governance actions including data archiving, moving reads to replicas, a MyBatis interceptor, and monitoring scripts, achieving significant performance improvements.

JD Cloud Developers

Jul 30, 2024

Cutting MySQL Disk Usage by 10% and Reducing QPS by 30%: A Real‑World Case Study

Background

The core application of a department relies on a MySQL master‑slave cluster that has been operating on aging physical machines for years. CPU, memory, and disk resources are all near their limits, and the system cannot add another replica due to hardware constraints. Before a major sales promotion, the team needed to lower these metrics to ensure stable operation.

Basic Situation

The same template‑caching logic of MCube is mentioned but not relevant to the database analysis.

2.1 Database

The cluster consists of one master and two slaves, all on retail‑grade physical servers. One slave has half the memory and uses mechanical disks, making it prone to failures and slow queries.

Domain          | Role | CPU | Memory | Capacity | Disk Usage % | Memory Usage % | MySQL Version
xxx_m.mysql.jddb.com | Master | 64 | 256G | 16T | 66.3 | 87.7 | 5.5.14
xxx_sb.mysql.jddb.com | Slave  | 64 | 256G | 16T | 66.6 | 85.2 | 5.5.14
xxx_sa.mysql.jddb.com | Slave  | 64 | 128G | 16T | 76.5 | 57.2 | 5.5.14

2.2 Disk Space

By the end of February, disk usage on each instance was already high.

IP          | Role | Used (GB) | Used % | Free (GB) | Weekly Growth (GB) | Alert Days | Available Days | Binlog (GB) | Log (GB)
1x.x.x.36  | M    | 5017      | 69     | 2151     | 9                  | 617.1      | 159.45543      | 6          |
1x.x.x.73  | S    | 5017      | 71     | 2151     | 14.8               | 333.2      | 1012.7         | 1          |
1x.x.x.135 | S    | 5017      | 4      | 129000   | 14.4               | 2986       | 8958           | 0          |

The tables show that all databases are operating at a high water level, prompting urgent data cleanup.

2.3 Table Space

Most tables contain tens of millions of rows; the top seven tables each occupy over 100 GB, with the status table holding 3 billion rows.

2.4 QPS Situation

The master’s QPS (shown in yellow) far exceeds that of the slaves. Only a small portion of queries currently hit the slaves, so the team decided to move high‑traffic reads to the replica.

2.5 Slow SQL

Both master and slave occasionally encounter slow queries, causing disk contention and affecting system stability.

Governance Goals

Data archiving to lower disk usage to a lower water level.

Shift high‑frequency queries from master to replica to reduce master QPS by about 30 % (target morning peak 15 k, afternoon peak 18 k).

Slow‑SQL remediation: eliminate queries >10 s completely, >5 s by 80 %, >1 s by 60 %.

4.1 Large Table Data Archiving

The seven large tables have different business characteristics, so the archiving strategy varies:

Historical data without business value (e.g., xxx_exception ) is deleted directly.

Pure historical logs (e.g., xxx_status ) are exported to a big‑data platform and then deleted.

Core business data (e.g., xxx_main , xxx_item , xxx_extend ) is synchronized to the big‑data platform and moved to a historical database, retaining one year of data.

Table Name   | Table Space (GB) | Index Space (GB) | In Big Data | Archiving Type | Start Value               | End Value
xxx_status   | 991.65            | 265.29            | Yes         | Delete         | 2020-04-30 01:00:00       | 2022-01-01
xxx_main     | 611.80            | 149.91            | Yes         | Transfer       | 2021-09-30                | 2022-01-01
xxx_exception| 382.80            | 24.65             | No          | Delete         | 2018-05-16 20:30:04       | 2022-01-01
... (other tables omitted for brevity)

After a month of archiving, 470 GB (≈10 %) of disk space was reclaimed while keeping a 365‑day data retention window.

4.2 Intercepting No‑Parameter Queries

A MyBatis plugin intercepts queries with empty parameter lists to prevent full‑table scans. When such a query is detected, an alarm is raised and a BadArgumentException is thrown.

import org.apache.ibatis.executor.Executor;
import org.apache.ibatis.mapping.BoundSql;
import org.apache.ibatis.mapping.MappedStatement;
import org.apache.ibatis.mapping.ParameterMapping;
import org.apache.ibatis.plugin.*;
import org.apache.ibatis.session.ResultHandler;
import org.apache.ibatis.session.RowBounds;
import java.lang.reflect.InvocationTargetException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
/**
 * MyBatis interceptor that blocks queries without parameters to avoid full‑table scans.
 */
@Intercepts({
    @Signature(type = Executor.class, method = "query",
        args = {MappedStatement.class, Object.class, RowBounds.class, ResultHandler.class})})
public class ParameterInterceptor implements Interceptor {
    private final Map<Object, Object> mappedStatementIdMap = new HashMap<>();
    @Override
    public Object intercept(Invocation invocation) throws InvocationTargetException, IllegalAccessException {
        Object[] queryArgs = invocation.getArgs();
        MappedStatement mappedStatement = null;
        if (queryArgs[0] instanceof MappedStatement) {
            mappedStatement = (MappedStatement) queryArgs[0];
        }
        if (mappedStatement != null && mappedStatementIdMap.containsKey(mappedStatement.getId())) {
            BoundSql boundSql = mappedStatement.getBoundSql(queryArgs[1]);
            List<ParameterMapping> parameterMappingList = boundSql.getParameterMappings();
            if (parameterMappingList.isEmpty()) {
                Profiler.businessAlarm(mappedStatement.getId(), "查询参数为空");
                throw new BadArgumentException("查询参数为空，请确认入参是否有值");
            }
        }
        return invocation.proceed();
    }
    @Override
    public Object plugin(Object target) {
        return Plugin.wrap(target, this);
    }
    @Override
    public void setProperties(Properties properties) {
        for (Object key : properties.keySet()) {
            mappedStatementIdMap.put(key, 1);
        }
    }
}

4.3 Routing Queries to Replica

The team uses a JSF filter combined with UMP monitoring to count API invocations. The top‑10 high‑traffic interfaces are identified and annotated to force read‑only queries onto the slave.

<jsf:filter id="callFilter" ref="jsfInvokeFilter"/>

import com.jd.jsf.gd.filter.AbstractFilter;
import com.jd.jsf.gd.msg.RequestMessage;
import com.jd.jsf.gd.msg.ResponseMessage;
import com.jd.jsf.gd.util.RpcContext;
import com.jd.ump.profiler.proxy.Profiler;
import org.springframework.stereotype.Component;
import java.util.HashMap;
import java.util.Map;
/**
 * JSF filter for counting service method calls.
 */
@Component
public class JsfInvokeFilter extends AbstractFilter {
    private static final String API_PROVIDER_METHOD_COUNT_KEY = "api.jsf.provider.method.count.key";
    private static final String API_CONSUMER_METHOD_COUNT_KEY = "api.jsf.consumer.method.count.key";
    @Override
    public ResponseMessage invoke(RequestMessage requestMessage) {
        String key;
        if (RpcContext.getContext().isProviderSide()) {
            key = API_PROVIDER_METHOD_COUNT_KEY;
        } else {
            key = API_CONSUMER_METHOD_COUNT_KEY;
        }
        String method = requestMessage.getClassName() + "." + requestMessage.getMethodName();
        Map<String, String> tags = new HashMap<>(2);
        tags.put("bMark", method);
        tags.put("bCount", "1");
        Profiler.sourceDataByStr(key, tags);
        return getNext().invoke(requestMessage);
    }
}

A Python script pulls the monitoring data, aggregates call counts, and generates an Excel report for further analysis.

import os, openpyxl, json, requests, time
# (script omitted for brevity)

4.4 Slow SQL Governance

Targeted fixes are applied to slow queries. The team uses a gray‑release strategy, validates the changes, and then rolls them out fully. Queries longer than 10 s are eliminated completely; those longer than 5 s are reduced by 80 %; and those longer than 1 s are reduced by 60 %.

Conclusion

After the optimizations, disk usage decreased by about 10 % (≈470 GB) and master QPS peak dropped from 25 k to 17.5 k, improving system stability during peak traffic periods.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Database Governance slow-sql Read Replicas

Written by

JD Cloud Developers

JD Cloud Developers (Developer of JD Technology) is a JD Technology Group platform offering technical sharing and communication for AI, cloud computing, IoT and related developers. It publishes JD product technical information, industry content, and tech event news. Embrace technology and partner with developers to envision the future.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.