
Design and Implementation of a MySQL Slow‑Query Risk Scoring Model

This article presents a comprehensive approach to quantifying MySQL slow‑query risk by defining scoring items, establishing boundary values, applying various scoring functions, and integrating business‑level weighting, ultimately enabling proactive identification and remediation of high‑impact slow queries.

Qunar Tech Salon

Author Introduction Qian Fangyuan is a database engineer specializing in MySQL and Redis operations and in Go-based database automation.

Background MySQL slow queries, recorded whenever a statement's execution time exceeds a configured threshold, are a frequent source of performance issues. Traditional slow-query platforms merely collect and display the data, producing an overwhelming "sea" of slow queries that DBAs and developers cannot prioritize; resolution cycles grow long and service outages become a real risk.

Analysis The core problem is the lack of a clear priority among numerous slow queries, making proactive resolution impossible. A scoring mechanism is needed to rank queries by business impact.

Solution Idea Create a scoring model where each slow query receives a score (out of 100); higher scores indicate higher risk. The model aggregates weighted scores of selected metrics.

Scoring Items Five metrics are chosen: QueryCount, Query_time, Lock_time, Bytes_sent, Rows_examined. These reflect frequency, execution time, lock wait, data volume, and rows scanned, respectively.
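As a sketch of how these five items could be carried through the scoring pipeline (the struct and its field names are illustrative, not from the article), in Go:

```go
package main

import "fmt"

// SlowQueryMetrics holds the five raw measurements used for scoring.
// Field names are illustrative; the article names the items
// QueryCount, Query_time, Lock_time, Bytes_sent, and Rows_examined.
type SlowQueryMetrics struct {
	QueryCount   float64 // executions within the counting window (frequency)
	QueryTime    float64 // execution time in seconds
	LockTime     float64 // lock wait time in seconds
	BytesSent    float64 // bytes returned to the client (data volume)
	RowsExamined float64 // rows scanned by the server
}

func main() {
	// Values taken from the sample slow-query log later in the article.
	m := SlowQueryMetrics{
		QueryCount:   1,
		QueryTime:    3.278988,
		LockTime:     0.001516,
		BytesSent:    35600,
		RowsExamined: 1341,
	}
	fmt.Printf("%+v\n", m)
}
```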

Scoring Model Definition The overall risk index is calculated as the sum of each item multiplied by its weight, with the total capped at 100.

score = func(x)                                  (per-item score; x is the item's value)
risk index = min(100, sum(score_i * weight_i))

Boundary Selection For each metric, the 95th percentile of historical data defines the upper bound (top 5% outliers are excluded). The chosen window for QueryCount is a 10‑minute sliding window to capture peak activity.
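As a sketch of how such a boundary could be derived (the article does not show its implementation; the `percentile` helper and the nearest-rank method here are assumptions):

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile of vals using the nearest-rank
// method. Boundary selection uses p = 95 so that the top 5% of
// historical outliers do not inflate the upper bound.
func percentile(vals []float64, p float64) float64 {
	s := append([]float64(nil), vals...) // copy so the caller's slice is untouched
	sort.Float64s(s)
	rank := int(math.Ceil(p/100*float64(len(s)))) - 1
	if rank < 0 {
		rank = 0
	}
	return s[rank]
}

func main() {
	// Historical Query_time samples (seconds); 1..100 for a readable example.
	hist := make([]float64, 100)
	for i := range hist {
		hist[i] = float64(i + 1)
	}
	fmt.Println(percentile(hist, 95)) // → 95
}
```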

Scoring Functions Four scoring functions are supported: sine-like ("likeSin", a polynomial approximation), sine, exponential, and linear (the default). The Go implementation below computes a single item's score for the selected function.

/**
 * @Description: Computes a single item's score, bounded between the minimum
 *               and maximum scores. Available scoring models: sine-like,
 *               sine, exponential, and linear (direct proportion).
 * @Param val: current value of the item
 * @Param minVal: lower boundary for the item
 * @Param maxVal: upper boundary for the item
 * @Param minScore: minimum score for the item
 * @Param maxScore: maximum score for the item
 * @Param calWay: which scoring model to apply
 * @Return float64: the item's score
 */
func calSingleScore(val, minVal, maxVal, minScore, maxScore float64, calWay string) float64 {
   if maxVal == 0 { // no upper boundary configured: return 0 and avoid dividing by zero below
      return 0
   }
   if val >= maxVal { // at or above the upper boundary: clamp to the maximum score
      return maxScore
   }
   if val <= minVal { // at or below the lower boundary: clamp to the minimum score
      return minScore
   }
   var scoreRatio float64
   switch calWay {
   case "likeSin": // sine-like curve: fifth-degree polynomial approximation of a sine
      b := 0.0547372760360247
      c := -0.0231045458864445
      d := 0.00455283203705563
      e := -0.000281663561505204
      f := 5.57101673606083e-06
      ratio := (val - minVal) / (maxVal - minVal) * 20
      scoreRatio = b*ratio + c*(ratio*ratio) + d*(ratio*ratio*ratio) +
         e*(ratio*ratio*ratio*ratio) + f*(ratio*ratio*ratio*ratio*ratio)
   case "sin": // sine curve
      ratio := (val - minVal) / (maxVal - minVal)
      scoreRatio = math.Sin(math.Pi / 2 * ratio)
   case "exponent": // exponential curve
      ratio := (val - minVal) / (maxVal - minVal)
      a := math.Log2(maxScore - minScore)
      // this branch produces a score directly (2^(a*ratio) already spans
      // up to maxScore-minScore), so it returns without the scaling below
      scoreRatio = math.Pow(2, a*ratio)
      return scoreRatio
   default: // linear (direct proportion)
      scoreRatio = (val - minVal) / (maxVal - minVal)
   }
   return scoreRatio * (maxScore - minScore)
}

Sample Slow‑Query Log

# Time: 210818  9:54:25
# User@Host: fangyuan.qian[fangyuan.qian] @  [127.0.0.1]  Id: 316538768
# Schema:   Last_errno: 0  Killed: 0
# Query_time: 3.278988  Lock_time: 0.001516  Rows_sent: 284  Rows_examined: 1341  Rows_affected: 0
# Bytes_sent: 35600
SET timestamp=1629251665;
SELECT
               a.ts_min                                      AS slowlog_time,
               a.checksum,
               SUM(a.ts_cnt)                                 AS d_ts_cnt,
               ROUND(SUM(a.Query_time_sum), 2)               AS d_query_time,
               ROUND(SUM(a.Query_time_sum) / SUM(a.ts_cnt), 2) AS d_query_time_avg,
               a.host_max                                    AS host_ip,
               a.db_max                                      AS db_name,
               a.user_max                                    AS user_name,
               b.first_seen                                  AS first_seen_time
           FROM mysql_slowlog_192_168_0_84_3306.query_history a force index(idx_ts_min),
                mysql_slowlog_192_168_0_84_3306.query_review b
           WHERE a.checksum = b.checksum
               AND length(a.checksum)>=15
               AND ts_min >= '2021-06-04'
               AND ts_min < '2021-06-21'
           GROUP BY a.checksum;

Boundary Values The 95th‑percentile thresholds observed are approximately: Query_time ≈ 60 s, Lock_time ≈ 0.00629 s, Rows_examined ≈ 17.85 M rows, QueryCount ≈ 180 occurrences. Bytes_sent is used instead of Rows_sent for better volume representation.

Scoring Model Formula Each item is normalized to a 0‑100 scale, multiplied by its weight, and summed. The total risk index is limited to 100.

slow-query risk index = sum(func(scoring item) * weight)
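The aggregation step can be sketched in Go as follows; the weight values in the example are illustrative, since the article does not publish its tuned weights:

```go
package main

import "fmt"

// riskIndex implements the model formula: each item's score (0-100) is
// multiplied by its weight and summed, with the total capped at 100.
func riskIndex(scores, weights []float64) float64 {
	var total float64
	for i, s := range scores {
		total += s * weights[i]
	}
	if total > 100 {
		total = 100
	}
	return total
}

func main() {
	// Per-item scores for QueryCount, Query_time, Lock_time, Bytes_sent,
	// Rows_examined (illustrative values).
	scores := []float64{80, 60, 10, 30, 90}
	// Illustrative weights summing to 1, not the article's tuned values.
	weights := []float64{0.25, 0.25, 0.125, 0.125, 0.25}
	fmt.Println(riskIndex(scores, weights)) // → 62.5
}
```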

Testing – Phase 1 Weight distribution and result distributions were visualized (images omitted). Analysis of sample SQL showed that high Rows_examined counts have a disproportionate impact, prompting weight adjustments.

Testing – Phase 2 After re‑weighting and model tweaks, the revised model better reflects the influence of high‑row‑scan queries.

Conclusion The revised weighting scheme effectively surfaces slow queries with a risk index above 50, allowing teams to prioritize remediation.

Future Work Incorporate additional metrics from Percona Server/MariaDB, refine weighting, and extend the model to accommodate business‑level risk weighting, enabling a unified risk score across services.

Business Integration Introduce a "business level weight" per application to adjust the final risk index, and push top‑N risky queries to owners. Define a global "slow‑query risk safety line" to flag queries requiring immediate attention.

final slow-query risk index = slow-query risk index * business-level weight
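A sketch of applying the business-level weight and the safety line; the weight values and the choice of 50 as the threshold follow the conclusion above, but the helper names are assumptions:

```go
package main

import "fmt"

// safetyLine mirrors the article's global "slow-query risk safety line";
// 50 follows the conclusion that queries scoring above 50 deserve priority.
const safetyLine = 50.0

// finalRisk applies the per-application business-level weight:
// final risk index = risk index * business-level weight.
func finalRisk(baseRisk, bizWeight float64) float64 {
	return baseRisk * bizWeight
}

func main() {
	base := 45.0
	// Illustrative weights for a low-priority, normal, and
	// business-critical service, respectively.
	for _, w := range []float64{0.8, 1.0, 1.2} {
		r := finalRisk(base, w)
		fmt.Printf("weight %.1f -> final risk %.1f (flagged: %v)\n", w, r, r > safetyLine)
	}
}
```

The same base score of 45 stays below the safety line for ordinary services but crosses it for a business-critical one, which is exactly the prioritization effect the business weighting is meant to produce.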

Summary By converting slow queries into a quantifiable risk index, operations can shift from reactive, manual triage to proactive, data‑driven prioritization, ultimately reducing the median and high‑percentile risk scores across the system.

Tags: Automation, Go, metrics, MySQL, database performance, slow query, risk scoring
Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.