Databases 18 min read

Analyzing and Optimizing Slow OceanBase SQL Queries Using TopSQL and Execution Plan Rewrites

This article investigates a slow OceanBase SQL query by examining CPU and TopSQL metrics, dissecting the execution plan, identifying batch join inefficiencies, rewriting the SQL for better performance, and demonstrating a reduction in execution time from over 2600 seconds to under one second.

Aikesheng Open Source Community
Aikesheng Open Source Community
Aikesheng Open Source Community
Analyzing and Optimizing Slow OceanBase SQL Queries Using TopSQL and Execution Plan Rewrites

1. Background

1.1 Problem Description

A business system reports extremely slow responses from the OceanBase database, causing service degradation. Monitoring shows CPU usage spiking to 100% during the incident.

TopSQL captured a single slow SQL with an average execution time exceeding 2800 seconds.

1.2 Environment Information

Database: OceanBase Enterprise Edition 4.2.1.9

Architecture: 1-1-1, PrimaryZone Random

Tenant Specification: 30c300g

1.3 The Slowest TopSQL

The SQL text and its SQL‑ID were retrieved.

select *
  from (
    select A.*, RowNum as Fast_RowNum
    from (
      SELECT *
      FROM (
        SELECT TABLE1.C1, TABLE1.C2, xxx, TABLE1.P1, TABLE1.P2
        FROM TABLE1
        LEFT JOIN TABLE2 ON TABLE2.PID = TABLE1.ID AND TABLE2.ISMAIN = '1'
        LEFT JOIN TABLE6 ON TABLE6.ID = TABLE2.INBANK
        LEFT JOIN TABLE4 ON TABLE4.ID = TABLE2.COUNTRY
        WHERE TABLE1.STATE_ISENABLED = '1'
      ) T
      where 1 = 1
      and ID in (
        SELECT ID FROM TABLE1 WHERE domaintype = '2' AND EXISTS (
          SELECT PID FROM TABLE3 WHERE DOMAINID IN (
            SELECT DOMAINID FROM TABLE5 WHERE ORGID = '0000000000000000001' AND ORGISENABLED = '1'
          ) AND DOMAINISENABLED = '1' AND TABLE1.ID = PID
        )
      )
    ) A
    where RowNum < 51
  )
  where Fast_RowNum > 0;

2. Analysis Process

2.1 SQL_AUDIT Analysis

The execution time distribution for the SQL_ID shows a baseline execution time above 2600 seconds.

2.2 Execution Plan Analysis

The original execution plan is displayed below:

===========================================================================================================================
|ID|OPERATOR                     |NAME                     |EST.ROWS|EST.TIME(us)|
---------------------------------------------------------------------------------------------------------------------------
|0 |COUNT                       |                         |50      |396423      |
|1 |└─SUBPLAN SCAN              |VIEW1                    |50      |396423      |
|2 |  └─LIMIT                   |                         |50      |396423      |
|3 |   └─NESTED-LOOP OUTER JOIN |                         |50      |396423      |
|4 |     ├─NESTED-LOOP OUTER JOIN|                         |50      |395510      |
|5 |     │ ├─SUBPLAN SCAN       |VIEW5                    |38      |394193      |
|6 |     │ │ └─LIMIT           |                         |38      |394193      |
|7 |     │ │   └─NESTED-LOOP SEMI JOIN|                 |38      |394193      |
|8 |     │ │   ├─TABLE FULL SCAN|TABLE1                  |6463    |2028        |
|9 |     │ │   └─SUBPLAN SCAN   |VIEW3                    |1       |60          |
|... (truncated for brevity) ...
===========================================================================================================================

Execution Plan Interpretation

The optimizer first performs a full table scan on TABLE1 , using it as the driver table. For each driver row, it repeatedly scans the right‑hand sub‑operators, leading to massive row‑wise processing.

The sub‑plan scan of VIEW3 is executed 287 0000 times.

The join does not benefit from batch processing ( use_batch=false ).

2.3 SQL_Plan_Monitor Analysis

Dynamic monitoring confirms that operators 7‑13 repeatedly rescan the inner tables, causing the long execution time.

+-------+------------------------------+-------------+--------+---------+---------+----------------------------+----------------------------+------------+----------------------------+----------------------------+
| OP_ID | OP                           | OUTPUT_ROWS | RESCAN | THREADS | OPEN_DT | ROW_DT                     | OPEN_TIME                  | CLOSE_TIME | FIRST_ROW_TIME            | LAST_ROW_EOF_TIME         |
+-------+------------------------------+-------------+--------+---------+---------+----------------------------+----------------------------+------------+----------------------------+----------------------------+
| 7     | PHY_NESTED_LOOP_JOIN         | 14          | 856237 | 1       | NULL    | +000000000 00:02:19.682129 | 2025-03-10 17:02:17.955383 | NULL       | 2025-03-10 17:02:20.792033 | 2025-03-10 17:05:24.567077 |
... (truncated) ...
+-------+------------------------------+-------------+--------+---------+---------+----------------------------+----------------------------+------------+----------------------------+----------------------------+

2.4 Why Batch Join Does Not Take Effect

OceanBase supports Batch Rescan for regular NLJ, but the current semi‑join implementation cannot use this optimization, so use_batch=false .

2.5 SQL Rewrite

The original SQL is rewritten to keep the left joins unchanged, extract complex filters into a separate view, replace the EXISTS with an inner join, and add DISTINCT to avoid duplicates.

SELECT *
FROM (
  SELECT A.*, ROWNUM AS Fast_RowNum
  FROM (
    SELECT T.*
    FROM (
      SELECT TABLE1.C1, TABLE1.C2, xxx, TABLE1.P1, TABLE1.P2
      FROM TABLE1
      LEFT JOIN TABLE2 ON TABLE2.PID = TABLE1.ID AND TABLE2.ISMAIN = '1'
      LEFT JOIN TABLE6 ON TABLE6.ID = TABLE2.INBANK
      LEFT JOIN TABLE4 ON TABLE4.ID = TABLE2.COUNTRY
      WHERE TABLE1.STATE_ISENABLED = '1'
    ) T
    JOIN (
      SELECT DISTINCT P.ID
      FROM TABLE1 P
      JOIN TABLE3 D ON P.ID = D.PID AND D.DOMAINISENABLED = '1'
      WHERE D.DOMAINID IN (
        SELECT DOMAINID FROM TABLE5 WHERE ORGID = '0000000000000000001' AND ORGISENABLED = '1'
      ) AND P.DOMAINTYPE = '2'
    ) NEW ON T.ID = NEW.ID
  ) A
  WHERE ROWNUM < 51
) WHERE Fast_RowNum > 0;

Rewrite Principles

Preserve left joins in the main query.

Move complex filters to an independent view.

Replace EXISTS with an inner join.

Add DISTINCT to guarantee unique IDs.

Rewritten Execution Plan (excerpt)

=========================================================================================================================
|ID|OPERATOR                     |NAME                     |EST.ROWS|EST.TIME(us)|
-------------------------------------------------------------------------------------------------------------------------
|0 |COUNT                       |                         |50      |5271       |
|1 |└─SUBPLAN SCAN              |VIEW1                    |50      |5271       |
|2 |  └─LIMIT                   |                         |50      |5271       |
|3 |   └─NESTED-LOOP OUTER JOIN |                         |50      |5271       |
|4 |     ├─NESTED-LOOP OUTER JOIN|                         |50      |4358       |
|5 |     │ ├─SUBPLAN SCAN       |VIEW2                    |38      |3041       |
|... (truncated) ...
=========================================================================================================================

The rewritten plan shows that the driver table now contains only 42 rows, allowing the optimizer to use batch join ( use_batch=true ) and dramatically reduce the number of full‑table scans.

Conclusion

Before optimization, the original SQL used TABLE1 as the driver in a nested‑loop semi‑join, resulting in over 2 600 seconds of execution time. After rewriting, the driver set shrank to 42 rows, batch join was enabled, and execution time dropped to 0.6 seconds.

Performance Comparison

Oracle: 1.79 s

OceanBase (original): >2600 s

OceanBase (optimized): 0.6 s

Supplementary Knowledge

1. Why does rewriting EXISTS to a semi‑join sometimes slow down the query? See reference [2].

2. How to verify whether the batch parameter is effective?

select * from CDB_OB_SYS_VARIABLES where name like '%batch%' and tenant_id=1054;

Result shows _nlj_batching_enabled = 1 , indicating batch processing is enabled.

References

[1] Batch Rescan: https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000221518#1-title-DAS%20Group%20Rescan

[2] Why does rewriting EXISTS to a semi‑join cause slowdown?: https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000477585?back=kb

SQL OptimizationDatabase PerformanceOceanBaseexecution planBatch JoinTopSQL
Aikesheng Open Source Community
Written by

Aikesheng Open Source Community

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.