
Analyzing OceanBase Error Logs to Locate Error Causes

This article explains the types and formats of OceanBase logs, how to identify relevant log files, interpret log fields such as trace_id, lt, and dc, and provides step-by-step methods for using error codes and trace IDs to pinpoint the root cause of errors.

Aikesheng Open Source Community

Oct 19, 2022


OceanBase generates a large number of logs at various levels during runtime. When errors occur, locating the root cause among these logs can be challenging. This guide describes how to find the desired error information in OceanBase error logs.

1. Log Files

OceanBase logs are divided into three categories:

Election module logs: stored in election.log.

RootService (master control service) module logs: stored in rootservice.log.

Startup and runtime logs: stored in observer.log, which holds the logs of all other modules.

Each category has two kinds of log files:

.log and .log.YYYYmmDDHHMMSS: logs of all levels.

.log.wf and .log.wf.YYYYmmDDHHMMSS: WARN, USER_ERROR, and ERROR logs only (written only when enable_syslog_wf is set to true).

Each category writes to its current .log (or .log.wf) file. When that file reaches 256 MB, it is renamed with a .YYYYmmDDHHMMSS suffix recording the rotation time, and a new file is created.

Listing the log directory with ls -l shows files such as:

-rw-r--r-- 1 admin admin 18688919 Oct 9 02:08 election.log
-rw-r--r-- 1 admin admin 4998884 Oct 9 02:07 election.log.wf
-rw-r--r-- 1 admin admin 75158675 Oct 9 02:08 rootservice.log
-rw-r--r-- 1 admin admin 268437081 Oct 8 23:25 rootservice.log.20221008232523
-rw-r--r-- 1 admin admin 61688030 Oct 9 02:08 rootservice.log.wf
... (other log files)
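The naming scheme above is regular enough to classify a file from its name alone. The following is a minimal sketch, assuming only the three base names and the suffix patterns described in this section; the function name classify and the constant ROTATE_BYTES are illustrative, not part of any OceanBase tool.

```python
import re
from datetime import datetime

# Matches names such as "observer.log", "rootservice.log.wf",
# "rootservice.log.20221008232523" or "election.log.wf.20221009020800".
LOG_NAME = re.compile(
    r"^(?P<category>election|rootservice|observer)\.log"
    r"(?P<wf>\.wf)?"                 # present only for WARN/USER_ERROR/ERROR files
    r"(?:\.(?P<stamp>\d{14}))?$"     # rotation suffix YYYYmmDDHHMMSS
)

# Rotation threshold described in the article (256 MB, taken here as 256 MiB).
ROTATE_BYTES = 256 * 1024 * 1024


def classify(name):
    """Return (category, warn_only, rotated_at), or None for non-OceanBase names."""
    m = LOG_NAME.match(name)
    if m is None:
        return None
    stamp = m.group("stamp")
    rotated_at = datetime.strptime(stamp, "%Y%m%d%H%M%S") if stamp else None
    return m.group("category"), m.group("wf") is not None, rotated_at
```

For rootservice.log.20221008232523 the sketch reports the rootservice category, an all-levels file, and a rotation time of 2022-10-08 23:25:23, which agrees with the Oct 8 23:25 timestamp in the listing; its size of 268437081 bytes is just over 256 × 1024 × 1024.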

2. Log Formats

Based on log level, OceanBase uses two formats:

DEBUG, TRACE, INFO format:

[time] log_level [module_name] file_name:line_no [thread_id][coroutine_id][Ytraceid0-traceid1] [lt=last_log_print_time] [dc=dropped_log_count] log_data

WARN, USER_ERROR, ERROR format (adds function_name field):

[time] log_level [module_name] function_name (file_name:line_no) [thread_id][coroutine_id][Ytraceid0-traceid1] [lt=last_log_print_time] [dc=dropped_log_count] log_data
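Both formats can be handled by one regular expression that makes the function_name part optional. This is a sketch built only from the two templates above and the sample WARN line quoted later in this article; real log lines may contain variations it does not cover, and the name parse_line is hypothetical. The dc field is treated as optional because it only appears in asynchronous logs.

```python
import re

LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\s+(?P<level>[A-Z_]+)\s+"
    r"\[(?P<module>[^\]]*)\]\s+"
    # WARN/USER_ERROR/ERROR: function_name (file_name:line_no)
    r"(?:(?P<function>\S+)\s+\((?P<file_fn>[^:()]+):(?P<line_fn>\d+)\)"
    # DEBUG/TRACE/INFO: file_name:line_no
    r"|(?P<file>[^:\s]+):(?P<line>\d+))\s+"
    r"\[(?P<thread>\d+)\]\[(?P<coroutine>\d+)\]\[(?P<trace_id>[^\]]*)\]\s+"
    r"\[lt=(?P<lt>\d+)\]\s+(?:\[dc=(?P<dc>\d+)\]\s+)?(?P<data>.*)$"
)


def parse_line(text):
    """Split one OceanBase log line into its fields, or return None."""
    m = LINE.match(text)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "time": d["time"],
        "level": d["level"],
        "module": d["module"],
        "function": d["function"],  # None for DEBUG/TRACE/INFO lines
        "file": d["file_fn"] or d["file"],
        "line": int(d["line_fn"] or d["line"]),
        "thread_id": int(d["thread"]),
        "coroutine_id": int(d["coroutine"]),
        "trace_id": d["trace_id"],
        "lt": int(d["lt"]),
        "dc": int(d["dc"]) if d["dc"] is not None else None,
        "data": d["data"],
    }
```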

Key fields:

trace_id: a cluster-wide unique identifier for an SQL statement, e.g., YB420ABA3C91-0005E98FC3148792.

lt (last_log_print_time): for asynchronous logging it records the time spent formatting the log; for synchronous logging it records the time spent writing the previous log to disk.

dc (dropped_log_count): number of logs that failed to be written since the previous successful log (only present in asynchronous logs).
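Because dc counts logs lost since the previous successful write, a non-zero value means the log sequence around that line has gaps. The sketch below, which assumes only the [lt=...] [dc=...] layout shown above, pulls out both counters and lists the lines that report dropped logs. The helper names are hypothetical, and no unit is assumed for lt.

```python
import re

# dc is optional because it is only present in asynchronous logs.
LT_DC = re.compile(r"\[lt=(\d+)\](?:\s*\[dc=(\d+)\])?")


def lt_dc(line):
    """Return (lt, dc) for a log line; dc is None when the field is absent."""
    m = LT_DC.search(line)
    if m is None:
        return None
    lt = int(m.group(1))
    dc = int(m.group(2)) if m.group(2) is not None else None
    return lt, dc


def lines_after_drops(lines):
    """Yield (index, dropped) for each line whose dc reports lost logs before it."""
    for i, line in enumerate(lines):
        parsed = lt_dc(line)
        if parsed is not None and parsed[1]:
            yield i, parsed[1]
```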

3. Analyzing Logs

When an SQL statement fails, OceanBase returns an error code and message that can be vague. To get more specific information, follow the two steps below.

3.1 Find trace_id by error code

Search for the error code (with a leading minus sign) in the following order:

observer.log.wf

If not found, rootservice.log.wf

If still not found, election.log.wf

Example: creating a resource pool returns error 4624. Use:

grep "ret=-4624" observer.log.wf

The grep output contains the trace_id YB420ABA3C91-0005E98FC5948785.
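The search order above can be automated. This is a minimal sketch, assuming the three .wf files sit in one directory and that the trace_id is the third bracketed field after the file position, as in the formats in section 2; find_trace_ids is a hypothetical name. Unlike a plain grep for "ret=-462", it refuses to match longer codes such as -4624.

```python
import re
from pathlib import Path

SEARCH_ORDER = ("observer.log.wf", "rootservice.log.wf", "election.log.wf")
# "...[thread_id][coroutine_id][Y...-...]": capture the trace_id field.
TRACE_ID = re.compile(r"\]\[(Y[0-9A-F]+-[0-9A-F]+)\]")


def find_trace_ids(log_dir, error_code):
    """Return (file_name, trace_ids) from the first file that contains ret=-<code>."""
    code = -abs(int(error_code))               # accept 4624 or -4624
    needle = re.compile(rf"ret={code}(?!\d)")  # do not match -46240 for -4624
    for name in SEARCH_ORDER:
        path = Path(log_dir) / name
        if not path.exists():
            continue
        ids = []
        with path.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                if needle.search(line):
                    m = TRACE_ID.search(line)
                    if m and m.group(1) not in ids:
                        ids.append(m.group(1))
        if ids:
            return name, ids  # stop at the first file with a hit
    return None, []
```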

3.2 Find error logs by trace_id

After obtaining the trace_id, grep for it in the .log.wf files, starting with observer.log.wf. If the information you need is not there, continue with rootservice.log.wf and election.log.wf. This collects the warning and error logs related to the failing SQL; if INFO-level context is also needed, the same trace_id can be searched in the corresponding .log files, which contain logs of all levels.
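Instead of reading the three files one after another, the matching lines can be merged into a single timeline. This sketch assumes every line starts with the bracketed timestamp shown in section 2; because that timestamp is zero-padded, plain string sorting orders it chronologically. The name lines_for_trace is hypothetical, and the files parameter lets the same code scan the .log files as well.

```python
from pathlib import Path

WF_FILES = ("observer.log.wf", "rootservice.log.wf", "election.log.wf")


def lines_for_trace(log_dir, trace_id, files=WF_FILES):
    """Collect every line tagged with trace_id from the given files, oldest first."""
    tag = f"[{trace_id}]"
    hits = []
    for name in files:
        path = Path(log_dir) / name
        if not path.exists():
            continue
        with path.open(encoding="utf-8", errors="replace") as f:
            for line in f:
                if tag in line:
                    # "[2022-10-09 09:05:32.052665] ..." -> "2022-10-09 09:05:32.052665"
                    stamp = line[1:line.find("]")]
                    hits.append((stamp, name, line.rstrip("\n")))
    hits.sort()
    return [(name, text) for _, name, text in hits]
```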

4. Existing Problems

In practice, OceanBase error logs do not always state the cause clearly, so tracing an issue can depend on hands-on experience and a "guess and verify" approach.

Example: a BOOTSTRAP command fails with ERROR 4012 (HY000): Timeout. The most relevant log entry is:

[2022-10-09 09:05:32.052665] WARN  [BOOTSTRAP] execute_bootstrap (ob_bootstrap.cpp:759) [16840][446][YB420ABA3C91-0005EA9633379D17] [lt=21] [dc=0] failed to wait all rs online(ret=-4622)

The cause was that every OBServer instance had been started in zone z1, while the BOOTSTRAP command specified zones z1, z2, and z3, so the cluster waited for RootService nodes in zones that had no running servers.
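The "verify" half of this diagnosis amounts to comparing two zone assignments. Below is a minimal sketch of that comparison, assuming you have already collected, by whatever means, the zone each OBServer was started with and the server-to-zone mapping written in the BOOTSTRAP command; both input dictionaries, the addresses in the usage below, and the name zone_mismatches are hypothetical.

```python
def zone_mismatches(started, bootstrap):
    """Compare startup zones with the zones listed in BOOTSTRAP.

    started:   {"ip:port": zone} each OBServer was started with (hypothetical input)
    bootstrap: {"ip:port": zone} as written in the BOOTSTRAP command (hypothetical input)
    Returns a list of human-readable problems; empty means the two agree.
    """
    problems = []
    for server, zone in sorted(bootstrap.items()):
        actual = started.get(server)
        if actual is None:
            problems.append(f"{server}: listed for zone {zone} but not running")
        elif actual != zone:
            problems.append(f"{server}: BOOTSTRAP says zone {zone}, server started in {actual}")
    # Zones named in BOOTSTRAP that no running server belongs to.
    for zone in sorted(set(bootstrap.values()) - set(started.values())):
        problems.append(f"zone {zone}: no OBServer was started in this zone")
    return problems
```

With three servers all started in z1 and a BOOTSTRAP spec of z1, z2, z3, the sketch reports two per-server mismatches and two empty zones, matching the situation described above.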

5. Expectations for Future Versions

Better error cause identification: logs should record the precise reason for failures to aid operations.

Include call‑stack hierarchy in .log.wf entries so that the method that triggered the error can be quickly located.

Example of a desired call‑stack prefix:

[2022-10-09 06:39:25.317429] WARN  3.2.1.1.1 log_user_error_and_warn (ob_rpc_proxy.cpp:300) [3414][2255][YB420ABA3C91-0005E98FC5948785] [lt=5] [dc=0] machine resource 'z1' is not enough to hold a new unit
... (subsequent logs with decreasing hierarchy numbers)

Including such hierarchical identifiers would allow operators to pinpoint the exact method where the error originated, improving troubleshooting efficiency.
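If log lines carried the proposed hierarchy prefix, finding the originating method would reduce to picking the deepest identifier. This is a sketch for the proposed format only (OceanBase does not emit it today), assuming the identifier directly follows the log level as in the example above and that more dot-separated components means a deeper call; origin_line is a hypothetical name.

```python
import re

# Proposed format: "[time] WARN  3.2.1.1.1 function_name (file:line) ..."
HIER = re.compile(r"^\[[^\]]+\]\s+[A-Z_]+\s+(?P<hier>\d+(?:\.\d+)*)\s")


def origin_line(lines):
    """Return the line with the deepest hierarchy id, or None if none carry one."""
    best, best_depth = None, -1
    for line in lines:
        m = HIER.match(line)
        if m is None:
            continue  # ordinary line without a hierarchy prefix
        depth = m.group("hier").count(".") + 1
        if depth > best_depth:  # keep the first line at the greatest depth
            best, best_depth = line, depth
    return best
```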
