Databases 9 min read

Understanding MySQL Encoding Mechanism and Solving Chinese Character Query Issues

This article explains MySQL's character encoding workflow, illustrates why queries containing Chinese characters fail without proper settings, and shows how to configure JDBC URLs, server variables, and Docker‑based MySQL instances to ensure lossless UTF‑8 handling.

Full-Stack Internet Architecture

Nov 12, 2021

Understanding MySQL Encoding Mechanism and Solving Chinese Character Query Issues

Introduction

A reader encountered a problem when deploying a local MySQL test environment: queries containing Chinese characters returned no results, while English queries worked fine.

Reader Problem Description

The Java application used a JDBC URL without specifying character encoding:

jdbc:mysql://10.65.110.9:3306/test?connectTimeout=5000&socketTimeout=20000

After importing the company's database dump to a personal MySQL instance, queries like SELECT * FROM USER WHERE name = '张三' failed. Adding characterEncoding=UTF-8 to the URL resolved the issue.

MySQL Encoding Mechanism Overview

When the client (e.g., a Java IDE) uses UTF‑8, the SQL statement is first encoded in UTF‑8, then converted to Unicode, and finally encoded according to character_set_client before being sent to the server.

The binary stream reaches MySQL, where character_set_connection decodes it, and the engine parses the statement.

After execution, the result is encoded using the database's character set and stored.

For SELECT queries, the data is decoded by the server, re‑encoded with character_set_connection, then with character_set_result, and finally sent to the client, which decodes it with UTF‑8.

The relevant character sets are: character_set_client: encoding used by the client to send SQL. character_set_connection: encoding used by the server to interpret the received binary stream; it allows developers to control semantic behavior such as SELECT LENGTH('中') returning 2 in GBK and 3 in UTF‑8. character_set_result: encoding used for the result set returned to the client.

Data Flow for Write and Read Operations

Write (INSERT/UPDATE/DELETE):

client → character_set_client → character_set_connection → DB

Read (SELECT):

client → character_set_client → character_set_connection → DB → character_set_result

If each step performs a lossless conversion, the result set is correct. A lossless conversion means every character in the source encoding exists in the target encoding.

The client‑server character encoding is automatically detected unless the Connector/J properties characterEncoding and connectionCollation are set. The server variable character_set_server defines the default encoding.

Answer to the Original Problem

On the company's test server, character_set_server was set to UTF‑8 in my.cnf:

# my.cnf
[mysqld]
character-set-server=utf8

Thus, even without specifying characterEncoding in the JDBC URL, the three client‑side character sets inherited UTF‑8, and queries with Chinese characters worked.

On the personal Docker‑based MySQL instance, my.cnf was empty, so the default server character set was latin1. Consequently, character_set_client, character_set_connection, and character_set_result were latin1, causing a loss‑of‑information conversion when handling Unicode Chinese characters.

The root cause was an incomplete migration: the database data was copied, but the server configuration file ( my.cnf) was not, leaving the server with the wrong default charset.

Conclusion

Understanding MySQL's encoding workflow allows you to quickly diagnose and fix issues such as garbled characters or failed queries by ensuring consistent, lossless character set settings across client, connection, and server.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Docker SQL MySQL JDBC UTF-8 character encoding Database Configuration

Written by

Full-Stack Internet Architecture

Introducing full-stack Internet architecture technologies centered on Java

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.