Understanding MySQL Encoding Mechanism and Solving Chinese Character Query Issues
This article explains MySQL's character encoding workflow, illustrates why queries containing Chinese characters fail without proper settings, and shows how to configure JDBC URLs, server variables, and Docker‑based MySQL instances to ensure lossless UTF‑8 handling.
Introduction
A reader encountered a problem when deploying a local MySQL test environment: queries containing Chinese characters returned no results, while English queries worked fine.
Reader Problem Description
The Java application used a JDBC URL without specifying character encoding:
jdbc:mysql://10.65.110.9:3306/test?connectTimeout=5000&socketTimeout=20000After importing the company's database dump to a personal MySQL instance, queries like SELECT * FROM USER WHERE name = '张三' failed. Adding characterEncoding=UTF-8 to the URL resolved the issue.
MySQL Encoding Mechanism Overview
When the client (e.g., a Java IDE) uses UTF‑8, the SQL statement is first encoded in UTF‑8, then converted to Unicode, and finally encoded according to character_set_client before being sent to the server.
The binary stream reaches MySQL, where character_set_connection decodes it, and the engine parses the statement.
After execution, the result is encoded using the database's character set and stored.
For SELECT queries, the data is decoded by the server, re‑encoded with character_set_connection , then with character_set_result , and finally sent to the client, which decodes it with UTF‑8.
The relevant character sets are:
character_set_client : encoding used by the client to send SQL.
character_set_connection : encoding used by the server to interpret the received binary stream; it allows developers to control semantic behavior such as SELECT LENGTH('中') returning 2 in GBK and 3 in UTF‑8.
character_set_result : encoding used for the result set returned to the client.
Data Flow for Write and Read Operations
Write (INSERT/UPDATE/DELETE): client → character_set_client → character_set_connection → DB
Read (SELECT): client → character_set_client → character_set_connection → DB → character_set_result
If each step performs a lossless conversion, the result set is correct. A lossless conversion means every character in the source encoding exists in the target encoding.
The client‑server character encoding is automatically detected unless the Connector/J properties characterEncoding and connectionCollation are set. The server variable character_set_server defines the default encoding.
Answer to the Original Problem
On the company's test server, character_set_server was set to UTF‑8 in my.cnf :
# my.cnf
[mysqld]
character-set-server=utf8Thus, even without specifying characterEncoding in the JDBC URL, the three client‑side character sets inherited UTF‑8, and queries with Chinese characters worked.
On the personal Docker‑based MySQL instance, my.cnf was empty, so the default server character set was latin1. Consequently, character_set_client , character_set_connection , and character_set_result were latin1, causing a loss‑of‑information conversion when handling Unicode Chinese characters.
The root cause was an incomplete migration: the database data was copied, but the server configuration file ( my.cnf ) was not, leaving the server with the wrong default charset.
Conclusion
Understanding MySQL's encoding workflow allows you to quickly diagnose and fix issues such as garbled characters or failed queries by ensuring consistent, lossless character set settings across client, connection, and server.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.