Databases 10 min read

MySQL Encoding Process and Character Set Handling

This article explains how MySQL’s character_set parameters such as character_set_client and character_set_results control the encoding and decoding of client commands and query results, illustrates common pitfalls with UTF‑8, GBK and Latin‑1, and provides practical commands to avoid garbled text.

Architecture Digest

Mar 8, 2020

MySQL Encoding Process and Character Set Handling

MySQL garbled characters are often caused by the character_set parameters. The most important ones are character_set_client and character_set_results, which determine how MySQL decodes client commands and encodes query results.

To view the current character set variables you can run: SHOW VARIABLES LIKE "character%"; The output typically includes:

Variable_name               Value
character_set_client        utf8
character_set_connection    utf8
character_set_database      utf8
character_set_filesystem    binary
character_set_results       utf8
character_set_server        utf8
character_set_system        utf8
character_sets_dir          /usr/local/Cellar/[email protected]/5.7.24/share/mysql/charsets/

The character_set_client tells MySQL the encoding of the command sent by the client. For example, if it is set to UTF‑8, MySQL will decode the incoming byte stream as UTF‑8 before converting it to the table’s charset.

You can check a table’s column charset with: SHOW FULL COLUMNS FROM student; If the client charset is UTF‑8 but the table charset is GBK, inserting Chinese characters like: INSERT INTO student VALUES ('小明', 12); will cause MySQL to decode the command as UTF‑8, then convert the string to GBK before storing it.

The character_set_results parameter controls the encoding of query results. If the table is GBK and character_set_results is set to UTF‑8, MySQL will convert the retrieved data to UTF‑8 before sending it to the client.

The following diagram (originally included) visualises the read/write flow and shows how mismatched encodings lead to garbled output:

When the encoding used for storing data and the encoding used for retrieving data do not correspond, garbled characters appear.

You can change both parameters conveniently with: SET names gbk; After this command, character_set_client and character_set_results are both set to GBK.

UTF‑8, GBK and Latin‑1

These are the three most common MySQL encodings. All of them are backward compatible with ASCII. Latin‑1 is a single‑byte encoding (0x00‑0xFF). UTF‑8 can represent a far larger range of characters, and every Latin‑1 character can be converted to UTF‑8, but not necessarily to GBK.

The concept of “wrong‑in, wrong‑out” means that if the client’s encoding and the table’s encoding differ, you can still avoid garbled output as long as the encodings used for both the write and the read operations are consistent.

Wrong‑in, Wrong‑out Example

Consider the command: INSERT INTO table VALUE("啊"); If the terminal uses GBK, the byte sequence for “啊” is 10110000 10100001. MySQL will decode it according to character_set_client:

If character_set_client is GBK, MySQL correctly interprets the character.

If it is Latin‑1, MySQL treats the two bytes as separate Latin‑1 characters, resulting in “°¡”.

If it is UTF‑8, the byte sequence is invalid, leading to an error or replacement character “�”.

Therefore, to achieve “wrong‑in, wrong‑out”, character_set_client should be set to Latin‑1, and the target table must be encoded as Latin‑1 or UTF‑8.

Reading Data

When reading, MySQL converts the stored data to the charset defined by character_set_results. If you wrote data using Latin‑1, you must also set character_set_results to Latin‑1 to retrieve it correctly.

Practical Example

Assume a student table where the name column is defined as Latin‑1 but stores GBK‑encoded data. From a UTF‑8 terminal, a direct query: SELECT * FROM student WHERE name = "小明"; fails with error 1267 because MySQL tries to convert the UTF‑8 literal to Latin‑1, which is impossible.

Changing the client charset with: SET NAMES latin1; makes MySQL treat the input as Latin‑1, but the query still returns no rows because the stored bytes are GBK, not Latin‑1.

The workaround is to convert the query to GBK before sending it and then convert the result back to UTF‑8:

echo "SET names latin1; SELECT * FROM student WHERE name = '小明';" \
| iconv -f utf8 -t gbk \
| mysql -uroot -p123 -Dtest \
| iconv -f gbk -t utf8

This yields the correct row with the name displayed properly.

If the table itself uses GBK, setting the appropriate character_set_client and character_set_results is sufficient; no extra conversion steps are needed.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Encoding MySQL UTF-8 Character Set GBK

Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.