Understanding MySQL Character Sets, Collations, and Common Encoding Issues
This article explains MySQL's various character sets and collations, describes important system variables such as character_set_client and character_set_connection, demonstrates how to query supported encodings, and provides practical examples for avoiding and fixing garbled text caused by mismatched encodings.
The author, a middleware developer at iKangSheng, expands on previous posts about MySQL character sets by detailing the fundamentals of different encodings (ASCII, Latin1, UCS‑2, UTF‑16, UTF‑32, GB series, UTF‑8) and how they occupy 1‑4 bytes per character.
1. Character Sets
ASCII uses 1 byte per character, with the high bit always zero.
Latin1 extends ASCII to cover Western European letters.
UCS‑2 uses 2 bytes (65,536 code points).
UTF‑16 builds on UCS‑2, allowing surrogate pairs for additional characters.
UTF‑32 uses a fixed 4‑byte length.
GB series and UTF‑8 are variable‑length encodings (1‑4 bytes).
When only ASCII characters (0‑127) are expressed, Latin1, GB series, and UTF‑8 behave identically, while UTF‑16/UTF‑32 add extra padding.
mysql> select
hex(convert("sql" using latin1)) as a,
hex(convert("sql" using gbk)) as b,
hex(convert("sql" using utf8mb4)) as c,
hex(convert("sql" using utf16)) as d,
hex(convert("sql" using utf32)) as e;
| a | b | c | d | e |
+--------+--------+--------+--------------+-----------------------+
| 73716C | 73716C | 73716C | 00730071006C | 00000073000000710000006C |You can list all character sets and collations supported by the current MySQL instance with:
mysql> SELECT CHARACTER_SET_NAME, COLLATION_NAME, ID FROM INFORMATION_SCHEMA.COLLATIONS
ORDER BY CHARACTER_SET_NAME, ID;What is a collation?
A collation is a set of rules that defines how to compare and sort character strings.
2. MySQL System Variables
Run show variables like 'character%' to see the relevant parameters:
character_set_client
character_set_connection
character_set_results
character_set_server
-- The following three are rarely needed
character_set_filesystem
character_set_database
character_set_system2.1 character_set_client
It is a server‑side variable, not the client’s actual encoding.
The client sends data in whatever encoding the user typed; MySQL assumes that data is encoded with the value of character_set_client .
It cannot be set to multibyte encodings such as ucs2, utf16, or utf32.
ucs2
utf16 / utf16le
utf322.2 character_set_connection
According to the MySQL documentation, after receiving a statement the server converts it from character_set_client to character_set_connection , except for literals that specify their own charset (e.g., _utf8mb4"text").
2.3 SET NAMES
The SET NAMES statement sets three variables at once: character_set_client , character_set_connection , and character_set_results . It does not affect character_set_server .
mysql> SET NAMES gb2312;3. A Common Garbled‑Text Mistake
A previous article claimed that mismatched client and table encodings cause garbled output, but even when both are set to gb2312 the problem can appear because the data sent to MySQL is actually encoded in UTF‑8. MySQL interprets the UTF‑8 bytes as gb2312, leading to corruption.
# Terminal encoding is UTF‑8; convert to GB2312 before sending
$ echo 'insert into ytt_new10.t1 val("病毒滚吧!");' | iconv -f utf8 -t gb2312 > insert-gb.txt
$ mysql -h ... --default-character-set=gb2312 < insert-gb.txtWhen selecting data, MySQL converts it to character_set_results before sending it back; the client may need to convert it back to UTF‑8 for proper display.
$ mysql ... --default-character-set=utf8 -e 'select * from ytt_new10.t1'
病毒滚吧!
$ mysql ... --default-character-set=gb2312 -e 'select * ...' | iconv -f gb2312 -t utf8
病毒滚吧!4. Why Convert from client to connection?
After receiving a statement, the server converts it from character_set_client to character_set_connection .
This extra layer allows the server to apply charset‑specific behavior (e.g., length calculations) that differs between encodings. Some encodings like UTF‑32 cannot be used directly by the client, so the conversion step enables those behaviors.
mysql> set character_set_connection='utf8';
mysql> select length("hello");
| 5 |
mysql> set character_set_connection='utf32';
mysql> select length("hello");
| 20 |Thus, the conversion from client to connection is essential for supporting multibyte and special‑purpose character sets in MySQL.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.