Understanding MySQL Unicode Support: utf8mb3 vs utf8mb4 and How to Migrate
This article explains MySQL's Unicode character set support, compares the deprecated utf8mb3 with the modern utf8mb4, and provides step‑by‑step SQL commands for safely converting existing databases to the newer charset.
MySQL assigns a character set to every database, either explicitly at creation time or by inheriting the server default. Many users choose utf8 expecting full UTF‑8 support, but in MySQL utf8 is an alias for the older utf8mb3 charset, which can cause hidden problems.
Unicode is the industry standard for representing most world scripts. MySQL supports its encodings (UTF‑8, UTF‑16, UTF‑32, and the older UCS‑2) through several character sets: utf8, ucs2, utf8mb3, utf8mb4, utf16, utf16le, and utf32. They differ in the range of characters they can store and in the number of bytes required per character.
Character Set    Supported Characters    Bytes per Character
utf8mb3, utf8    BMP                     1 to 3 bytes
ucs2             BMP                     2 bytes
utf8mb4          BMP + Supplementary     1 to 4 bytes
utf16            BMP + Supplementary     2 or 4 bytes
utf16le          BMP + Supplementary     2 or 4 bytes
utf32            BMP + Supplementary     4 bytes
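The same information can be read from a running server. The query below is a minimal sketch, assuming a MySQL server on which information_schema is readable; the exact rows returned depend on the server version.

```sql
-- List the Unicode character sets known to this server.
-- MAXLEN is the maximum number of bytes one character can occupy.
SELECT CHARACTER_SET_NAME,
       DEFAULT_COLLATE_NAME,
       MAXLEN
FROM information_schema.CHARACTER_SETS
WHERE CHARACTER_SET_NAME LIKE 'utf%'
   OR CHARACTER_SET_NAME = 'ucs2'
ORDER BY CHARACTER_SET_NAME;
```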
The MySQL documentation warns that utf8mb3 is deprecated and is expected to be removed in a future release. Because utf8 is currently an alias for utf8mb3, developers should write utf8mb4 explicitly rather than utf8 to avoid ambiguity.
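To see where the deprecated charset is still in use, one option is to query the column metadata. This is a sketch rather than a complete audit: it checks only column-level charsets, and it matches both names because different server versions report the charset as either utf8 or utf8mb3.

```sql
-- Find columns that still use the 3-byte charset, skipping system schemas.
SELECT TABLE_SCHEMA,
       TABLE_NAME,
       COLUMN_NAME,
       CHARACTER_SET_NAME,
       COLLATION_NAME
FROM information_schema.COLUMNS
WHERE CHARACTER_SET_NAME IN ('utf8', 'utf8mb3')
  AND TABLE_SCHEMA NOT IN ('mysql', 'information_schema',
                           'performance_schema', 'sys')
ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION;
```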
The utf8mb3 charset supports only BMP characters (code points U+0000 to U+FFFF) and uses at most three bytes per character, so it cannot store supplementary characters such as most emoji and some rare Chinese characters.
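The limitation can be reproduced with a small experiment. The table name demo_mb3 and its column are hypothetical, and the values are written as hex literals so that the result does not depend on the client's encoding. The behaviour described in the comments assumes a reasonably recent server and may differ in detail between versions.

```sql
-- Hypothetical demo table; the name and column are illustrative only.
CREATE TABLE demo_mb3 (
  txt VARCHAR(10) CHARACTER SET utf8mb3
);

-- U+4E2D is a BMP character (3 bytes in UTF-8), so it fits in utf8mb3.
INSERT INTO demo_mb3 (txt) VALUES (_utf8mb4 X'E4B8AD');

-- U+1F600 is a supplementary character (4 bytes in UTF-8).
-- Under strict SQL mode this insert is expected to be rejected with an
-- "Incorrect string value" error; without strict mode the character is
-- not stored intact and a warning is raised instead.
INSERT INTO demo_mb3 (txt) VALUES (_utf8mb4 X'F09F9880');
```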
In contrast, utf8mb4 supports both BMP and supplementary characters (code points up to U+10FFFF) and uses up to four bytes per character, making it a superset of utf8mb3. For compatibility and future‑proofing, using utf8mb4 is strongly recommended.
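The byte counts can be checked directly. In this short sketch, LENGTH returns the size in bytes and CHAR_LENGTH the size in characters; hex literals with the _utf8mb4 introducer are used so the result does not depend on the client's encoding.

```sql
SELECT
  LENGTH(_utf8mb4 X'41')            AS ascii_bytes,  -- 'A', U+0041
  LENGTH(_utf8mb4 X'E4B8AD')        AS bmp_bytes,    -- U+4E2D
  LENGTH(_utf8mb4 X'F09F9880')      AS supp_bytes,   -- U+1F600
  CHAR_LENGTH(_utf8mb4 X'F09F9880') AS supp_chars;
```

The four columns should come back as 1, 3, 4, and 1: the supplementary character is a single character that needs the fourth byte only utf8mb4 provides.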
Differences and Pros/Cons
The trade-off is storage. BMP characters are encoded with the same bytes in both charsets, so the encoded form of existing text does not change; only supplementary characters need the fourth byte. What does increase is the maximum byte length MySQL must allow per character, which can matter for fixed-length CHAR columns and for index key length limits.
Because utf8mb4 can store every Unicode character, many projects choose it despite the potentially larger storage footprint.
Migrating from utf8mb3 to utf8mb4
The migration is straightforward. For tables that currently use utf8 (an alias for utf8mb3), you can alter the default charset and each column’s charset and collation.
CREATE TABLE t1 (
  col1 CHAR(10) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  col2 CHAR(10) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL
) CHARACTER SET utf8;
To convert the table to utf8mb4:
ALTER TABLE t1
  DEFAULT CHARACTER SET utf8mb4,
  MODIFY col1 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  MODIFY col2 CHAR(10) CHARACTER SET utf8mb4 COLLATE utf8mb4_bin NOT NULL;
After running the ALTER TABLE statement, the table stores its data in utf8mb4, and existing BMP data is preserved unchanged.
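A few follow-up statements are often useful around this step. They are a sketch under stated assumptions, not a complete migration procedure: app_db is a hypothetical schema name, and the CONVERT TO form is an alternative to the explicit MODIFY clauses, not something to run in addition to them.

```sql
-- Inspect the result: the column definitions should now show utf8mb4.
SHOW CREATE TABLE t1;

-- Alternative shortcut: convert the table default and every character
-- column in one statement. All columns then receive the single collation
-- named here, so col2 would not stay on a binary collation as it does
-- with the explicit MODIFY form.
ALTER TABLE t1 CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Change the default for tables created later in the database.
-- app_db is a hypothetical schema name.
ALTER DATABASE app_db CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

-- Make the current client session send and receive utf8mb4.
SET NAMES utf8mb4;
```

The session setting matters because a table stored in utf8mb4 can still receive damaged supplementary characters if the client connection itself is limited to a 3-byte charset.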
In summary, understanding MySQL's Unicode support and proactively switching to utf8mb4 prevents future compatibility issues and enables full Unicode coverage, including emojis and rare characters.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java