Why MySQL’s “utf8” Isn’t Real UTF‑8 and How utf8mb4 Solves It

This article explains why MySQL's built-in utf8 charset stores at most three bytes per character, why emojis therefore cause insert errors, and how switching to the utf8mb4 charset resolves those errors by providing full Unicode support.

Efficient Ops

Error Review

The following statement inserts an emoji directly into a MySQL table that uses the default utf8 charset:

<code>INSERT INTO `csjdemo`.`student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`)
   VALUES ('20', '陈哈哈😓', '男', '20', '181班', '9年级', '看电影');</code>

Result:

<code>[Err] 1366 - Incorrect string value: '\xF0\x9F\x98\x93' for column 'NAME' at row 1</code>
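The bytes quoted in the error message are informative on their own. As a quick check outside MySQL, the following Python sketch decodes them and shows that they are the four-byte UTF-8 encoding of the emoji from the INSERT. The variable names are illustrative only.

```python
# Bytes copied from the MySQL error message above.
rejected = b"\xF0\x9F\x98\x93"

# Decoding them as UTF-8 yields a single character: the emoji from the INSERT.
char = rejected.decode("utf-8")

print(char)                  # the emoji itself
print(len(char))             # number of characters
print(len(rejected))         # number of bytes
print(f"U+{ord(char):04X}")  # Unicode code point
```

One character that needs four bytes is exactly what a three-byte-per-character charset cannot store.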

After changing the character set of the server, database, and columns to utf8mb4, the insert succeeds:

<code>INSERT INTO `student` (`ID`, `NAME`, `SEX`, `AGE`, `CLASS`, `GRADE`, `HOBBY`)
   VALUES (null, '陈哈哈😓😓', '男', '20', '181班', '9年级', '看电影');</code>

Fun Facts About MySQL utf8

MySQL’s utf8 is not true UTF‑8; it only supports up to three bytes per character, while real UTF‑8 supports up to four bytes.

Chinese characters occupy three bytes, ASCII characters one byte, but emojis require four bytes, causing insertion failures unless utf8mb4 is used.
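These sizes are a property of UTF-8 itself, so they can be checked without a database. The sketch below uses a hypothetical helper named utf8_byte_lengths to encode each character of a sample string (the name from the INSERT, prefixed with an ASCII letter) and report how many bytes it needs.

```python
def utf8_byte_lengths(text: str) -> list[tuple[str, int]]:
    """Return each character of `text` with its UTF-8 encoded size in bytes."""
    return [(ch, len(ch.encode("utf-8"))) for ch in text]


sample = "a陈哈哈😓"
for ch, size in utf8_byte_lengths(sample):
    print(f"{ch!r}: {size} byte(s)")

# Character count and byte count differ once multi-byte characters appear.
print(len(sample), "characters,", len(sample.encode("utf-8")), "bytes")
```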

The comparison image shows how character count and byte size change after converting to utf8mb4.

MySQL introduced utf8mb4 in 2010 (version 5.5.3) to work around this limitation, but gave it little publicity, leading many developers to mistakenly use utf8 as if it were full UTF-8.

utf8mb4 Is the Real UTF‑8

Only utf8mb4 implements the full Unicode range. The older utf8 charset (an alias for utf8mb3) is a limited, MySQL-specific encoding that can store only characters whose UTF-8 form fits in three bytes.
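In practice the difference comes down to code points: a character above U+FFFF needs four bytes in UTF-8, which the three-byte utf8 charset cannot hold. A minimal sketch of a check that could run before an insert follows. The function name needs_utf8mb4 is hypothetical and not part of any MySQL client library.

```python
def needs_utf8mb4(text: str) -> bool:
    """Return True if any character in `text` lies above U+FFFF.

    Such characters take four bytes in UTF-8, which is more than the
    three bytes per character that MySQL's legacy utf8 charset allows.
    """
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("陈哈哈"))    # False: every character fits in three bytes
print(needs_utf8mb4("陈哈哈😓"))  # True: the emoji needs four bytes
```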

All MySQL and MariaDB users should migrate to utf8mb4 and stop using utf8.

A Brief History of utf8 in MySQL

MySQL added UTF-8 support in version 4.1 (2003), before the current UTF-8 standard (RFC 3629), which limits sequences to four bytes, had been published.

The earlier RFC 2279 allowed up to six bytes per character. MySQL initially implemented this version, then limited utf8 to three-byte sequences in a 2002 update.

The change was likely motivated by a desire to improve performance by using fixed-length CHAR columns, but it introduced the incompatibility with true UTF-8.

Because the broken charset was already released, MySQL could not simply fix it without forcing users to rebuild databases, so it kept the limitation until the 2010 introduction of utf8mb4.

Conclusion

Most online articles still treat MySQL's utf8 as real UTF-8, leading to widespread errors when storing emojis or other four-byte characters. When setting up MySQL or MariaDB databases, always configure the server, database, tables, and columns to use utf8mb4 to ensure full Unicode compatibility.
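For an existing schema, the database and table part of that configuration typically comes down to one ALTER DATABASE statement plus one ALTER TABLE ... CONVERT TO statement per table. The sketch below only builds those statements as strings so they can be reviewed before being run. The helper name, the choice of utf8mb4_unicode_ci as the collation, and the reuse of the csjdemo and student names from the earlier INSERT are assumptions. Server and client connection settings still have to be changed separately.

```python
def build_utf8mb4_statements(schema: str, tables: list[str],
                             collation: str = "utf8mb4_unicode_ci") -> list[str]:
    """Build MySQL statements that switch a schema and its tables to utf8mb4.

    Names are interpolated as-is, so pass only trusted identifiers.
    """
    statements = [
        f"ALTER DATABASE `{schema}` CHARACTER SET utf8mb4 COLLATE {collation};"
    ]
    for table in tables:
        # CONVERT TO changes the table default and re-encodes its text columns.
        statements.append(
            f"ALTER TABLE `{schema}`.`{table}` "
            f"CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


for sql in build_utf8mb4_statements("csjdemo", ["student"]):
    print(sql)
```

Converting a table rewrites its data, so it is prudent to try the generated statements on a copy of the database first.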
