
Why MySQL’s “utf8” Is Not Real UTF‑8 and You Should Switch to utf8mb4

MySQL’s legacy “utf8” charset stores at most three bytes per character, so inserting a true four‑byte UTF‑8 character such as an emoji fails with an error. This article explains why, shows how the newer “utf8mb4” charset provides full Unicode support, and adds historical context and migration guidance.

Top Architect

The author recently tried to store a UTF‑8 string containing an emoji in a MariaDB database configured with the "utf8" charset and received this error:

Incorrect string value: '😃 <…>' for column 'summary' at row 1

The root cause is that MySQL’s "utf8" charset is not true UTF‑8: it stores at most three bytes per character, whereas proper UTF‑8 needs up to four bytes for some characters, including most emoji.
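A quick way to tell whether a string would hit this limit is to look for code points above U+FFFF, since those are the ones UTF‑8 encodes in four bytes. The sketch below is an illustration only: the function names are hypothetical, and it models the three‑byte limit rather than MySQL's actual validation logic.

```python
def four_byte_chars(text: str) -> list:
    """Return the characters in text that need four bytes in UTF-8.

    Code points above U+FFFF lie outside the Basic Multilingual Plane
    and are exactly the ones UTF-8 encodes with four bytes.
    """
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_three_byte_limit(text: str) -> bool:
    """True if every character encodes to at most three UTF-8 bytes."""
    return not four_byte_chars(text)


smiley = "\U0001F603"  # the emoji from the error message above

print(fits_three_byte_limit("caf\u00e9 \u20ac5"))   # accented letter and euro sign
print(fits_three_byte_limit("great news " + smiley))
print(four_byte_chars("great news " + smiley))
```

A check like this could run in application code before an insert, for example to log which rows will fail until the column is converted.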

To address this limitation, MySQL introduced the "utf8mb4" charset in 2010, which implements the full Unicode range and should be used instead of "utf8" for any modern application.

Key takeaways:

MySQL’s "utf8mb4" is the correct implementation of UTF‑8.

The legacy "utf8" charset is a proprietary, limited encoding that cannot store many Unicode characters.

All MySQL and MariaDB users should migrate from "utf8" to "utf8mb4" and stop using the former.

The article also provides a brief primer on encoding. Computers store text as bytes: each character is assigned a Unicode code point, and an encoding such as UTF‑8 turns each code point into a byte sequence. UTF‑8 is space‑efficient because it uses one byte for ASCII characters and two to four bytes for everything else.
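The mapping from character to code point to bytes can be inspected directly in Python. This is a small sketch using only built‑ins (`ord`, `str.encode`, `bytes.hex`); the helper name `describe` and the sample characters are chosen here for illustration.

```python
def describe(ch: str) -> tuple:
    """Return (code point label, UTF-8 byte count, hex bytes) for one character."""
    encoded = ch.encode("utf-8")
    return f"U+{ord(ch):04X}", len(encoded), encoded.hex(" ")


samples = [
    "A",           # ASCII letter
    "\u00e9",      # e with acute accent
    "\u20ac",      # euro sign
    "\U0001F603",  # smiling face emoji
]

for ch in samples:
    label, size, hex_bytes = describe(ch)
    print(f"{label} {ch}: {size} byte(s) -> {hex_bytes}")
```

The four samples take one, two, three, and four bytes respectively; only the last falls outside what the legacy "utf8" charset can hold.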

Historically, MySQL first followed an early UTF‑8 specification (RFC 2279) that allowed up to six bytes per character, then restricted its implementation to three bytes, reportedly for performance reasons. That restriction, likely tied to how fixed‑length CHAR columns reserve space for the widest possible character, produced the flawed "utf8" implementation.
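The cost of the three‑byte cap follows from the UTF‑8 bit layout: a multi‑byte sequence keeps a few payload bits in its lead byte and six in each continuation byte. The sketch below computes the largest code point each sequence length can carry, including the five‑ and six‑byte forms that only RFC 2279 allowed; the function name is hypothetical.

```python
def max_code_point(num_bytes: int) -> int:
    """Largest value a UTF-8 style sequence of num_bytes bytes can carry.

    One byte:   0xxxxxxx                    -> 7 payload bits
    n >= 2:     lead byte has (7 - n) bits,
                each continuation byte has 6 (10xxxxxx)
    """
    if not 1 <= num_bytes <= 6:
        raise ValueError("sequence length must be between 1 and 6")
    if num_bytes == 1:
        payload_bits = 7
    else:
        payload_bits = (7 - num_bytes) + 6 * (num_bytes - 1)
    return (1 << payload_bits) - 1


for n in range(1, 7):
    print(f"{n} byte(s): up to U+{max_code_point(n):X}")
```

Three bytes reach only U+FFFF, the end of the Basic Multilingual Plane, so a code point such as U+1F603 cannot fit. Four bytes could carry values up to 0x1FFFFF, although Unicode itself stops at U+10FFFF.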

Because fixing the charset in place would have required users to rebuild their databases, MySQL kept the broken "utf8" for years and only later shipped the proper implementation under a new name, "utf8mb4".

In conclusion, if you are using MySQL or MariaDB, you should convert your databases to "utf8mb4"; a guide is available at https://mathiasbynens.be/notes/mysql-utf8mb4#utf8-to-utf8mb4.
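As a rough sketch of what such a conversion involves, the helper below builds the `ALTER DATABASE` and `ALTER TABLE ... CONVERT TO CHARACTER SET` statements for a database and its tables. The function name, the database and table names, and the choice of the `utf8mb4_unicode_ci` collation are assumptions made for illustration; the linked guide remains the reference for the full procedure.

```python
def conversion_statements(database, tables, collation="utf8mb4_unicode_ci"):
    """Build SQL statements that switch a database and its tables to utf8mb4.

    Identifiers are wrapped in backticks but not otherwise escaped, so
    this sketch assumes trusted, simple names.
    """
    statements = [
        f"ALTER DATABASE `{database}` CHARACTER SET = utf8mb4 COLLATE = {collation};"
    ]
    for table in tables:
        statements.append(
            f"ALTER TABLE `{table}` CONVERT TO CHARACTER SET utf8mb4 COLLATE {collation};"
        )
    return statements


# Hypothetical database and table names, for illustration only.
for sql in conversion_statements("blog", ["posts", "comments"]):
    print(sql)
```

Before running statements like these, take a backup and review them. Indexed string columns may need shorter lengths on older server configurations, because each character can now take four bytes. Client connections also need to use utf8mb4 (for example via `SET NAMES utf8mb4`) for four‑byte characters to arrive intact.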
