Analysis of MySQL Connector/J Character Set Handling and UTF8MB4 Support
This article examines how MySQL Connector/J determines the character set during connection initialization, explains the transition from UTF8MB3 to UTF8MB4, analyzes the source code of versions 5.1.46 and 5.1.47, and describes practical ways to enable UTF8MB4 without upgrading the driver.
Introduction
MySQL introduced UTF-8 support in version 4.1, initially following the RFC 2279 definition (1 to 6 bytes per character). In September 2002 the source was changed to allow at most 3 bytes per character, creating UTF8MB3, which cannot store characters from the Supplementary Multilingual Plane (SMP), such as emoji. Later, based on RFC 3629, MySQL added UTF8MB4. Setting a table or column to UTF8MB4 is not sufficient on its own; the character set named in the SET NAMES statement issued during connection setup also determines whether SMP characters are transmitted correctly.
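To make the 3-byte limit concrete, the following minimal sketch encodes one BMP character and one SMP character with Java's standard UTF-8 encoder and reports the byte counts. The class name `Utf8Width` and its helper methods are hypothetical and exist only for this illustration; a character that needs 4 bytes is one that a UTF8MB3 column cannot hold.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Width {
    /** Returns how many bytes the first code point of s occupies in UTF-8. */
    static int utf8Length(String s) {
        int codePoint = s.codePointAt(0);
        String single = new String(Character.toChars(codePoint));
        return single.getBytes(StandardCharsets.UTF_8).length;
    }

    /** True when the first code point lies outside the Basic Multilingual Plane. */
    static boolean needsUtf8mb4(String s) {
        return Character.isSupplementaryCodePoint(s.codePointAt(0));
    }

    public static void main(String[] args) {
        String bmp = "\u4e2d";       // U+4E2D, a CJK character inside the BMP
        String smp = "\uD83D\uDE00"; // U+1F600, an emoji (surrogate pair in Java)

        System.out.println(utf8Length(bmp) + " " + needsUtf8mb4(bmp)); // prints "3 false"
        System.out.println(utf8Length(smp) + " " + needsUtf8mb4(smp)); // prints "4 true"
    }
}
```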
Connector Source Analysis
The following analysis is based on the mysql-connector-java dependency versions 5.1.46 and 5.1.47. Other versions may differ.
When a connection is created, the driver checks whether the static set UTF8MB4_INDEXES contains the connection's serverCharsetIndex. The result is stored in the boolean useutf8mb4, which later decides whether SET NAMES uses utf8mb4 or utf8.
mysql-connector-java 5.1.46
Key class CharsetMapping initializes a static array of Collation objects. The relevant fragment is:
    public static final int MAP_SIZE = 2048;
    // ...
    Collation[] collation = new Collation[MAP_SIZE];
    collation[33] = new Collation(33, "utf8_general_ci", 1, MYSQL_CHARSET_NAME_utf8);
    collation[45] = new Collation(45, "utf8mb4_general_ci", 1, MYSQL_CHARSET_NAME_utf8mb4);
    collation[46] = new Collation(46, "utf8mb4_bin", 0, MYSQL_CHARSET_NAME_utf8mb4);
    collation[255] = new Collation(255, "utf8mb4_0900_ai_ci", 0, "utf8mb4");
    // ...
    // Collect the index of every collation whose charset is utf8mb4.
    Set<Integer> tempUTF8MB4Indexes = new HashSet<Integer>();
    for (int i = 1; i < MAP_SIZE; i++) {
        // Unassigned slots fall back to a placeholder collation (defined elsewhere in the class).
        Collation coll = collation[i] != null ? collation[i] : notUsedCollation;
        if (coll.mysqlCharset.charsetName.equals(MYSQL_CHARSET_NAME_utf8mb4)) {
            tempUTF8MB4Indexes.add(i);
        }
    }
    UTF8MB4_INDEXES = Collections.unmodifiableSet(tempUTF8MB4Indexes);
The Collation class stores the index, collation name, priority, and the associated MysqlCharset:
    class Collation {
        public final int index;
        public final String collationName;
        public final int priority;
        public final MysqlCharset mysqlCharset;

        public Collation(int index, String collationName, int priority, String charsetName) {
            this.index = index;
            this.collationName = collationName;
            this.priority = priority;
            this.mysqlCharset = CharsetMapping.CHARSET_NAME_TO_CHARSET.get(charsetName);
        }
    }
Handshake (MysqlIO)
When a MySQL connection is created, MysqlIO establishes a TCP link and reads the server's Greeting packet. The packet contains a single byte, serverCharsetIndex (e.g., 0x21 → 33, utf8_general_ci), which maps to an entry in the Collation array.
The index is later used to decide whether useutf8mb4 should be true.
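The decision described above can be sketched in isolation. This is a simplified illustration and not the driver's code: the class `Utf8mb4Decision` and its methods are hypothetical, and the index set contains only the utf8mb4 collation indexes mentioned in this article (45, 46, 224, 255), whereas the driver derives the full set from its collation table.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class Utf8mb4Decision {
    // Assumption: a small subset of utf8mb4 collation indexes, for illustration only.
    static final Set<Integer> UTF8MB4_INDEXES = Collections.unmodifiableSet(
            new HashSet<Integer>(Arrays.asList(45, 46, 224, 255)));

    /** Interprets the greeting-packet byte as an unsigned collation index (0..255). */
    static int toCharsetIndex(byte raw) {
        return raw & 0xff;
    }

    /** Picks the charset name for SET NAMES the way the 5.1.46 logic is described. */
    static String charsetForSetNames(byte rawIndexByte, boolean utf8mb4Supported) {
        boolean useutf8mb4 = utf8mb4Supported
                && UTF8MB4_INDEXES.contains(toCharsetIndex(rawIndexByte));
        return useutf8mb4 ? "utf8mb4" : "utf8";
    }

    public static void main(String[] args) {
        System.out.println(charsetForSetNames((byte) 0x21, true));  // index 33  -> utf8
        System.out.println(charsetForSetNames((byte) 0x2d, true));  // index 45  -> utf8mb4
        System.out.println(charsetForSetNames((byte) 0xff, true));  // index 255 -> utf8mb4
        System.out.println(charsetForSetNames((byte) 0xff, false)); // old server -> utf8
    }
}
```

The mask with 0xff matters because Java bytes are signed: without it, the byte 0xff would be read as -1 and would never match index 255.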
ConnectionImpl.configureClientCharacterSet
This method evaluates the Java encoding, MySQL version, and the UTF8MB4_INDEXES set to build the final SET NAMES statement. A simplified excerpt:
    private boolean configureClientCharacterSet(boolean dontCheckServerMatch) throws SQLException {
        String realJavaEncoding = getEncoding();
        boolean characterSetAlreadyConfigured = false;
        if (versionMeetsMinimum(4,1,0)) {
            characterSetAlreadyConfigured = true;
            setUseUnicode(true);
            configureCharsetProperties();
            realJavaEncoding = getEncoding();
            if (getUseUnicode() && realJavaEncoding != null) {
                if (realJavaEncoding.equalsIgnoreCase("UTF-8") || realJavaEncoding.equalsIgnoreCase("UTF8")) {
                    boolean utf8mb4Supported = versionMeetsMinimum(5,5,2);
                    boolean useutf8mb4 = utf8mb4Supported && UTF8MB4_INDEXES.contains(this.io.serverCharsetIndex);
                    if (!getUseOldUTF8Behavior()) {
                        if (dontCheckServerMatch || !characterSetNamesMatches("utf8") || (utf8mb4Supported && !characterSetNamesMatches("utf8mb4"))) {
                            execSQL(null, "SET NAMES " + (useutf8mb4 ? "utf8mb4" : "utf8"), -1, null, DEFAULT_RESULT_SET_TYPE, DEFAULT_RESULT_SET_CONCURRENCY, false, this.database, null, false);
                            this.serverVariables.put("character_set_client", useutf8mb4 ? "utf8mb4" : "utf8");
                            this.serverVariables.put("character_set_connection", useutf8mb4 ? "utf8mb4" : "utf8");
                        }
                    }
                }
            }
        }
        return true;
    }
Version 5.1.47 modifies the logic to rely on connectionCollationSuffix and connectionCollationCharset instead of the static UTF8MB4_INDEXES set.
    private boolean configureClientCharacterSet(boolean dontCheckServerMatch) throws SQLException {
        // ... retrieve connectionCollation and derive suffix/charset ...
        String utf8CharsetName = connectionCollationSuffix.length() > 0 ? connectionCollationCharset : (utf8mb4Supported ? "utf8mb4" : "utf8");
        if (!getUseOldUTF8Behavior()) {
            if (dontCheckServerMatch || !characterSetNamesMatches("utf8") || (utf8mb4Supported && !characterSetNamesMatches("utf8mb4")) || (connectionCollationSuffix.length() > 0 && !getConnectionCollation().equalsIgnoreCase(this.serverVariables.get("collation_server")))) {
                execSQL(null, "SET NAMES " + utf8CharsetName + connectionCollationSuffix, -1, null, DEFAULT_RESULT_SET_TYPE, DEFAULT_RESULT_SET_CONCURRENCY, false, this.database, null, false);
                this.serverVariables.put("character_set_client", utf8CharsetName);
                this.serverVariables.put("character_set_connection", utf8CharsetName);
            }
        }
        return true;
    }
Conclusion
Upgrading mysql-connector-java to a version that includes the revised logic (5.1.47 in this analysis) enables UTF8MB4 without additional configuration, because the driver then defaults to utf8mb4 whenever the server supports it. If upgrading is not possible, there are two ways to force UTF8MB4 on 5.1.46. The first is to set the test-only property com.mysql.jdbc.faultInjection.serverCharsetIndex to a known utf8mb4 collation index (e.g., 45, 46, or 224). The second is to change the MySQL server's character_set_server to utf8mb4, so that the handshake reports a utf8mb4 collation index; for example, 255 (hex 0xff) corresponds to utf8mb4_0900_ai_ci.
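As a rough sketch of the first workaround, the snippet below assembles connection properties without opening a connection. The class `Utf8mb4Props`, the URL's host, port, and schema, and the credentials are all placeholders. It also assumes that the driver reads com.mysql.jdbc.faultInjection.serverCharsetIndex from the connection properties, which should be verified against the driver version in use. Because the property is intended for testing, this is best treated as a stopgap.

```java
import java.util.Properties;

public class Utf8mb4Props {
    // Placeholder URL: host, port, and schema are assumptions for illustration.
    static final String JDBC_URL = "jdbc:mysql://localhost:3306/demo";

    /** Builds connection properties that ask the driver to treat the server index as utf8mb4. */
    static Properties build(String user, String password, int utf8mb4CollationIndex) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("useUnicode", "true");
        props.setProperty("characterEncoding", "UTF-8");
        // Assumption: the driver picks this test-only key up from the connection properties.
        props.setProperty("com.mysql.jdbc.faultInjection.serverCharsetIndex",
                String.valueOf(utf8mb4CollationIndex));
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("app_user", "secret", 45); // 45 = utf8mb4_general_ci
        System.out.println(JDBC_URL);
        System.out.println(props.getProperty("com.mysql.jdbc.faultInjection.serverCharsetIndex"));
    }
}
```

In an application, the resulting Properties object would be passed to DriverManager.getConnection(JDBC_URL, props).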
This article focuses on the issue “MySQL connection set to UTF‑8 causes SMP character insertion failure”.
ZCY Technology (政采云技术)
ZCY Technology Team (Zero), based in Hangzhou, is a growth-oriented team passionate about technology and craftsmanship. With around 500 members, we are building comprehensive engineering, project management, and talent development systems. We are committed to innovation and creating a cloud service ecosystem for government and enterprise procurement. We look forward to your joining us.