
Why Do You See “锟斤拷” in Text? Uncover the Encoding Mystery

This article explains how character encoding works, using ASCII, Unicode, UTF‑8 and GBK examples to reveal why the garbled string “锟斤拷” appears when mismatched encodings are processed, and shows the underlying byte‑level transformations.

macrozheng

What is the mysterious “锟斤拷”?

In computing, every character is stored as a binary code. An encoding is simply a mapping from symbols to binary numbers, and text becomes garbled when bytes written under one mapping are read back under another.

ASCII example

For instance, the ASCII code 0100 0001 (decimal 65) corresponds to the letter A.
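The mapping can be checked with a few lines of Java. This is only an illustrative sketch: the class name `AsciiDemo` and the helper `toBinary` are made up for this example and are not part of the original article.

```java
import java.nio.charset.StandardCharsets;

class AsciiDemo {
    // Hypothetical helper: renders a character's code as an 8-bit binary string.
    static String toBinary(char c) {
        String bits = Integer.toBinaryString(c);
        // Left-pad with zeros to a width of 8.
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        // Encode the letter A with the ASCII charset and inspect the single byte.
        byte[] encoded = "A".getBytes(StandardCharsets.US_ASCII);
        System.out.println(encoded[0]);    // 65
        System.out.println(toBinary('A')); // 01000001
    }
}
```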

The Unicode replacement character “�” (U+FFFD, decimal 65533) is used when a decoder encounters a byte sequence that is not valid in the encoding it is decoding.

Why “锟斤拷” appears

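When a UTF‑8 decoder is given bytes it cannot decode, such as new byte[] {-25, -119, -25, -116} (0xE7 0x89 0xE7 0x8C, two three‑byte sequences that are each missing their last byte), it substitutes the replacement character for each invalid sequence, which is displayed as “�”.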
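A minimal sketch of this substitution, assuming the default behaviour of Java's `new String(byte[], Charset)` constructor, which replaces malformed input instead of throwing. The class name `ReplacementDemo` and the helper `decodeUtf8` are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;

class ReplacementDemo {
    // Hypothetical helper: decodes bytes as UTF-8, letting the JDK
    // substitute U+FFFD for anything malformed.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 0xE7 0x89 0xE7 0x8C: two truncated three-byte sequences.
        byte[] broken = {-25, -119, -25, -116};
        String decoded = decodeUtf8(broken);
        // Print code points rather than glyphs so the result does not
        // depend on the console's own encoding.
        for (char c : decoded.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c); // U+FFFD, printed twice
        }
    }
}
```

The original four bytes are lost at this point: only two replacement characters remain in the string.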

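When replacement characters are written back out as UTF‑8, each one becomes the three bytes 0xEF 0xBF 0xBD, so two in a row produce the six‑byte sequence

0xEFBFBDEFBFBD

If those six bytes are then read as GBK, which uses two bytes per Chinese character, they are split into three two‑byte characters: 0xEFBF, 0xBDEF and 0xBFBD, which correspond to the Chinese characters “锟”, “斤” and “拷”.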
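The whole chain can be reproduced in Java. This is a sketch under stated assumptions: the class `GarbleDemo` and its methods `hex` and `garble` are hypothetical, and `Charset.forName("GBK")` assumes a JDK that ships the extended charsets (most full JDK distributions do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GarbleDemo {
    // Hypothetical helper: formats bytes as upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: decode as UTF-8 (lossy), re-encode as UTF-8,
    // then misread the result as GBK.
    static String garble(byte[] raw) {
        String lossy = new String(raw, StandardCharsets.UTF_8);    // "\uFFFD\uFFFD"
        byte[] reencoded = lossy.getBytes(StandardCharsets.UTF_8); // EF BF BD EF BF BD
        return new String(reencoded, Charset.forName("GBK"));
    }

    public static void main(String[] args) {
        byte[] sixBytes = "\uFFFD\uFFFD".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex(sixBytes)); // EFBFBDEFBFBD
        // Shows the three Chinese characters if the console can render them.
        System.out.println(garble(new byte[] {-25, -119, -25, -116}));
    }
}
```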

Thus the garbled “锟斤拷” you often see is the result of mismatched encoding between UTF‑8 and GBK.

Now you know the reason behind those strange symbols.
