How Computers Display Characters: Encoding, Input Methods, Unicode, and Fonts
This article explains how characters are turned into binary, mapped through Unicode and character sets, processed by input methods and font files, and finally rendered on screen, while also covering the challenges of rare characters and recent Unicode updates.
All data on a computer is represented in binary (0s and 1s). To display a character, the computer first converts the typed character into a numeric code defined by a character encoding standard such as Unicode or GBK. In Unicode this number is called a code point, and it is stored in memory or on disk as a sequence of bytes.
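As a minimal sketch of this first step, the Python snippet below shows the code point of one common character and the bytes that two different encodings produce for it. The helper name describe_encoding is only illustrative.

```python
def describe_encoding(ch: str) -> dict:
    """Return the code point and byte-level forms of a single character."""
    code_point = ord(ch)  # the Unicode code point as an integer
    return {
        "code_point": f"U+{code_point:04X}",   # conventional U+XXXX notation
        "binary": format(code_point, "b"),     # the same number written in base 2
        "utf8": ch.encode("utf-8").hex(),      # bytes under UTF-8
        "gbk": ch.encode("gbk").hex(),         # bytes under GBK
    }


info = describe_encoding("中")
print(info)
```

For "中" this reports the code point U+4E2D, the UTF-8 bytes e4b8ad, and the GBK bytes d6d0: the same character gets different byte sequences under different encodings.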
After obtaining the Unicode value, the system looks it up in the font file's character map (Charmap, stored as the cmap table in TrueType and OpenType fonts), which translates the code point into a glyph index. It then loads the corresponding glyph, renders it, and finally displays the result on the monitor.
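The lookup can be pictured with a toy model. Real fonts store the character map in a binary table and store glyphs as outlines or bitmaps; the dictionary, glyph list, and function name below are hypothetical stand-ins chosen only to show the code point to glyph index to glyph chain.

```python
# Hypothetical, heavily simplified "font" used only for illustration.
NOTDEF_GLYPH_INDEX = 0  # by convention, glyph 0 is the "missing glyph" shape

# Character map: code point -> glyph index.
TOY_CMAP = {0x41: 1, 0x4E2D: 2}

# Glyph table: glyph index -> glyph data (plain names stand in for outlines).
TOY_GLYPHS = [".notdef", "A", "zhong"]


def glyph_for(ch: str) -> str:
    """Map a character to its glyph, falling back to the missing-glyph entry."""
    glyph_index = TOY_CMAP.get(ord(ch), NOTDEF_GLYPH_INDEX)
    return TOY_GLYPHS[glyph_index]


print(glyph_for("A"))           # covered by the toy font
print(glyph_for("\U00030EDE"))  # not covered, so the fallback glyph is returned
```

When the character map has no entry for a code point, the renderer has nothing to draw except the fallback glyph, which is why unsupported characters typically show up as empty boxes.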
Three conditions are required for a character to appear: the input method must support the character, Unicode must contain a code for it, and the installed fonts must include a glyph for that code.
Because Chinese characters number in the tens of thousands, a keyboard cannot have a key for each one. An input method therefore maps a sequence of keystrokes to a character, and the characters it can offer are limited to the character set it is built on. Most Chinese input methods today rely on GBK, which covers about 21,000 Chinese characters, so many rare characters (such as the "biáng" in "Biángbiáng noodles") cannot be typed.
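The sketch below illustrates this limitation with a toy input method. The candidate table and function names are hypothetical, and a real input method ships a far larger dictionary, but the encodability check uses Python's built-in gbk codec.

```python
from typing import List

BIANG = "\U00030EDE"  # the rare "biáng" character

# Hypothetical keystroke -> candidate table for illustration only.
TOY_CANDIDATES = {
    "zhong": ["中", "重"],
    "biang": [BIANG],
}


def can_encode(ch: str, charset: str) -> bool:
    """Return True if the character exists in the given character set."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


def candidates(keys: str, charset: str) -> List[str]:
    """List the characters a charset-limited input method could offer."""
    return [ch for ch in TOY_CANDIDATES.get(keys, []) if can_encode(ch, charset)]


print(candidates("zhong", "gbk"))    # common characters are available
print(candidates("biang", "gbk"))    # empty: GBK has no code for this character
print(candidates("biang", "utf-8"))  # a Unicode-based method can offer it
```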
Some input methods adopt a larger character set such as Unicode, which allows rarer characters to be entered. Unicode is updated continuously: Unicode 13.0, released on March 10, 2020 and the latest version at the time of writing, added 5,930 characters for a total of 143,859. It includes the "biáng" character (code points U+30EDD and U+30EDE) in the CJK Unified Ideographs Extension G block.
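To make the code points concrete, the following sketch checks whether a character falls in the Extension G block and whether the Unicode data bundled with the running Python interpreter already knows about it. The block range U+30000 to U+3134F is taken from the Unicode block list; the function names are illustrative.

```python
import unicodedata

# CJK Unified Ideographs Extension G block range.
EXT_G_START, EXT_G_END = 0x30000, 0x3134F

BIANG = "\U00030EDE"


def code_point_label(ch: str) -> str:
    """Format a character's code point in U+XXXX notation."""
    return f"U+{ord(ch):04X}"


def in_extension_g(ch: str) -> bool:
    """Return True if the character lies in the Extension G block."""
    return EXT_G_START <= ord(ch) <= EXT_G_END


def runtime_knows(ch: str) -> bool:
    """Return True if this interpreter's Unicode data has the code point assigned.

    Category "Cn" means "unassigned" in the bundled Unicode version.
    """
    return unicodedata.category(ch) != "Cn"


print(code_point_label(BIANG), in_extension_g(BIANG))
print("Bundled Unicode version:", unicodedata.unidata_version)
print("Assigned in this runtime:", runtime_knows(BIANG))
```

The last line depends on the interpreter: a runtime bundling Unicode data older than 13.0 reports the character as unassigned, which mirrors the way older systems fail to recognize newly added characters.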
Even when Unicode contains a character, it will not display unless the operating system’s font files contain a glyph for it. Commercial fonts that support extended CJK characters can render these rare symbols, but many systems still lack them, causing issues in various services (e.g., identity verification, ticketing, banking) when a name contains such characters.
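The dependence on installed fonts can be sketched as a search over font coverage. The font names and coverage sets below are invented for illustration, and real systems query each font's character map rather than a Python set, but the logic is the same: if no installed font covers the code point, there is nothing to render.

```python
from typing import Dict, List, Optional, Set

# Hypothetical fonts and the code points each one covers.
TOY_FONTS: Dict[str, Set[int]] = {
    "BasicSans": {0x41, 0x4E2D},
    "ExtendedCJK": {0x4E2D, 0x30EDD, 0x30EDE},
}


def pick_font(ch: str, installed: List[str]) -> Optional[str]:
    """Return the first installed font that has a glyph for the character."""
    for name in installed:
        if ord(ch) in TOY_FONTS.get(name, set()):
            return name
    return None  # no font covers it, so the character cannot be displayed


print(pick_font("\U00030EDE", ["BasicSans"]))                 # None
print(pick_font("\U00030EDE", ["BasicSans", "ExtendedCJK"]))  # ExtendedCJK
```

A system with only the first font installed cannot show the character even though Unicode defines it, which is the situation that affects names containing rare characters.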
Therefore, using obscure characters requires caution, as support depends on the combination of input method, Unicode version, operating system updates, and font availability.
Sohu Tech Products