Databases 7 min read

Storing 2 Billion Phone Numbers: Int vs String, Schema Design and Pitfalls

When designing a schema for two‑billion phone numbers, use a VARCHAR(20) column with utf8mb4 Unicode, a unique index, and proper validation/encryption rather than a 32‑bit INT, because strings preserve leading zeros, international prefixes, extensions, and support business extensibility, fault tolerance, and future changes.

Java Tech Enthusiast
Java Tech Enthusiast
Java Tech Enthusiast
Storing 2 Billion Phone Numbers: Int vs String, Schema Design and Pitfalls

In an interview scenario a candidate is asked how to store 2 billion phone numbers and whether to use INT or STRING, VARCHAR or CHAR.

Phone numbers are 11‑digit numeric strings; an INT (32‑bit) maxes at 2,147,483,647, which cannot hold 11‑digit numbers. A 64‑bit LONG (BIGINT) can store them, but it loses leading zeros, cannot represent international prefixes (+86) or hyphens, and makes range queries inefficient.

Long phoneNumber = 01324567890L; // compile error – leading zero not allowed

Using a character type (VARCHAR) preserves the exact representation, supports optional country codes, hyphens, extensions, and enables flexible pattern matching.

CREATE TABLE user_tab (
  id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT COMMENT '用户ID',
  phone_number VARCHAR(20) NOT NULL COMMENT '手机号',
  PRIMARY KEY (id),
  UNIQUE KEY idx_phone (phone_number)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci COMMENT='用户表';

The interview expects the candidate to discuss three hidden assessment points: business extensibility, data fault‑tolerance, and comprehensive problem‑solving.

Choosing VARCHAR(20) instead of VARCHAR(11) leaves room for international numbers (+8613822223333), extensions (13822223333#123), and future business changes.

Typical pitfalls in daily development include:

Using a field length that is too short (VARCHAR(11) truncates longer numbers).

Wrong character set/collation – use utf8mb4 and utf8mb4_unicode_ci to store all Unicode symbols.

Missing unique index on phone numbers, leading to duplicates.

Skipping data cleaning – remove spaces, hyphens, keep only + and digits.

Storing phone numbers in plain text – apply encryption or masking.

// Strict validation (11‑digit pure numbers)
String regex = "^1(3[0-9]|4[579]|5[0-35-9]|6[2567]|7[0-8]|8[0-9]|9[0-35-9])\\d{8}$";
// Loose validation (allows +86 prefix)
String looseRegex = "^(\\+\\d{1,3})?1(3\\d|4[579]|5[0-35-9]|6[2567]|7[0-8]|8\\d|9[0-35-9])\\d{8}$";

Overall, a reasonable column length equals the basic requirement plus international extension plus a tolerance buffer.

SQLBIGINTdata type selectionDatabase DesignInterviewphone number storageVARCHAR
Java Tech Enthusiast
Written by

Java Tech Enthusiast

Sharing computer programming language knowledge, focusing on Java fundamentals, data structures, related tools, Spring Cloud, IntelliJ IDEA... Book giveaways, red‑packet rewards and other perks await!

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.