
Hot and Cold Data Separation: Concepts, Scenarios, and Implementation Methods

The article explains the hot‑cold data separation pattern, describing its purpose, when to use it, how to distinguish hot versus cold data, and three practical implementation approaches—code modification, binlog listening, and scheduled scanning—to improve performance and maintain data consistency in large‑scale systems.

Code Ape Tech Column

However complex a business scenario is, the entire lifecycle of a piece of data is reflected in its CRUD operations: Create, Read, Update, and Delete. And much like a person moving through the stages of life, a data record becomes less valuable as it ages.

The value of data lies in how often it is used; different systems have different requirements for data from different periods.

For example, on platforms such as 12306 and Ctrip, users usually care only about orders from the last 30 days. Ctrip shows only the last 30 days of orders by default and requires a phone‑number lookup for anything older.

Why does Ctrip do this?

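If every one of the billions of orders created each year had to stay fully available for create, read, update, and delete operations in a single store, the system would quickly buckle under the load.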

An order reaches its final state when it is completed, meaning it only needs to be read thereafter.

Ctrip’s architecture adopts a hot‑cold separation strategy.

What Is Hot‑Cold Separation?

Hot‑cold separation divides a database into a hot store and a cold store. The hot store holds data that still requires updates, while the cold store holds data that has reached its final state.

For instance, orders within 30 days may need refunds or invoice issuance (updates), whereas orders older than 30 days are only queried.

This introduces two concepts:

Hot data: frequently updated, with response‑time requirements.

Cold data: rarely or never updated, read only occasionally, with no strict response‑time requirements.

When Should Hot‑Cold Separation Be Used?

In large‑scale internet systems, consider hot‑cold separation when:

Main business response latency is too high (e.g., slow order placement on 12306).

Data has reached a terminal state with no further update needs, only read requirements.

Users can tolerate separate queries for new and old data (e.g., Ctrip’s phone‑number lookup for orders older than 30 days).

Note: Even if a system does not expose separate queries to users, it may still perform hot‑cold separation internally.
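
To make the "separate queries" idea concrete, here is a minimal sketch of a lookup that reads the hot store by default and touches the cold store only when the caller explicitly asks for older data. The `orders` table, its columns, and the `include_cold` flag are assumptions made for illustration, and SQLite stands in for whatever databases back the two stores.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory store with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, status TEXT)"
    )
    return conn


def find_order(hot, cold, order_id, include_cold=False):
    """Look up an order in the hot store; consult the cold store only on request."""
    query = "SELECT id, user_id, status FROM orders WHERE id = ?"
    row = hot.execute(query, (order_id,)).fetchone()
    if row is not None or not include_cold:
        return row
    # Explicit "older orders" lookup: slower, and allowed to be slower.
    return cold.execute(query, (order_id,)).fetchone()
```

Keeping the cold lookup opt‑in means the main business path never pays for the size of the archive.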

How to Determine Whether Data Is Hot or Cold?

Typically, you classify data based on business‑specific fields such as order time (time dimension) or order status (status dimension). For example, data older than three months can be marked as cold, while recent data remains hot. You can also combine dimensions: for instance, orders that are older than three months and already completed are cold.

In short, the right criteria depend on your own business requirements.
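
A combined rule of this kind can be written as a small predicate. The sketch below is only an illustration: it approximates "three months" as 90 days, and the status names are hypothetical placeholders for whatever terminal states your system defines.

```python
from datetime import datetime, timedelta

COLD_AFTER = timedelta(days=90)              # assumption: "three months" as 90 days
FINAL_STATUSES = {"COMPLETED", "CANCELLED"}  # hypothetical terminal statuses


def is_cold(order_time: datetime, status: str, now: datetime) -> bool:
    """An order is cold only if it is both old enough and in a final state."""
    old_enough = (now - order_time) > COLD_AFTER
    return old_enough and status in FINAL_STATUSES
```

Passing `now` in as a parameter, instead of reading the clock inside the function, keeps the rule easy to test and guarantees that one scan applies the same cutoff to every record.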

Two important points to remember:

Once data is marked as cold, the application should no longer perform write operations on it.

The same piece of data should never need to be read and written in both the hot and cold stores at the same time.
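
One way to enforce the first rule is to let the write path know only about the hot store, so that an attempt to modify archived data fails loudly instead of silently reaching into the cold store. The following is a sketch under that assumption; `ColdDataWriteError` and the `orders` table are names invented for the example.

```python
import sqlite3


class ColdDataWriteError(Exception):
    """Raised when the application tries to modify an order that is not hot."""


def update_status(hot: sqlite3.Connection, order_id: int, new_status: str) -> None:
    """Update an order in the hot store only; never fall back to the cold store."""
    with hot:  # commits on success, rolls back if an exception escapes
        cur = hot.execute(
            "UPDATE orders SET status = ? WHERE id = ?", (new_status, order_id)
        )
        if cur.rowcount == 0:
            # Either the order never existed or it has already been archived.
            raise ColdDataWriteError(f"order {order_id} is not in the hot store")
```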

How to Implement Hot‑Cold Data Separation?

After understanding the theory, here are three common implementation methods.

1. Modify Business Code

This approach changes the business code directly, so it is highly invasive. It also cannot separate data by time, because separation is triggered only at the moment a record is modified.

When an order’s status becomes final, the code marks it as cold data, writes it to the cold store, and deletes it from the hot store.
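
As a rough sketch of what such a hook in the business code could look like, the function below updates an order and archives it as soon as the new status is final. The table layout and status names are assumptions, and two SQLite connections stand in for the hot and cold databases.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)"
FINAL_STATUSES = {"COMPLETED", "CANCELLED"}  # hypothetical terminal statuses


def change_status(hot, cold, order_id, new_status):
    """Business-code hook: update the order, then archive it once it is final."""
    with hot:
        hot.execute(
            "UPDATE orders SET status = ? WHERE id = ?", (new_status, order_id)
        )
    if new_status not in FINAL_STATUSES:
        return
    row = hot.execute(
        "SELECT id, status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    if row is None:
        return
    # Copy first, delete second: a crash in between leaves a duplicate that a
    # retry can overwrite, but never loses the order.
    with cold:
        cold.execute("INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)", row)
    with hot:
        hot.execute("DELETE FROM orders WHERE id = ?", (order_id,))
```

Because the two stores are separate databases, the copy and the delete cannot share one transaction here, which is why the write to the cold store is made idempotent.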

2. Listen to Database Binlog

This method watches the binlog to trigger separation, e.g., when an order status changes.

It cannot separate by time, but it is non‑intrusive to the business code.

Tools such as Alibaba’s Canal or other open‑source middleware can be used. For MySQL, Canal is recommended. See the linked article for a Spring Boot integration example.

[Figure: full workflow diagram for binlog‑based separation]
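
The consumer side of a binlog listener can be sketched without tying it to a particular tool. In the example below, `RowChange` is a hypothetical, heavily simplified event type and does not reflect Canal's actual message format; in a real setup the events would come from the binlog client rather than being constructed by hand.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class RowChange:
    """Hypothetical, simplified row-change event (not Canal's real format)."""
    table: str
    event_type: str  # "INSERT", "UPDATE" or "DELETE"
    after: dict      # column values after the change


FINAL_STATUSES = {"COMPLETED", "CANCELLED"}  # hypothetical terminal statuses


def handle_change(event: RowChange, hot, cold) -> bool:
    """Archive an order when an UPDATE moves it into a final status.

    Returns True if the order was moved to the cold store.
    """
    if event.table != "orders" or event.event_type != "UPDATE":
        # The DELETE issued below also shows up in the binlog; it is ignored here.
        return False
    if event.after.get("status") not in FINAL_STATUSES:
        return False
    order_id, status = event.after["id"], event.after["status"]
    with cold:
        cold.execute(
            "INSERT OR REPLACE INTO orders (id, status) VALUES (?, ?)",
            (order_id, status),
        )
    with hot:
        hot.execute("DELETE FROM orders WHERE id = ?", (order_id,))
    return True
```

The business code stays untouched: it only updates the order, and the listener reacts to the resulting change event.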

3. Scheduled Scanning Tasks

This approach selects data by time‑based criteria and is decoupled from the business code, which makes it a solid choice when separation is driven by the age of the data.

[Figure: typical flow diagram for scheduled scanning]
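
A scheduled scan might be structured as follows. This is a sketch, not a production job: the `created_at` column (stored as an ISO date string), the batch size, and the function names are all assumptions, and the scheduler that would call `run_scan` (cron, a job framework, and so on) is left out.

```python
import sqlite3

SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, created_at TEXT)"


def migrate_batch(hot, cold, cutoff: str, batch_size: int = 100) -> int:
    """Move up to `batch_size` orders created before `cutoff` to the cold store."""
    rows = hot.execute(
        "SELECT id, status, created_at FROM orders "
        "WHERE created_at < ? ORDER BY id LIMIT ?",
        (cutoff, batch_size),
    ).fetchall()
    if not rows:
        return 0
    # Idempotent copy first, then delete, so a failed run can simply be retried.
    with cold:
        cold.executemany(
            "INSERT OR REPLACE INTO orders (id, status, created_at) VALUES (?, ?, ?)",
            rows,
        )
    with hot:
        hot.executemany("DELETE FROM orders WHERE id = ?", [(r[0],) for r in rows])
    return len(rows)


def run_scan(hot, cold, cutoff: str, batch_size: int = 100) -> int:
    """One scheduled run: migrate in batches until nothing older than `cutoff` is left."""
    total = 0
    while True:
        moved = migrate_batch(hot, cold, cutoff, batch_size)
        if moved == 0:
            return total
        total += moved
```

Working in small batches keeps each transaction short, so the scan does not hold locks on the hot store for long while the main business traffic is running.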

Summary

Hot‑cold separation is an effective way to address read/write performance bottlenecks. The three methods above can all achieve the goal, but you must still consider many issues, especially data consistency between hot and cold stores.

Ensuring consistency is critical; many techniques exist, but they are beyond the scope of this article.


Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
