How ByteHouse Boosted Douyin’s Interest Circle Queries by 100×
This article explains how Douyin rebuilt its interest‑circle platform by replacing MySQL with the columnar OLAP engine ByteHouse, achieving roughly a hundred‑fold improvement in query speed, lower hardware costs, and seamless horizontal scalability for massive daily data volumes.
Background
Douyin’s recommendation system groups users with similar preferences into "interest circles" to deliver precise video suggestions. Building such a platform requires both sophisticated algorithms and a storage architecture that can handle massive daily data growth and real‑time complex queries.
Challenges with MySQL
Using MySQL as the underlying row‑store caused several problems:
Low query efficiency for billions of rows, requiring expensive hardware or sharding.
Frequent data‑volume explosions (daily increases of tens of thousands to millions of rows) pushed MySQL to its performance limits.
Complex multi‑column conditions and large‑scale joins often resulted in full‑table scans exceeding 10 seconds.
Increasing business tags and feature dimensions made schema management cumbersome.
Why ByteHouse?
ByteHouse is a distributed columnar OLAP engine with high query performance, strong data compression, and horizontal scalability. It suits append‑only, analytical workloads and reduces hardware requirements.
Architecture Overview
Data from Hive tables are periodically imported into an RDS (MySQL) instance, then cached or written back for downstream use. User queries trigger the backend to read from cache, the database, or specific partitions, assemble the results, and return them.
Comparison: MySQL vs ByteHouse
Key differences:
MySQL: relational, row‑store, supports transactions (OLTP).
ByteHouse: distributed columnar store, supports eventual transactions (OLAP).
Row storage is optimal for reading few rows; column storage excels at reading large batches.
ByteHouse leverages multi‑core parallelism, while MySQL often runs single‑threaded per query.
Performance Tests
1. Storage Refactor Benchmarks
Circle‑layer info query (low‑frequency single‑point): MySQL 7444 ms vs ByteHouse 71 ms.
Author‑layer info query: MySQL 6840 ms vs ByteHouse 121 ms.
Nearby circle query: MySQL 3840 ms vs ByteHouse 57 ms.
<code>select circle_layer_id, name, description, top_tag, core_author_list
from circle_layer
where (date = '12345678' and model = 1 and author_type = 1)
and circle_layer_id in(71,72,73,74,75,76,77,78)</code>2. Application‑Level Queries
Fetch all user IDs and names in a circle: MySQL 30140 ms vs ByteHouse 162 ms.
Filter by user ID, relation type, and author name pattern: MySQL 2524 ms vs ByteHouse 102 ms.
<code>SELECT DISTINCT fans_id, fans_name
FROM circle_layer_user
WHERE (date = '12345678' AND model = 1 AND author_type = 1)</code> <code>SELECT *
FROM circle_layer_user
WHERE (date = '12345678' AND model = 2 AND author_type = 5)
AND fans_id = 098765432123450
AND relation_type = 2
AND author_name LIKE '%xx%'</code>Conclusion
After migrating the interest‑circle platform to ByteHouse, typical query scenarios saw an average speedup of about 100×, dramatically improving user experience. The columnar engine’s superior query performance and compression allow moderate‑spec servers to meet demand, reducing overall hardware cost, while its horizontal scalability ensures future growth can be handled by simply adding more nodes.
ByteDance Data Platform
The ByteDance Data Platform team empowers all ByteDance business lines by lowering data‑application barriers, aiming to build data‑driven intelligent enterprises, enable digital transformation across industries, and create greater social value. Internally it supports most ByteDance units; externally it delivers data‑intelligence products under the Volcano Engine brand to enterprise customers.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.