Engineering Practices and Evolution of Douyin’s Cloud‑Native Vector Database
This article outlines Douyin’s step‑by‑step engineering evolution of its cloud‑native vector database, covering the background of vector search, core concepts, algorithmic optimizations, storage‑compute separation, streaming updates, multi‑tenant orchestration, and future applications such as large language model integration.
With the widespread adoption of deep learning, embedding‑based retrieval has become a common requirement, prompting Douyin to develop a dedicated vector database to handle large‑scale unstructured data.
Background of Vector Databases – Traditional inverted‑index methods (BM25, TF‑IDF) struggle with semantic search and multimodal data. Embedding models (doc2vec, BERT, LLMs) transform text, images, and video into vectors, turning retrieval into an approximate nearest‑neighbor problem.
Core Concepts of Vector Retrieval – Similarity is measured by Euclidean distance, inner product, or cosine similarity; results are limited by a configurable top‑K; and performance is balanced against accuracy. Common solutions include ANN algorithms (HNSW, IVF), quantization (PQ, scalar quantization), and low‑level optimizations (SIMD, cache‑friendly memory layout).
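To make these concepts concrete, here is a minimal brute-force top-K search by cosine similarity — the exact baseline that ANN algorithms like HNSW and IVF approximate (an illustrative sketch, not Douyin's code; all names are hypothetical):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query, corpus, k):
    """Exact top-K: score every vector, sort by similarity, keep the best k.
    ANN algorithms trade a little accuracy to avoid this full scan."""
    scored = sorted(enumerate(corpus), key=lambda iv: -cosine(query, iv[1]))
    return [i for i, _ in scored[:k]]

corpus = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0)]
print(top_k((1.0, 0.0), corpus, k=2))  # → [0, 1]
```

The cost of this exact scan is linear in corpus size, which is why billion-scale retrieval requires the approximate methods listed above.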
Douyin’s Engineering Optimizations
Optimized open‑source HNSW and built a proprietary IVF algorithm, improving performance without sacrificing accuracy.
Developed a custom scalar‑quantization scheme supporting int16, int8, and int4 precision, enabling retrieval over 200 million vectors on a single T4 GPU.
Implemented SIMD‑based acceleration and memory‑layout tuning for higher throughput.
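As a rough illustration of the scalar-quantization idea above, symmetric int8 quantization maps each float component onto a signed byte and keeps a per-vector scale for reconstruction (a generic sketch of the technique; Douyin's actual scheme is not described in detail):

```python
def quantize_int8(vec):
    """Map floats in [-scale, scale] onto int8 codes in [-127, 127].
    The scale is stored alongside the codes for dequantization."""
    scale = max(abs(x) for x in vec) or 1.0
    codes = [round(x / scale * 127) for x in vec]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate floats; error per component is at most scale / 127."""
    return [c / 127 * scale for c in codes]

vec = [0.5, -1.0, 0.25]
codes, scale = quantize_int8(vec)
approx = dequantize_int8(codes, scale)
```

Storing one byte instead of four per dimension cuts memory 4x, which is what makes holding hundreds of millions of vectors on a single GPU feasible; int4 halves that again at the cost of coarser reconstruction.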
From Retrieval Algorithms to a Full‑Featured Vector Database – Wrapped the retrieval algorithms in unified storage, search, and analytics APIs, ensuring high availability, high performance, and ease of use.
Technical Evolution
Hybrid vector‑scalar filtering: post‑filtering (expand top‑K then filter) and pre‑filtering (DSL filter before vector search) to handle structured‑data constraints.
Built a DSL‑directed engine that performs low‑cost DSL filtering alongside vector similarity, supports early termination, and provides execution‑plan optimization.
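The difference between the two filtering strategies can be sketched with toy data (rows carry a similarity rank and a scalar attribute; the functions and row shapes here are hypothetical, for illustration only):

```python
# Toy rows: (id, rank, category) — lower rank means more similar to the query.
rows = [(0, 0, "a"), (1, 1, "b"), (2, 2, "a"), (3, 3, "a")]

def post_filter(k, expand=2):
    """Post-filtering: over-fetch the k*expand nearest rows by similarity,
    then drop those failing the scalar predicate. Risk: too few survivors."""
    candidates = sorted(rows, key=lambda r: r[1])[:k * expand]
    return [r[0] for r in candidates if r[2] == "a"][:k]

def pre_filter(k):
    """Pre-filtering: apply the scalar/DSL predicate first, then rank
    only the survivors by similarity. Risk: costly when the filter is loose."""
    allowed = [r for r in rows if r[2] == "a"]
    return [r[0] for r in sorted(allowed, key=lambda r: r[1])[:k]]

print(post_filter(k=2))  # → [0, 2]
print(pre_filter(k=2))   # → [0, 2]
```

Both strategies agree here, but their costs diverge with selectivity — which is exactly the trade-off the DSL-directed engine's execution-plan optimization arbitrates.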
Storage‑Compute Separation – Shifted from an all‑in‑one architecture to a three‑component design: vector storage cluster, batch index‑building cluster, and online search service. Benefits include reduced index‑build resources, full CPU utilization during batch jobs, and improved online stability.
Streaming Updates – Introduced a dual‑buffer design for concurrent safe index updates (both HNSW and IVF), achieving sub‑second latency for insert/delete/modify operations.
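The dual-buffer idea can be sketched as two index copies with an atomic swap: readers always query a consistent snapshot while writers prepare the standby copy (a simplified single-key illustration, not Douyin's HNSW/IVF implementation):

```python
import threading

class DualBufferIndex:
    """Two index copies: readers query the active buffer; writers update the
    standby buffer and swap it in, so reads never block on index rebuilds."""

    def __init__(self):
        self._buffers = [{}, {}]
        self._active = 0                      # which buffer readers see
        self._write_lock = threading.Lock()   # serializes writers only

    def search(self, key):
        # Lock-free read of the currently active snapshot.
        return self._buffers[self._active].get(key)

    def upsert(self, key, vec):
        with self._write_lock:
            standby = 1 - self._active
            # Rebuild the standby copy, apply the change, then publish it.
            self._buffers[standby] = dict(self._buffers[self._active])
            self._buffers[standby][key] = vec
            self._active = standby            # atomic swap: snapshot flips

idx = DualBufferIndex()
idx.upsert("doc1", [0.1, 0.2])
print(idx.search("doc1"))  # → [0.1, 0.2]
```

In a real graph or inverted-file index the standby update would be an incremental patch rather than a full copy, but the swap-based visibility rule is what yields sub-second update latency without blocking queries.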
Cloud‑Native Transformation – Migrated to a multi‑tenant, cloud‑native framework with automated scheduling, slot‑based index allocation, and fine‑grained cost monitoring, enabling elastic scaling and resource‑efficient operation.
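Slot-based index allocation can be pictured as bin packing: each index demands some number of fixed-size slots, and the scheduler places it on a node with spare capacity. A first-fit sketch (purely hypothetical — the article does not describe the scheduler's actual policy):

```python
def assign_slots(indexes, nodes, slots_per_node):
    """First-fit placement: give each index's slot demand to the first
    node that still has room; fail loudly when the cluster is full."""
    used = {n: 0 for n in nodes}
    placement = {}
    for name, slots in indexes:
        for n in nodes:
            if used[n] + slots <= slots_per_node:
                used[n] += slots
                placement[name] = n
                break
        else:
            raise RuntimeError(f"no capacity for {name}")
    return placement

plan = assign_slots([("idx_a", 3), ("idx_b", 2), ("idx_c", 2)],
                    ["node1", "node2"], slots_per_node=4)
print(plan)  # → {'idx_a': 'node1', 'idx_b': 'node2', 'idx_c': 'node2'}
```

Tracking per-tenant slot usage in a structure like `used` is also what makes fine-grained cost monitoring and elastic scaling straightforward: capacity and consumption are explicit numbers per node.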
Application Outlook
Enhance large language models (LLMs) with long‑term memory, domain‑specific knowledge, and real‑time information via vector retrieval.
Mitigate LLM security risks by enforcing per‑user knowledge‑base permissions through vector database access controls.
Douyin’s cloud‑native vector database (VikingDB) now powers numerous internal services and is being offered on Volcano Engine, delivering sub‑10 ms latency for billions of vectors, real‑time updates, and robust multi‑tenant support.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.