ByteHouse Vector Search Technical Guide: Architecture, Design, and Performance Optimizations
This guide explains ByteHouse’s high‑performance vector search capabilities, covering the background of vector retrieval for LLMs, the limitations of its existing skip‑index architecture, the new vector‑index design with HNSW and IVF, query‑time optimizations, performance benchmarks against Milvus, and future development plans.
With the rapid adoption of large language models (LLMs), databases are expected to strengthen their vector analysis and AI support. Vector search and vector databases act as external memory for LLMs, supplying related content that improves answer accuracy.
ByteHouse, a cloud‑native data warehouse from Volcano Engine, has introduced high‑performance vector search. The article examines how an OLAP engine can build such capability, noting that vector retrieval is also used in OLAP for unstructured data analysis.
Typical vector search workloads involve datasets ranging from millions to billions of vectors with latency requirements of a few to hundreds of milliseconds. Consequently, brute‑force methods are impractical; instead, specialized vector indexes such as HNSW and Faiss IVF are employed.
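To see why brute force breaks down at this scale, here is a minimal stdlib sketch (not ByteHouse code) of exact top-K retrieval: every query must compute a distance against every stored vector, so cost grows linearly with dataset size, which is untenable at millions to billions of vectors under millisecond latency targets.

```python
import heapq
import math
import random

def l2_distance(a, b):
    # Euclidean (L2) distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def brute_force_topk(query, vectors, k):
    # Exact nearest neighbours: O(N * d) distance computations per query.
    # ANN indexes such as HNSW and IVF exist precisely to avoid this scan.
    return heapq.nsmallest(k, range(len(vectors)),
                           key=lambda i: l2_distance(query, vectors[i]))

random.seed(0)
data = [[random.random() for _ in range(8)] for _ in range(1000)]
q = [0.5] * 8
print(brute_force_topk(q, data, 5))
```

Indexes like HNSW and Faiss IVF trade a small amount of recall for sublinear query cost, which is what makes the latency targets above reachable.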
The existing skip‑index architecture in ByteHouse cannot efficiently support vector search because it lacks cache mechanisms for vector indexes, performs redundant distance calculations after mark‑level filtering, and incurs extra I/O when multiple skip indexes exist for large data parts.
To address these issues, ByteHouse introduces a new architecture that integrates popular libraries (hnswlib, Faiss) supporting HNSW, IVF_PQ, and IVF_PQ_FS indexes, adds a vector‑index cache to keep indexes resident in memory, and provides a single‑index build statement per data part. Resource‑aware CPU throttling and on‑disk training for large IVF indexes are also added.
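The IVF idea behind the Faiss-backed index types can be sketched in a few dozen lines of stdlib Python. This is a toy illustration, not the Faiss code ByteHouse integrates; the class name `IVFIndex` and its methods are hypothetical. Vectors are bucketed under their nearest trained centroid, and a query scans only the `nprobe` closest buckets instead of the whole dataset.

```python
import math
import random

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class IVFIndex:
    """Toy inverted-file (IVF) index: each vector is stored in the posting
    list of its nearest centroid; search probes only a few lists."""

    def __init__(self, nlist=8, train_iters=5):
        self.nlist = nlist
        self.train_iters = train_iters
        self.centroids = []
        self.lists = []  # one posting list of (id, vector) per centroid

    def train(self, sample):
        # Naive in-memory k-means on a training sample. (ByteHouse trains
        # large IVF indexes on disk; this loop is only for illustration.)
        self.centroids = random.sample(sample, self.nlist)
        for _ in range(self.train_iters):
            buckets = [[] for _ in range(self.nlist)]
            for v in sample:
                buckets[self._nearest(v)].append(v)
            for i, b in enumerate(buckets):
                if b:
                    self.centroids[i] = [sum(col) / len(b) for col in zip(*b)]
        self.lists = [[] for _ in range(self.nlist)]

    def _nearest(self, v):
        return min(range(self.nlist), key=lambda i: l2(v, self.centroids[i]))

    def add(self, vec_id, v):
        self.lists[self._nearest(v)].append((vec_id, v))

    def search(self, q, k, nprobe=2):
        # Probe only the nprobe nearest posting lists, then rank candidates.
        probed = sorted(range(self.nlist),
                        key=lambda i: l2(q, self.centroids[i]))[:nprobe]
        cands = [(l2(q, v), vec_id) for i in probed for vec_id, v in self.lists[i]]
        return [vec_id for _, vec_id in sorted(cands)[:k]]

random.seed(1)
data = [[random.random() for _ in range(4)] for _ in range(200)]
idx = IVFIndex(nlist=4)
idx.train(data)
for i, v in enumerate(data):
    idx.add(i, v)
print(idx.search(data[0], 3))
```

Product quantization (the `PQ` in IVF_PQ) additionally compresses the vectors inside each posting list, and the fast-scan variant accelerates the compressed-distance computation; both are omitted here for brevity.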
On the query side, a pattern‑recognition and rewrite layer detects order by L2Distance/cosineDistance limit topK queries and replaces them with a new SelectWithSearch operator that performs vector retrieval and attribute fetching in one step. The new pipeline eliminates the previous skip‑index based flow.
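The recognition step can be illustrated with a small sketch: spot the "ORDER BY L2Distance(...) LIMIT k" shape and extract the pieces a search operator would need. Note this is a hypothetical text-level demonstration; the real ByteHouse rewrite operates on the query plan, and the function `recognize_vector_topk` is invented for illustration.

```python
import re

# Matches the vector top-K query shape described in the article:
#   ORDER BY L2Distance(col, [...]) LIMIT k   (or cosineDistance).
VECTOR_TOPK = re.compile(
    r"ORDER\s+BY\s+(L2Distance|cosineDistance)\s*\(\s*(\w+)\s*,.*?\)"
    r"\s+LIMIT\s+(\d+)",
    re.IGNORECASE | re.DOTALL,
)

def recognize_vector_topk(sql):
    # Returns the metric, vector column, and top-K if the query matches
    # the pattern, else None (the query then runs through the normal plan).
    m = VECTOR_TOPK.search(sql)
    if not m:
        return None
    return {"metric": m.group(1),
            "vector_column": m.group(2),
            "top_k": int(m.group(3))}

sql = """SELECT id, title FROM docs
         ORDER BY L2Distance(embedding, [0.1, 0.2, 0.3]) LIMIT 10"""
print(recognize_vector_topk(sql))
```

When the pattern matches, the plan node performing sort-then-limit over a full scan is swapped for the fused operator, so vector retrieval and attribute fetching happen in a single pass.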
Three key optimizations are applied:
Vector‑column read elimination – if the vector column appears only inside the distance expression and not in the select list, it is omitted from the final disk read.
Global top‑K pre‑computation – the top‑K rows are determined across all data parts before any other attributes are read, reducing the total rows read and improving latency by over 2× in tests.
Cache preload – newly created or restarted vector indexes are automatically loaded into memory, with table‑level and global settings to control the behavior.
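The second optimization can be sketched with stdlib heaps, assuming a hypothetical layout where each data part first produces a locally sorted candidate list: the per-part results are merged into one global top-K, and attribute columns are read only for those winners instead of K rows per part.

```python
import heapq

def global_topk(per_part_results, k):
    # per_part_results: lists of (distance, part_id, row_id) tuples,
    # each already sorted by distance within its part.
    return heapq.nsmallest(k, heapq.merge(*per_part_results))

def fetch_attributes(parts, winners):
    # Read attribute data only for the k global winners, so total rows
    # read no longer scales with the number of data parts.
    return [parts[part_id][row_id] for _, part_id, row_id in winners]

parts = {
    0: {0: "doc-a", 1: "doc-b"},
    1: {0: "doc-c", 1: "doc-d"},
}
local = [
    [(0.10, 0, 1), (0.90, 0, 0)],  # part 0 local top-2
    [(0.20, 1, 0), (0.80, 1, 1)],  # part 1 local top-2
]
winners = global_topk(local, 2)
print(fetch_attributes(parts, winners))
```

Without the global step, every part would read attributes for its own K candidates, most of which are discarded after the final merge; pre-computing the global winners avoids that wasted I/O.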
Performance evaluation using VectorDBBench shows ByteHouse achieving higher QPS than Milvus at comparable recall, and faster data insertion times.
Future work includes reducing resource consumption of vector indexes, further query‑performance tuning, improving usability, and deeper integration with the large‑model ecosystem.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.