
DingoDB Multi-Modal Vector Database: Design Philosophy, Architecture and Applications

DingoDB is a multi‑modal vector database that unifies storage and analysis of structured, semi‑structured and unstructured data through a Raft‑based distributed architecture, offering MySQL‑compatible SQL, high‑performance APIs, automatic sharding, real‑time index optimization, and hybrid scalar‑vector queries for enterprise knowledge bases, LLM memory, and real‑time decision‑making.

Sohu Tech Products

This article introduces DingoDB, a multi-modal vector database, from a technical perspective, along with its application scenarios.

1. Design Philosophy

Before 2015, data architecture was dominated by the data warehouse, which focused on unified storage of structured data. Between 2016 and 2022, the data lake concept became popular, broadening the range of data dimensions and data types under management. More recently, the rapid development of AIGC has moved the data ecosystem into a new era. As data analysis has grown more complex, requirements have expanded from basic query processing to machine learning and deep learning, and now to self-service analysis, AI-generated content (AIGC), automated machine learning (AutoML), and GPT-style generative platforms.

In the new "Vector Ocean" era, the overall data processing workflow remains largely unchanged: data is sourced, acquired, transformed, stored, and computed on, then used for analysis, prediction, and application building. What changes is the destination. The trend is toward a "Vector Ocean" in which unstructured data is ultimately converted to vectors for storage, real-time analysis and processing workflows are built on top of that vector representation, and data applications are in turn built on those workflows.

DingoDB is a multi-modal vector database designed to handle data storage and computation as well as analysis and prediction. Its goal is a single database that integrates storage, analysis, and querying of both structured and unstructured data, meeting users' vector query needs while still protecting and making use of their existing data.

2. Product Advantages and Architecture

Overview: DingoDB supports storage of structured, semi-structured, and unstructured data, and provides a MySQL-compatible protocol and optimizer. At its lowest layer, the database supports both key-value (KV) and vector storage, using a distributed storage architecture to achieve unified storage and analysis of multi-modal data. Users can access data through SQL statements and APIs, with support for server-side computation.

As the first vector database certified by the China Academy of Information and Communications Technology (CAICT), DingoDB passed 39 test items in the evaluation, including 27 mandatory items, the most passed items of any vector database among the evaluated vendors. DingoDB has also become an officially supported backend store for LangChain projects.

Key Features:

Storage: Based on industrial-grade Raft protocol for multi-replica strategy, ensuring strong data consistency and security.

SQL Processing: Provides unified SQL processing capabilities, supporting MySQL protocol and index management, with monitoring and decomposition-based fusion analysis.

API Support: Provides multiple high-performance APIs for high-frequency workloads such as real-time decision-making.

Data Analysis: Through the Python SDK, supports hybrid analysis of multi-modal data, including hybrid retrieval over vector and scalar indexes, with compatibility for multiple processors and operator pushdown.
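The hybrid retrieval of vector and scalar indexes mentioned under Data Analysis can be pictured as a scalar filter combined with vector ranking. The sketch below is a minimal, self-contained illustration and not the DingoDB Python SDK: `Record`, `hybrid_search`, and the `lang` attribute are hypothetical, a linear scan stands in for a scalar index, and an exact cosine scan stands in for a vector index. A real engine may also order the filter and ranking steps differently.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Record:
    id: int
    vector: list                                  # embedding of the row
    scalars: dict = field(default_factory=dict)   # structured attributes

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def hybrid_search(records, query, k, predicate):
    # 1. Scalar filter: keep rows whose attributes satisfy the predicate
    #    (a real engine would consult a scalar index instead of scanning).
    candidates = [r for r in records if predicate(r.scalars)]
    # 2. Vector ranking: exact cosine similarity, highest first
    #    (a real engine would use an approximate index, for example HNSW).
    ranked = sorted(candidates, key=lambda r: cosine(r.vector, query), reverse=True)
    return [r.id for r in ranked[:k]]

records = [
    Record(1, [1.0, 0.0], {"lang": "en"}),
    Record(2, [0.9, 0.1], {"lang": "zh"}),
    Record(3, [0.0, 1.0], {"lang": "en"}),
    Record(4, [0.8, 0.2], {"lang": "en"}),
]
top = hybrid_search(records, [1.0, 0.0], k=2, predicate=lambda s: s["lang"] == "en")
print(top)  # [1, 4]
```

Without the `lang` filter, record 2 would rank second by similarity; the scalar condition removes it before the vector ranking, which is the point of combining the two kinds of index.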

Architecture:

Application Layer: Supports various scenarios including traditional relational database analysis, semantic search, structured and decomposed data analysis, real-time data decision support, prompt management, and LLM memory.

Protocol Layer: Provides MySQL-compatible SQL support, a high-performance Serving API, and native vector API support. There are three access entry points: an SQL entry for MySQL clients and the JDBC driver; a high-performance Java SDK for real-time access; and Python/C++ SDKs for LLM workloads.

Computation Layer: The Executor handles distributed transactions and query optimization; the Coordinator handles metadata and resource management.

Storage Layer: Supports relational tables and vector tables, and can connect to other storage types such as object storage or distributed file systems.
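The article does not say what metadata the Coordinator keeps. A common design in range-sharded stores is to record the start key of each shard and route a request by binary-searching those keys. The sketch below assumes that model. `Region`, `RegionRouter`, and the byte-string keys are hypothetical illustrations, not DingoDB's actual metadata format.

```python
import bisect
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    region_id: int
    start_key: bytes  # inclusive; the range ends where the next region starts

class RegionRouter:
    """Hypothetical coordinator-side routing table over contiguous key ranges."""

    def __init__(self, regions):
        self.regions = sorted(regions, key=lambda r: r.start_key)
        self.starts = [r.start_key for r in self.regions]
        if not self.starts or self.starts[0] != b"":
            raise ValueError("the first region must start at the empty key")

    def locate(self, key: bytes) -> int:
        # Binary search for the rightmost region whose start_key <= key.
        i = bisect.bisect_right(self.starts, key) - 1
        return self.regions[i].region_id

router = RegionRouter([Region(1, b""), Region(2, b"g"), Region(3, b"p")])
for key in (b"apple", b"g", b"zebra"):
    print(key, "->", router.locate(key))
```

Because every key falls into exactly one range, an executor only needs this table (or a cached copy of it) to know which replica group should serve a read or write.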

Product Advantages:

Comprehensive access interfaces: SQL, SDK, and API access, with tables and vectors as first-class data models.

Built-in data high availability: All features, including HA, are built in and need no external components, reducing deployment and operational costs.

Fully automated elastic data sharding: Shard sizes can be configured dynamically, and shards are automatically split and merged based on user-defined thresholds.

Scalar and vector joint queries: Supports traditional index types and mainstream vector index types, seamlessly combining scalar and vector conditions in hybrid retrieval.

Built-in real-time index construction optimization: Automatically rebuilds indexes based on data scale changes and compute resource configuration.
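The threshold-driven sharding described above can be illustrated with a single rebalancing round over range shards. This is a hypothetical sketch: the `Shard` type, the `split_mb`/`merge_mb` parameter names, and the halve-the-key-range split rule are assumptions, not DingoDB's configuration keys or algorithm. A production system would normally pick split keys from the actual data distribution.

```python
from dataclasses import dataclass

@dataclass
class Shard:
    start: int      # inclusive lower key bound
    end: int        # exclusive upper key bound
    size_mb: float  # data volume currently held by the shard

def rebalance(shards, split_mb, merge_mb):
    """One rebalancing round: split oversized shards, then merge small neighbours."""
    if merge_mb >= split_mb:
        # Otherwise freshly split halves could immediately be merged back.
        raise ValueError("merge_mb must be smaller than split_mb")
    # Split pass: halve the key range of any shard above split_mb.
    # Simplification: assume data is spread evenly, so each half gets half the size.
    after_split = []
    for s in shards:
        if s.size_mb > split_mb and s.end - s.start > 1:
            mid = (s.start + s.end) // 2
            half = s.size_mb / 2
            after_split += [Shard(s.start, mid, half), Shard(mid, s.end, half)]
        else:
            after_split.append(s)
    # Merge pass: fold each shard into its left neighbour while the
    # combined size stays below merge_mb (input must be sorted and adjacent).
    merged = []
    for s in after_split:
        if merged and merged[-1].size_mb + s.size_mb < merge_mb:
            prev = merged.pop()
            merged.append(Shard(prev.start, s.end, prev.size_mb + s.size_mb))
        else:
            merged.append(s)
    return merged

shards = [Shard(0, 100, 300.0), Shard(100, 200, 10.0),
          Shard(200, 300, 20.0), Shard(300, 400, 90.0)]
result = rebalance(shards, split_mb=256, merge_mb=64)
for s in result:
    print(s)
```

Keeping the merge threshold below the split threshold is what prevents oscillation: the two halves of a split shard together exceed `split_mb`, so they can never satisfy the merge condition in the same round.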

3. Application Scenarios

DingoDB is applied in multiple scenarios including enterprise knowledge base construction, LLM memory, real-time decision-making metric analysis, and supporting the VectorOcean data support platform.

For LLM applications, the DingoDB-based solution is organized into four layers that cover the whole pipeline, from raw data through embedding models to applications. Industry public or self-developed large models and embedding models vectorize the data, DingoDB serves as the vector storage layer for the various scenarios, and large language models are then connected on top to implement the applications.
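The four-layer flow above (data, embedding, vector storage, LLM) can be sketched end to end in a few lines. Everything here is a toy assumption: `embed` is a bag-of-words stand-in for a real embedding model, `VectorStore` is an in-memory stand-in for the vector storage layer, and `build_prompt` only assembles the text that would be sent to an LLM. No real model or DingoDB API is called.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class VectorStore:
    """In-memory stand-in for the vector storage layer."""

    def __init__(self):
        self.items = []  # (vector, original text) pairs

    def add(self, text: str) -> None:
        self.items.append((embed(text), text))

    def search(self, query: str, k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

def build_prompt(store: VectorStore, question: str, k: int = 1) -> str:
    # Retrieved passages become the context handed to the LLM layer.
    context = "\n".join(store.search(question, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {question}"

store = VectorStore()
store.add("vacation requests are approved by your team lead")
store.add("expense reports are due by the fifth of each month")
prompt = build_prompt(store, "who approves vacation requests")
print(prompt)
```

The same pattern underlies both enterprise knowledge bases and LLM memory: whatever the model should "remember" is stored as vectors and retrieved into the prompt at query time.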

Knowledge Butler is a new application direction in the LLM era based on vector databases and large models. DingoDB provides strong support for Knowledge Butler, which is mainly responsible for enterprise knowledge management and creation, with application scenarios including intelligent Q&A, content creation assistants, intelligent workflows, and enterprise decision support.

A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.