
Data Processing Technologies in the AI Era: Trends and Integration of Vector and Relational Databases

The talk explores how the rapid growth of multimodal data and large language models is reshaping data processing. It highlights three key trends (online-offline integration, vector-relational database convergence, and the fusion of data processing with AI computation) and presents practical solutions and a future vision for unified data-AI ecosystems.

AntTech

Apr 26, 2024


At the OceanBase Developer Conference, Prof. Chen Wenguang of Tsinghua University and the Ant Technology Research Institute presented "Data Processing Technologies in the AI Era," emphasizing the explosive growth in data volume, speed, and modalities such as text, audio, images, and video.

He likened data processing to Maslow's hierarchy, progressing from collection and storage to query handling, and finally to AI‑driven processing that can even generate content.

Three major development trends for future databases were identified:

1. Online‑offline integration – unifying real‑time transaction processing with batch analytics to eliminate data inconsistency between separate online and offline pipelines.

2. Vector database and relational database convergence – combining embedding‑based vector search with traditional relational storage to support large‑language‑model services while maintaining a single source of truth.

3. Data processing and AI computation integration – tightly coupling data cleaning, enrichment, and AI model inference in iterative loops, as illustrated by the CCNet workflow for extracting high‑quality web content.
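To make trend 2 concrete, here is a minimal sketch of a hybrid query over a single table that stores relational columns and an embedding column together: a relational predicate narrows the candidates, then the survivors are ranked by cosine similarity to a query embedding. The `Row` schema, the `hybrid_search` function, and the toy data are hypothetical, and the exhaustive scan stands in for the approximate nearest-neighbor index a production engine would typically use.

```python
import math
from dataclasses import dataclass


@dataclass
class Row:
    # Hypothetical table row: relational columns and an embedding column
    # live side by side, so there is a single source of truth.
    id: int
    category: str
    price: float
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(rows: list[Row], query: list[float], category: str,
                  max_price: float, k: int) -> list[int]:
    """Relational pre-filter, then exact top-k by cosine similarity."""
    candidates = [r for r in rows if r.category == category and r.price <= max_price]
    candidates.sort(key=lambda r: cosine(r.embedding, query), reverse=True)
    return [r.id for r in candidates[:k]]


rows = [
    Row(1, "book", 10.0, [1.0, 0.0]),
    Row(2, "book", 50.0, [0.9, 0.1]),
    Row(3, "book", 12.0, [0.0, 1.0]),
    Row(4, "toy", 5.0, [1.0, 0.0]),
]
print(hybrid_search(rows, [1.0, 0.0], "book", 20.0, 2))  # [1, 3]
```

Filtering before ranking guarantees every result satisfies the relational predicate. With an approximate index, engines often have to choose between pre-filtering and post-filtering, which trades recall against speed.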

The speaker highlighted several challenges: inconsistent data across online and offline paths, the need for HTAP (hybrid transactional/analytical processing) engines, and the difficulty of keeping graph databases (TuGraph DB) synchronized with the underlying storage systems. He described a solution that combines binlog-based synchronization with a unified query language (ISO-GQL).
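The talk names binlog-based synchronization without detailing it. As an illustration only, the sketch below replays a simplified change log into an in-memory graph so the graph stays consistent with the relational tables it mirrors. The `BinlogEvent` format, the `person`/`follows` table names, and the `Graph` class are assumptions for this example, not TuGraph DB's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class BinlogEvent:
    # Hypothetical, simplified change record. Real binlogs (e.g. MySQL
    # row-based) also carry before/after images and transaction boundaries.
    seq: int      # monotonically increasing log position
    op: str       # "insert" or "delete"
    table: str    # "person" rows become vertices, "follows" rows become edges
    row: dict


@dataclass
class Graph:
    vertices: dict = field(default_factory=dict)  # vertex id -> properties
    edges: set = field(default_factory=set)       # (src, dst) pairs
    applied_seq: int = 0                          # last log position applied

    def apply(self, ev: BinlogEvent) -> None:
        # Skip events at or below the last applied position so a replay
        # after a restart is idempotent.
        if ev.seq <= self.applied_seq:
            return
        if ev.table == "person":
            if ev.op == "insert":
                self.vertices[ev.row["id"]] = {"name": ev.row["name"]}
            elif ev.op == "delete":
                vid = ev.row["id"]
                self.vertices.pop(vid, None)
                # Drop edges touching the deleted vertex to avoid dangling references.
                self.edges = {e for e in self.edges if vid not in e}
        elif ev.table == "follows":
            edge = (ev.row["src"], ev.row["dst"])
            if ev.op == "insert":
                self.edges.add(edge)
            elif ev.op == "delete":
                self.edges.discard(edge)
        self.applied_seq = ev.seq


log = [
    BinlogEvent(1, "insert", "person", {"id": 1, "name": "alice"}),
    BinlogEvent(2, "insert", "person", {"id": 2, "name": "bob"}),
    BinlogEvent(3, "insert", "follows", {"src": 1, "dst": 2}),
    BinlogEvent(4, "delete", "person", {"id": 2}),
]
g = Graph()
for ev in log + log[:2]:  # re-delivering early events has no effect
    g.apply(ev)
print(g.vertices, g.edges, g.applied_seq)  # {1: {'name': 'alice'}} set() 4
```

Tracking the last applied log position is what lets the consumer tolerate duplicate delivery, a common property of change-data-capture pipelines.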

Ant Group's VSAG vector library, built on FAISS, was presented as an optimized, developer‑friendly alternative that integrates vector search into OceanBase via plugins.

To bridge the AI and big-data ecosystems, the talk discussed the limitations of current stacks (Python-based AI on GPUs versus Java-based Spark on CPUs) and PySpark's performance gap relative to native Spark. Optimizations to PySpark were shown to double performance on data deduplication tasks.
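The talk does not describe the deduplication workload in detail. As an assumed example, the sketch below performs paragraph-level exact deduplication in plain Python, modeled loosely on the normalize-then-hash approach used by CCNet: each paragraph is normalized, hashed with truncated SHA-1, and dropped if the hash has been seen before. The function names are hypothetical; a PySpark job would distribute the same per-paragraph hashing across executors.

```python
import hashlib
import re
import unicodedata


def normalize(paragraph: str) -> str:
    """Lowercase, strip accents, map digits to 0, drop punctuation, squeeze whitespace."""
    text = unicodedata.normalize("NFD", paragraph.lower())
    # Remove combining marks left over after NFD decomposition (accents).
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Mn")
    text = re.sub(r"\d", "0", text)
    text = re.sub(r"[^\w\s]", "", text)
    return " ".join(text.split())


def paragraph_hash(paragraph: str) -> bytes:
    # Keep only the first 8 bytes (64 bits) of SHA-1 to save memory at scale.
    return hashlib.sha1(normalize(paragraph).encode("utf-8")).digest()[:8]


def dedup_documents(docs: list[str]) -> list[str]:
    """Drop any paragraph whose normalized hash was already seen earlier in the corpus."""
    seen: set[bytes] = set()
    out = []
    for doc in docs:
        kept = []
        for para in doc.split("\n"):
            if not para.strip():
                continue
            h = paragraph_hash(para)
            if h not in seen:
                seen.add(h)
                kept.append(para)
        out.append("\n".join(kept))
    return out


docs = ["Hello, World!\nPrice: 42 dollars", "hello world\nA new paragraph", "Price: 17 dollars"]
print(dedup_documents(docs))  # ['Hello, World!\nPrice: 42 dollars', 'A new paragraph', '']
```

When a step like `paragraph_hash` runs as a plain Python UDF in PySpark, each row is typically serialized between the JVM and a separate Python worker process, which is one common source of the overhead the talk refers to.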

Finally, a vision was offered for a unified data‑AI platform that supports both CPU and GPU workloads, enabling "write once, run everywhere" across heterogeneous hardware.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
