Cloud Native 10 min read

Baidu Cloud Native Data Platform: Empowering Enterprise AI in the LLM Era

To empower enterprise AI in the LLM era, Baidu Cloud unveils a cloud‑native data platform featuring upgraded databases—PegaDB, GaiaDB 5.0, Vector DB 2.0, Palo 2.0—and integrated services like DBSC 2.0, EDAP 2.0, and DBStack, delivering high‑performance, cost‑effective handling of structured, unstructured, and vector data for fine‑tuning and Enterprise RAG.

Baidu Tech Salon
Baidu Tech Salon
Baidu Tech Salon
Baidu Cloud Native Data Platform: Empowering Enterprise AI in the LLM Era

This article is adapted from a presentation at Baidu Cloud Smart Summit 2024 - Cloud Native Forum, discussing how enterprises can build effective AI capabilities in the large language model (LLM) era.

Large models are trained primarily on publicly available internet data, lacking enterprise-specific internal data and industry knowledge. This limits their ability to handle real enterprise business problems. The industry has two main approaches to solve this: fine-tuning and Enterprise RAG (Retrieval Augmented Generation), enabling LLMs to leverage enterprise data combined with general models to achieve enterprise intelligence.

Building good enterprise intelligence faces several challenges: (1) Enterprise data exists in various storage systems including structured and unstructured data, requiring processing including collection, cleaning, transformation, and annotation before being usable by LLMs or vector databases; (2) As businesses grow, data scales simultaneously, requiring higher performance and cost-efficiency from data platforms; (3) LLM business requires high agility, making platform usability critical.

Baidu Cloud introduced significant updates to their database and big data products:

PegaDB : A self-developed KV database competing with open-source Redis, with features including batch loading, cross-region multi-active capability, and cold-hot data separation. It offers 30-50% lower pricing than competitors, supporting over 1 million QPS with p999 latency under 10ms.

GaiaDB 5.0 : The latest cloud-native relational database version with HTAP capabilities, supporting both columnar indexes and columnar engines. It introduces compute node scale-out for distributed cloud-native integration, and offers Serverless version achieving over 50% compute and 80% storage resource savings.

Vector Database 2.0 : Self-developed vector database for RAG scenarios, with 2.35x improved memory utilization and 7x performance improvement over open-source alternatives. Includes AI Search SDK for quick knowledge base application development.

Palo 2.0 : Real-time analysis database based on Apache Doris, with TPC-DS performance improved over 10x and storage cost reduced by over 80% through cold-hot separation.

Platform capabilities include:

DBSC 2.0 : One-stop database DevOps platform supporting development, management, and security auditing for over 10 database engines including MySQL, GaiaDB, Redis, and openGauss.

EDAP 2.0 : Lakehouse integration platform providing comprehensive data lake capabilities with full support for unstructured data governance, deep integration with AI platforms, lakehouse management supporting Iceberg, Hudi, and DeltaLake formats, and end-to-end Serverless computing.

DBStack : Database privatization platform supporting multi-cloud and hybrid cloud deployment, providing complete database capabilities including open-source, cloud-native, and commercial databases.

RAGvector databaseenterprise AIData Lakehousecloud-native databaseGaiaDBPaloDBStackEDAPPegaDB
Baidu Tech Salon
Written by

Baidu Tech Salon

Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.