Data‑Centric LLM and Knowledge Graph Integration: Fabarta’s AI‑Era Data Infrastructure
The article presents Fabarta’s AI‑era data infrastructure, which combines large language models with knowledge graphs and a multimodal database. It details the data‑centric architecture, HTAP capabilities, and cloud‑native design, and walks through real‑world demos showing how graph‑plus‑vector techniques improve model reliability and answer precision.
At the Fabarta Product and User Conference 2023, CTO Yang Chenghu introduced the theme "AI Era Data Infrastructure, a New Engine for Enterprise Intelligence," explaining Fabarta’s innovative technology and its practical value in real business scenarios.
The talk highlighted the complementary strengths of large language models (LLMs) and knowledge graphs, noting that while LLMs provide interactive interfaces and generative capabilities, they suffer from knowledge update latency, hallucinations, and security concerns; knowledge graphs offer stable, precise, and secure data management but lack intelligent processing.
Fabarta proposes a data‑centric architecture that deeply integrates LLMs, knowledge graphs, and databases, forming a new data‑intelligence paradigm called "One‑Body Two‑Wings".
The core component, the Data‑Centric LLM, consists of three layers: data production (ArcFabric), data management (ArcNeural multimodal engine), and data consumption (ArcPilot). ArcFabric organizes and extracts relationships from enterprise data; ArcNeural provides a multimodal database with Graph HTAP, vector support, and cloud‑native separation of storage and compute; ArcPilot offers low‑code analytics and knowledge services.
Key advantages of ArcNeural include data privacy, natural‑language query capability, and result explainability through intermediate query plans executed by the database rather than directly by the LLM.
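The flow behind this explainability claim — the LLM proposes an intermediate query plan, and the database (not the LLM) executes it — can be sketched roughly as follows. The plan format, `llm_generate_plan`, and `execute_plan` are hypothetical stand‑ins, not ArcNeural’s actual API:

```python
# Hypothetical sketch: the LLM emits a declarative plan; the database executes
# it, so every answer can be traced back to an inspectable intermediate plan.
from dataclasses import dataclass

# Tiny in-memory "graph": node -> list of (edge_label, neighbor)
GRAPH = {
    "Alice": [("WORKS_AT", "Fabarta")],
    "Fabarta": [("LOCATED_IN", "Beijing")],
}

@dataclass
class PlanStep:
    op: str        # "expand" follows edges with the given label
    edge: str

def llm_generate_plan(question: str) -> list[PlanStep]:
    # Stand-in for the LLM: map a natural-language question to a query plan.
    if "where" in question.lower():
        return [PlanStep("expand", "WORKS_AT"), PlanStep("expand", "LOCATED_IN")]
    return []

def execute_plan(start: str, plan: list[PlanStep]) -> list[str]:
    # The database walks the graph according to the plan, step by step.
    frontier = [start]
    for step in plan:
        frontier = [dst for node in frontier
                    for (label, dst) in GRAPH.get(node, [])
                    if label == step.edge]
    return frontier

plan = llm_generate_plan("Where is Alice's employer located?")
print([s.edge for s in plan])        # the explainable intermediate plan
print(execute_plan("Alice", plan))   # the answer, grounded in the graph
```

Because the plan is a plain data structure, it can be shown to the user or audited before execution, which is the essence of the explainability argument.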
The multimodal engine’s storage architecture features a three‑tier design: an in‑memory graph layer for high‑performance queries, an SSD‑based log layer with Raft replication for durability, and an asynchronous remote storage layer supporting various deployment modes.
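A write path through such a three‑tier design can be sketched as below. This is an assumed simplification, not Fabarta’s code: writes land in a durable log first, become queryable in memory, and drain asynchronously to remote storage:

```python
# Minimal sketch of a three-tier storage write path (illustrative only):
# tier 1 = in-memory graph for fast queries, tier 2 = SSD log for durability
# (Raft-replicated in the real system), tier 3 = remote storage.
class ThreeTierStore:
    def __init__(self):
        self.memory = {}       # tier 1: in-memory graph layer
        self.log = []          # tier 2: durable SSD log
        self.remote = {}       # tier 3: remote storage

    def write(self, key, value):
        self.log.append((key, value))   # durable first
        self.memory[key] = value        # then visible to queries

    def flush_async(self):
        # The real system drains the log in the background; here it is explicit.
        for key, value in self.log:
            self.remote[key] = value
        self.log.clear()

store = ThreeTierStore()
store.write("v1", {"label": "Person"})
print(store.memory["v1"])   # readable immediately from the memory tier
store.flush_async()
print(store.remote["v1"])   # eventually persisted to the remote tier
```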
ArcNeural’s HTAP Graph technology enables hybrid transactional and analytical processing, allowing low‑latency TP queries and scalable AP analytics, with unified query language and serverless AP nodes that auto‑scale based on workload.
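The TP/AP split and serverless scale‑out described above can be illustrated with a toy router. The classification rule, capacity number, and scaling formula are all invented for illustration:

```python
# Hedged sketch of HTAP routing: short transactional (TP) queries go to a
# fixed TP pool; heavy analytical (AP) queries go to a serverless AP pool
# sized to the backlog. Thresholds and capacities are illustrative only.
def route(query: str) -> str:
    # Toy classifier: full-graph scans are analytical, point lookups are not.
    return "AP" if "SCAN" in query else "TP"

def ap_nodes_needed(pending_ap_queries: int, per_node_capacity: int = 4) -> int:
    # Serverless scale-out: enough nodes to cover the backlog, zero when idle.
    return -(-pending_ap_queries // per_node_capacity)  # ceiling division

queries = ["MATCH id=42", "SCAN all_vertices", "SCAN pagerank", "MATCH id=7"]
ap_backlog = sum(1 for q in queries if route(q) == "AP")
print(route("SCAN all_vertices"), ap_nodes_needed(ap_backlog))
```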
Vector support is integrated via HNSW indexing, SIMD‑accelerated distance calculations, attribute‑filtered vector search, and full CRUD/ACID capabilities within the graph model.
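The semantics of attribute‑filtered vector search can be shown with a brute‑force sketch. A production engine would use an HNSW index with SIMD distance kernels; this version only demonstrates the contract — filter by attribute, then rank by distance — with made‑up data:

```python
# Brute-force sketch of attribute-filtered vector search (semantics only;
# real engines replace the linear scan with an HNSW index + SIMD distances).
import math

def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

ITEMS = [
    {"id": 1, "genre": "sci-fi", "vec": [0.1, 0.9]},
    {"id": 2, "genre": "drama",  "vec": [0.2, 0.8]},
    {"id": 3, "genre": "sci-fi", "vec": [0.9, 0.1]},
]

def filtered_search(query_vec, genre, k=1):
    candidates = [it for it in ITEMS if it["genre"] == genre]  # attribute filter
    candidates.sort(key=lambda it: l2(it["vec"], query_vec))   # distance rank
    return [it["id"] for it in candidates[:k]]

print(filtered_search([0.0, 1.0], "sci-fi"))  # nearest sci-fi item only
```

Filtering before (or during) the index walk is what keeps results both semantically close and attribute‑correct, which a plain vector index alone cannot guarantee.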
The system adopts a "One Query" philosophy, allowing a single query to retrieve both vector and graph data, demonstrated with a movie knowledge‑graph example that combines vector similarity and graph traversal in one plan.
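A sketch of the movie example: one logical query first finds vector‑similar movies, then traverses graph edges from those hits. The dataset and function are invented; in ArcNeural both stages would sit in a single query plan rather than two Python stages:

```python
# "One Query" sketch: vector similarity seeds a graph traversal.
import math

MOVIES = {  # movie -> embedding
    "Interstellar": [0.9, 0.1],
    "Inception":    [0.8, 0.2],
    "Notting Hill": [0.1, 0.9],
}
ACTED_IN = {  # actor -> movies (graph edges)
    "Matthew McConaughey": ["Interstellar"],
    "Leonardo DiCaprio":   ["Inception"],
    "Hugh Grant":          ["Notting Hill"],
}

def one_query(query_vec, k=2):
    # Stage 1: vector similarity over movie embeddings.
    ranked = sorted(MOVIES, key=lambda m: math.dist(MOVIES[m], query_vec))
    hits = set(ranked[:k])
    # Stage 2: graph traversal from the vector hits to their actors.
    return sorted(a for a, ms in ACTED_IN.items() if hits & set(ms))

print(one_query([1.0, 0.0]))  # actors of the two most similar movies
```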
Version 2.1 of ArcNeural adds multimodal data format support (graph, vector, JSON, table), enterprise‑grade ACID guarantees, cloud‑native elastic deployment, and a fully Rust‑based codebase that supports domestic compliance requirements.
The "One‑Body Two‑Wings" product matrix includes ArcFabric (data production) and ArcPilot (data consumption), forming a closed loop where consumption feedback improves production.
Two demos showcased how graph‑plus‑vector techniques resolve LLM hallucinations and improve answer precision by cross‑validating results against the knowledge graph and multi‑modal data.
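The cross‑validation idea behind these demos can be sketched as follows: facts asserted by the LLM are checked against the knowledge graph, and unsupported claims are flagged instead of returned. The triple format and example facts are illustrative, not taken from the demos:

```python
# Sketch of hallucination filtering by cross-validating LLM output against a
# knowledge graph of (subject, predicate, object) triples.
KG = {
    ("ArcNeural", "supports", "vector search"),
    ("ArcNeural", "written_in", "Rust"),
}

def validate(llm_triples):
    # Split the LLM's claims into KG-supported facts and unsupported ones.
    supported = [t for t in llm_triples if t in KG]
    hallucinated = [t for t in llm_triples if t not in KG]
    return supported, hallucinated

answer = [
    ("ArcNeural", "written_in", "Rust"),
    ("ArcNeural", "written_in", "COBOL"),   # a hallucinated claim
]
ok, bad = validate(answer)
print(len(ok), len(bad))
```

Only the supported facts reach the user; the flagged triples can be dropped or sent back to the model for regeneration.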
Overall, Fabarta’s architecture demonstrates how integrating LLMs with knowledge graphs and a multimodal database can create a robust, secure, and explainable enterprise intelligence platform.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.