Alluxio in Data & AI Lakehouse: Architecture, Performance Optimizations, and Cloud Practices at OPPO
OPPO's data architects combined their self-developed Shuttle service with Alluxio, reporting roughly doubled performance and throughput at about half the system pressure. On top of this they built a unified Data & AI lakehouse that covers structured and unstructured data, metadata management, and real-time ingestion, and that lowers cloud costs.
OPPO data architects integrated their self-developed Shuttle service with Alluxio, roughly doubling performance and throughput while halving system pressure.
The overall architecture handles structured data (the Data side) and unstructured data (the AI side) separately, with a Data and Catalog system built on Alluxio providing metadata management and a unified lake service.
DAA-Catalog combines an Iceberg-based Metastore with a Management layer (Dump Service). Together they support real-time inserts and queries and automatically sink data to Iceberg, which avoids frequent small-file commits.
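The buffering idea behind this point can be sketched as follows. This is a minimal illustration and not OPPO's implementation: the class `BufferedTable`, its methods, and the `flush_threshold` parameter are hypothetical, and plain Python lists stand in for the real-time layer and for data committed to Iceberg. Rows become queryable as soon as they are inserted, and they are sunk to the committed store in batches, so many small inserts produce only a few commits.

```python
class BufferedTable:
    """Toy model of a real-time buffer in front of a batch-committed table."""

    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.buffer = []        # stand-in for the real-time layer
        self.committed = []     # stand-in for data already sunk to the lake table
        self.commit_count = 0   # number of batch commits performed so far

    def insert(self, row):
        """Accept one row immediately; sink a batch once the buffer is full."""
        self.buffer.append(row)
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Move all buffered rows to the committed store as a single commit."""
        if not self.buffer:
            return
        self.committed.extend(self.buffer)
        self.buffer.clear()
        self.commit_count += 1

    def query(self):
        """Return committed rows plus rows that are still only in the buffer."""
        return self.committed + self.buffer


table = BufferedTable(flush_threshold=3)
for i in range(5):
    table.insert({"id": i})
```

With a threshold of 3, the five inserts above trigger one commit, and `table.query()` still returns all five rows because it merges the buffer with the committed data.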
For structured data, a “Dynamic Cluster” mechanism accelerates queries through caching and optimized hot‑table/index handling.
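One way to picture hot-table handling is an access counter that decides which tables are worth keeping in cache. The sketch below is an assumption for illustration only: the summary does not describe how Dynamic Cluster selects hot tables, and `HotTableTracker`, `record_access`, `hot_tables`, and `cache_slots` are hypothetical names.

```python
from collections import Counter


class HotTableTracker:
    """Counts table accesses and reports the most frequently read tables."""

    def __init__(self, cache_slots=2):
        self.cache_slots = cache_slots  # hypothetical: how many tables fit in cache
        self.hits = Counter()

    def record_access(self, table_name):
        """Register one query against the given table."""
        self.hits[table_name] += 1

    def hot_tables(self):
        """Return the names of the most-accessed tables, hottest first."""
        return [name for name, _ in self.hits.most_common(self.cache_slots)]


tracker = HotTableTracker(cache_slots=2)
for name in ["orders", "users", "orders", "logs", "orders", "users"]:
    tracker.record_access(name)
hot = tracker.hot_tables()
```

A real system would likely also weigh table size and recency; a plain frequency count is used here only to keep the example short.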
Unstructured data is ingested via the Transfer service and converted to update-set format. Its metadata is stored in the catalog and in vector databases, which enables natural-language and model-driven queries.
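The vector-database lookup mentioned here can be illustrated with a tiny in-memory similarity search. Everything in this sketch is an assumption: the file paths, the three-dimensional toy embeddings, and the `search` helper are made up for illustration, and a production system would use a real embedding model and a real vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical index: object path -> toy embedding of its content.
vector_index = {
    "images/cat_001.jpg": [0.9, 0.1, 0.0],
    "images/dog_042.jpg": [0.1, 0.9, 0.0],
    "audio/meeting_07.wav": [0.0, 0.1, 0.9],
}


def search(query_vector, top_k=1):
    """Return the paths whose embeddings are most similar to the query."""
    ranked = sorted(
        vector_index,
        key=lambda path: cosine_similarity(query_vector, vector_index[path]),
        reverse=True,
    )
    return ranked[:top_k]


# A query embedding that is assumed to come from a natural-language prompt.
result = search([0.8, 0.2, 0.0], top_k=1)
```

The query vector is closest to the first entry, so `result` contains only `images/cat_001.jpg`.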
Alluxio also enables sub-second lake ingestion through Real-data, Base-data, Dump Service, and streaming file I/O. It improves Spark broadcast handling as well: large broadcast data is stored in Alluxio, which supports broadcasts of up to 10 GB.
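The broadcast optimization can be pictured as "ship a path instead of the payload": small values travel inline, while large ones are written once to shared storage and read from there. The sketch below is a simplified model under stated assumptions. It does not use Spark's or Alluxio's APIs; a local temporary directory stands in for an Alluxio-backed path, and `publish_broadcast`, `read_broadcast`, and `SIZE_THRESHOLD_BYTES` are hypothetical names.

```python
import os
import pickle
import tempfile

SIZE_THRESHOLD_BYTES = 1024  # hypothetical cutoff between inline and offloaded data


def publish_broadcast(value, shared_dir):
    """Serialize a value; keep it inline if small, else write it to shared storage."""
    payload = pickle.dumps(value)
    if len(payload) <= SIZE_THRESHOLD_BYTES:
        return ("inline", payload)
    fd, path = tempfile.mkstemp(dir=shared_dir, suffix=".bcast")
    with os.fdopen(fd, "wb") as f:
        f.write(payload)
    return ("path", path)  # only this small handle is sent to workers


def read_broadcast(handle):
    """Resolve a handle back into the original value on the worker side."""
    kind, ref = handle
    if kind == "inline":
        return pickle.loads(ref)
    with open(ref, "rb") as f:
        return pickle.loads(f.read())


shared_dir = tempfile.mkdtemp()  # stand-in for a directory on shared storage
small_handle = publish_broadcast({"k": 1}, shared_dir)
large_handle = publish_broadcast(list(range(10_000)), shared_dir)
```

The design point is that the sender's memory and network only carry the handle for large values, while each reader pulls the data from the shared layer when it needs it.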
In public and hybrid cloud deployments (AWS, Alibaba Cloud), the Alluxio + Shuttle stack reduces compute costs by about 80% and provides multi-tenant caching for both batch and streaming workloads.
Future plans include deeper integration of Flink and Spark with Alluxio, fuller use of memory resources, and continued work on AI-driven data services.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.