OPPO Next‑Generation Big Data & AI Integrated Architecture on Functional Cloud
This article presents OPPO’s next‑generation big‑data and AI integrated architecture on functional cloud. It details a cloud‑native elastic compute framework, a unified data‑lake solution, a real‑time feature platform, machine‑learning data acceleration, and hybrid‑cloud deployments, and highlights the resulting performance gains and cost reductions.
Introduction
Cloud providers attract users with storage services and SaaS layers. OPPO demonstrates a next‑generation big‑data and AI integrated architecture on functional cloud, aiming to improve scheduling efficiency, achieve automatic elastic scaling, and optimize resource utilization.
Technical Architecture
OPPO operates a mixed‑cloud environment (AWS, self‑built data centers, and hybrid cloud in India) with more than ten thousand machines. The legacy EMR solution suffered from slow elastic scaling, fixed x86 hardware, and rigid scheduling algorithms. To address these issues, OPPO built a custom elastic compute architecture, Yarn on EKS, leveraging Kubernetes for large‑scale, fast scheduling.
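As a rough illustration of the kind of decision such an elastic compute layer makes, the sketch below computes how many NodeManager pods an autoscaler might request to cover pending YARN demand. The function name, the vcore-based policy, and all parameters are assumptions for illustration, not OPPO's actual Yarn‑on‑EKS implementation.

```python
# Hypothetical scale-out policy for a YARN-on-Kubernetes autoscaler.
# Scale-in is deliberately left out: it is usually handled separately
# with a cooldown so nodes are not thrashed.

def desired_nodemanagers(pending_vcores: int, vcores_per_node: int,
                         current_nodes: int, max_nodes: int) -> int:
    """Request enough NodeManager pods to cover pending demand,
    never exceeding the configured cluster ceiling."""
    needed = -(-pending_vcores // vcores_per_node)  # ceiling division
    return min(current_nodes + needed, max_nodes)
```

For example, with 10 nodes of 16 vcores each and 40 pending vcores, the policy asks for 3 more nodes; a burst far beyond capacity is capped at `max_nodes`.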
The architecture incorporates an elastic remote shuffle service (RSS) built on Alluxio, with HDFS or CubeFS as the storage back end. Physical resource utilization consistently exceeds 80%, and cost dashboards provide hourly billing with detailed per‑instance charges.
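The per‑instance, hourly charge roll‑up that such a cost dashboard surfaces can be sketched as below; the usage/rate data shapes and instance names are made up for illustration.

```python
# Illustrative hourly cost roll-up: usage maps instance id to
# (instance_type, hours consumed); rates maps instance type to $/hour.

def per_instance_charges(usage: dict, rates: dict):
    """Return per-instance charges and the total hourly bill."""
    charges = {iid: hours * rates[itype]
               for iid, (itype, hours) in usage.items()}
    return charges, sum(charges.values())
```

Keeping the breakdown per instance (rather than only a cluster total) is what lets a dashboard attribute cost spikes to specific workloads.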
Data & AI Integrated Data‑Lake
The unified data‑lake stack solves three problems: second‑level data ingestion, the lack of a service layer for Iceberg, and the handling of unstructured data via a DAA Catalog. Distributed memory (Alluxio) provides real‑time data ingestion, with a dump service periodically persisting blocks to Iceberg. The DAA Catalog consists of a Metastore (similar to HMS) and a management module covering metadata, data lineage, security, versioning, and conversion services.
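The memory‑buffer‑plus‑dump pattern described above can be sketched in a few lines: rows land in an in‑memory block for second‑level visibility, and full blocks are periodically persisted to the table format (Iceberg in the article; a plain list stands in for it here). All class and method names are illustrative assumptions.

```python
# Minimal sketch of a dump service: hot rows stay queryable in memory,
# while completed blocks are flushed to durable table storage.

class DumpService:
    def __init__(self, block_size: int):
        self.block_size = block_size
        self.buffer = []            # hot, queryable in-memory block
        self.persisted_blocks = []  # stands in for Iceberg data files

    def ingest(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.block_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.persisted_blocks.append(list(self.buffer))
            self.buffer.clear()

    def scan(self):
        # A query sees persisted blocks plus the not-yet-flushed buffer,
        # which is what makes ingestion visible at second-level latency.
        for block in self.persisted_blocks:
            yield from block
        yield from self.buffer
```

The key property is that `scan()` unions durable blocks with the live buffer, so readers never wait for the periodic flush.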
For unstructured data, metadata is embedded using AndesGPT (or ChatGPT) to enable natural‑language queries. Large‑language‑model‑driven prompts automate SQL generation for data analysis, and the system feeds business meanings back into the prompts to improve them.
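One way to picture the prompt side of this loop is a template that renders catalog schemas together with the fed‑back business meanings. The schema format and wording below are assumptions for illustration, not the actual AndesGPT prompts.

```python
# Hypothetical prompt builder for LLM-driven SQL generation: table
# schemas from the catalog plus accumulated business notes are rendered
# into a single prompt for the model.

def build_sql_prompt(question: str, tables: dict, business_notes: dict) -> str:
    """tables: {table_name: [column, ...]}; business_notes: {table_name: note}."""
    lines = ["You are a SQL assistant. Available tables:"]
    for name, cols in tables.items():
        note = business_notes.get(name, "")
        lines.append(f"- {name}({', '.join(cols)})"
                     + (f"  -- {note}" if note else ""))
    lines.append(f"Write one SQL query answering: {question}")
    return "\n".join(lines)
```

As analysts correct generated SQL, the corrections can be distilled into `business_notes`, which is the feedback loop the article describes.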
Production Applications
Real‑time Feature Platform: Implements primary‑key real‑time joins achieving 7k QPS per machine with linear scaling across nodes.
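A primary‑key real‑time join of this kind is typically per‑key hash state: each stream's record is buffered under its key until the matching record from the other stream arrives. Because records are hash‑partitioned by key, each machine works independently, which is what lets throughput scale linearly with nodes. The sketch below is illustrative, not OPPO's implementation.

```python
# Sketch of a primary-key real-time join: O(1) state lookup per record.
# A matched pair is emitted and its state dropped; an unmatched record
# waits for its counterpart.

class PKJoin:
    def __init__(self):
        self.left, self.right = {}, {}

    def on_left(self, key, value):
        if key in self.right:
            return key, value, self.right.pop(key)
        self.left[key] = value
        return None

    def on_right(self, key, value):
        if key in self.left:
            return key, self.left.pop(key), value
        self.right[key] = value
        return None
```

A production version would additionally expire stale keys by TTL so unmatched state cannot grow without bound.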
Machine‑Learning Training Data Acceleration: Converts raw text to Parquet and reads it with Apache Arrow, yielding a 10× speedup; packs image datasets into small tar‑based archives for multi‑fold performance gains.
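The speedup comes from parsing row‑oriented text once into a typed columnar layout, so each training epoch reads columns instead of re‑parsing strings. A real pipeline would write Parquet via Arrow; the stdlib sketch below only illustrates the one‑time row‑to‑column conversion, with the schema format as an assumption.

```python
# Illustrative row-text-to-columnar conversion: CSV text is parsed once
# into typed columns, the layout that Parquet/Arrow then stores and
# serves without repeated string parsing.
import csv
import io

def text_to_columns(text: str, schema: dict) -> dict:
    """schema maps column name -> Python type; returns typed column lists."""
    reader = csv.DictReader(io.StringIO(text))
    cols = {name: [] for name in schema}
    for row in reader:
        for name, typ in schema.items():
            cols[name].append(typ(row[name]))
    return cols
```

The tar‑packing trick for images is analogous on the file-system side: many tiny files become a few sequential reads.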
Hybrid‑Cloud Scenarios: Uses DAA‑Catalog for data replication across clouds, optimizes bandwidth‑constrained migrations, and maintains data consistency.
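The consistency side of cross‑cloud replication can be pictured as a digest comparison: only files whose content is missing or differs on the target need to cross the bandwidth‑constrained link. The file‑listing shape below is an assumption for illustration.

```python
# Sketch of a replication consistency check: compare content digests
# per file so only mismatched or missing files are re-sent.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def files_to_resync(source: dict, target: dict) -> list:
    """source/target map file path -> content; return paths needing re-copy."""
    return sorted(
        path for path, data in source.items()
        if path not in target or digest(target[path]) != digest(data)
    )
```

In practice the digests would come from object-store metadata rather than re-reading content, but the comparison logic is the same.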
Outlook
Future work will deepen the public‑cloud elastic architecture (e.g., Spark on GPU and Gluten + Velox vectorization) to further cut costs and boost performance, enhance metadata management and semi‑structured data support, and integrate large‑model applications for smarter data analysis.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.