
Pluto: OPPO’s AutoML Tool for Hardware‑Aware Model Compression and Deployment

This article introduces OPPO’s self‑developed AutoML platform Pluto, explains why automated machine learning and model compression are essential for industrial AI, describes Pluto’s hardware‑aware and uniform algorithm framework, showcases typical applications such as video super‑resolution, and provides a detailed Q&A on its methodology and performance.

DataFunTalk

01 Why AutoML?

AI algorithms are increasingly used in industry, but the ML code itself is only a small part of the pipeline: the many surrounding modules (data collection, feature engineering, resource management, etc.) make the overall system complex and labor‑intensive. Deploying AI models on chips also requires repeated experiments to meet hardware constraints, creating a strong demand for automated machine learning (AutoML) to streamline the entire workflow.

1. AI Algorithms in Industry

Applications such as intelligent systems, recommendation engines, medical diagnosis, smart retail, automated logistics, safety monitoring, and wildlife recognition are expanding rapidly.

Although the amount of ML code is relatively small, the surrounding modules dominate the pipeline, increasing complexity and workload.

2. Challenges of AI Model Deployment

Ideally, an algorithm engineer should be able to build, train, compress, adapt to chips, perform inference, evaluate cloud‑edge performance, and deploy the model with a single run. In practice, after compression, many metrics (latency, energy, memory) often fail to meet requirements, requiring extensive iterative experiments.

3. Why AutoML?

AutoML aims to automate the full ML process, especially deployment, with minimal human intervention while preserving model performance. Its key features include:

Friendly development experience

Easy iteration, debugging, and packaging

Exploiting hardware’s ultimate performance

Rapid deployment in typical scenarios

By avoiding repeated experiments and lowering the expertise barrier, AutoML can save significant human effort, time, and resources.

02 Why Model Compression?

Resource constraints on end devices (e.g., smartphones) limit the deployment of AI models. Compression is needed to meet latency, energy, and memory budgets.

Many applications (e.g., beauty cameras) require on‑device processing, but mobile hardware is far less powerful than cloud servers.

Mid‑range and low‑end devices dominate the market, so models must be optimized for limited resources.

Compressing models balances performance (e.g., accuracy) with constraints such as latency, FLOPs, energy consumption, and storage.

03 Pluto: AutoML Tool

1. Hardware‑Aware Feature

Pluto can retrieve hardware metrics (latency, energy, memory) during compression and incorporate them into training, eliminating the need for repeated manual tuning.

FLOPs are often used as a proxy for energy and latency, but they correlate only loosely with what the hardware actually delivers; by optimizing against measured metrics, Pluto's hardware‑aware approach is more accurate.
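As a concrete illustration, a per‑layer latency table profiled on the target chip can feed directly into the training objective. The sketch below uses made‑up numbers and names (`LATENCY_TABLE`, `hardware_aware_loss` are illustrative, not Pluto's actual API):

```python
# Hypothetical per-layer latency table (ms) profiled on a target SoC.
# Note the non-linear growth: doubling channels more than doubles cost,
# which is why FLOPs alone are a poor proxy for latency.
LATENCY_TABLE = {
    ("conv3x3", 16): 0.8,
    ("conv3x3", 32): 1.9,
    ("conv3x3", 64): 5.1,
}

def predicted_latency(arch):
    """Sum profiled latencies over a candidate architecture."""
    return sum(LATENCY_TABLE[(op, ch)] for op, ch in arch)

def hardware_aware_loss(task_loss, arch, budget_ms, weight=0.1):
    """Add a penalty only when predicted latency exceeds the budget."""
    overshoot = max(0.0, predicted_latency(arch) - budget_ms)
    return task_loss + weight * overshoot
```

Because the table is measured rather than derived from FLOPs, the penalty reflects the chip's real behavior.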

2. Unified Algorithm Framework

Unlike serial optimization, Pluto integrates NAS, pruning, quantization, and knowledge distillation into a single unified framework, enabling end‑to‑end joint optimization.
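To make the contrast with serial optimization concrete, the toy sketch below applies a pruning mask, fake quantization, and a distillation term inside a single training step, rather than as separate stages. All names and the structure are illustrative pseudocode, not Pluto's internals:

```python
def fake_quantize(w, bits=8):
    """Uniform symmetric fake quantization of a weight list."""
    scale = max(abs(x) for x in w) / (2 ** (bits - 1) - 1) or 1.0
    return [round(x / scale) * scale for x in w]

def unified_step(weights, mask, teacher_out, student_out, alpha=0.5):
    """One joint step: pruning mask + quantization + distillation loss."""
    w = [wi * mi for wi, mi in zip(weights, mask)]  # structured pruning mask
    w = fake_quantize(w)                            # quantization-aware weights
    # Squared-error distillation term against the teacher's outputs.
    distill = sum((t - s) ** 2 for t, s in zip(teacher_out, student_out))
    return w, alpha * distill                       # loss term to add to task loss
```

Coupling the techniques this way lets the optimizer trade them off jointly instead of locking in one stage's result before the next begins.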

3. Joint Optimization in a Unified Framework

By embedding constraints (energy, latency, storage, FLOPs) into the loss via Lagrangian methods, Pluto simultaneously maximizes model accuracy while satisfying hardware limits.
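In symbols, the joint optimization can be sketched as follows (the notation is illustrative; the actual constraints and multipliers depend on the deployment target):

```latex
% Constrained compression: maximize accuracy under hardware budgets
\begin{aligned}
\min_{\theta,\, a} \quad & \mathcal{L}_{\text{task}}(\theta, a) \\
\text{s.t.} \quad & \mathrm{Lat}(a) \le T, \quad \mathrm{Energy}(a) \le E, \quad \mathrm{Mem}(a) \le M
\end{aligned}
% Lagrangian relaxation folds the constraints into a single loss,
% with [x]_+ = \max(x, 0):
\mathcal{L}(\theta, a, \lambda) = \mathcal{L}_{\text{task}}(\theta, a)
  + \lambda_1 [\mathrm{Lat}(a) - T]_+
  + \lambda_2 [\mathrm{Energy}(a) - E]_+
  + \lambda_3 [\mathrm{Mem}(a) - M]_+
```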

Pluto supports CNNs, RNNs, and models with skip connections, automatically aligning structures after pruning.

04 Pluto’s Typical Applications at OPPO

Video Super‑Resolution: Compared with the baseline, Pluto reduces human effort by 90% and cuts FLOPs by 93%, with only a 3.4% drop in PSNR.

Algorithm E² NAS: Our NAS‑based pose‑estimation models outperform SOTA on COCO across various devices, achieving higher AP at lower latency.

Other CV Tasks: In AINR (denoising) and AISR (super‑resolution) projects, latency improves by 15.6% and 35% respectively, with model size reductions of 50% and 12%.

05 Q&A

Q1: What is the principle of model compression?

It solves a constrained optimization problem that maximizes performance while satisfying resource constraints (latency, FLOPs, power, etc.) using Lagrangian relaxation.
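A common way to handle the multipliers in such a relaxation is dual ascent: raise a multiplier while its constraint is violated, and let it decay toward zero once there is slack. The update rule and names below are a generic illustration, not Pluto's specific scheme:

```python
def dual_ascent_step(flops, budget, lam, lr=0.01):
    """Increase the multiplier when the budget is violated,
    shrink it toward zero when the constraint has slack."""
    lam = lam + lr * (flops - budget)
    return max(0.0, lam)  # multipliers stay non-negative

lam = 0.0
for flops in [120, 110, 95, 90]:  # the model shrinks over training
    lam = dual_ascent_step(flops, budget=100, lam=lam)
# lam rises while flops > 100, then decays once the budget is met
```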

Q2: How resource‑intensive is AutoML training?

Training time varies by task; for example, a COCO256 model takes ~3.5 days, and NAS search time is comparable. Smaller datasets (e.g., MPII) can finish in about a day.

Q3: How does Pluto handle skip‑connection alignment?

Instead of forcing alignment with extra layers, Pluto traces dimension changes upstream until a layer can be left unchanged, ensuring structural consistency.
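For intuition, the core consistency requirement is that both sides of a residual add must end up with the same channel indices after pruning. One simple reconciliation rule (not necessarily Pluto's exact mechanism, which traces upstream as described above) is to keep the union or intersection of the indices each branch would keep on its own:

```python
def align_skip(keep_a, keep_b, mode="union"):
    """Reconcile kept-channel index sets on the two sides of a skip
    connection so the element-wise add remains shape-consistent."""
    a, b = set(keep_a), set(keep_b)
    kept = a | b if mode == "union" else a & b
    return sorted(kept)

# Branch A would keep channels {0, 2, 5}; branch B would keep {0, 3, 5}.
# After alignment both branches keep the same indices.
aligned = align_skip([0, 2, 5], [0, 3, 5])  # -> [0, 2, 3, 5]
```

The union mode preserves more capacity at the cost of less compression; the intersection mode prunes harder but may discard channels one branch needed.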

Q4: Is there work on multimodal pre‑training with AutoML?

Yes; combining large‑scale pre‑trained models with AutoML can help discover architectures that are more data‑aware.

Q5: How much improvement does Pluto bring compared to previous tools?

In video super‑resolution, Pluto saves 90% of manual effort.

Q6: Any recommended papers?

Relevant literature includes classic NAS and DARTS papers; our latest solution is under submission.

Q7: Is the performance measured on CPU or GPU?

Model training and search run on cloud GPUs; inference performance is evaluated on mobile devices.

Q8: Which technique (distillation, pruning, quantization) shows the most impact in AutoML?

Pluto treats all techniques uniformly; the best choice depends on the model and hardware.

Q9: Do denoising and super‑resolution affect frame‑rate?

Both applications improve objective metrics (PSNR, SSIM) without changing the frame‑rate.

Q10: How does teacher‑student mismatch affect distillation?

Some structural difference is beneficial; excessive similarity limits gains, while large gaps can reduce effectiveness.

Thank you for attending the session.

Tags: model compression, AutoML, Neural Architecture Search, Pluto, OPPO, Hardware‑Aware
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
