Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations
This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and the cloud, covering on‑device image‑text retrieval, optimizations for text‑image generation and understanding, and lightweight diffusion‑model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.
Introduction
The article shares OPPO's deployment of multi‑modal pre‑training models in edge‑cloud scenarios, focusing on achieving low‑cost training and inference on mobile devices despite limited resources.
Three Main Topics
1. Edge Image‑Text Retrieval Research – Discusses the shift from tag‑based photo search to CLIP‑based natural language search, challenges in on‑device performance, speed, and security, and presents a compression‑friendly CLIP architecture with ALBEF‑style single‑stream fusion, projector separation, and contrastive loss distillation to obtain a lightweight student model that matches cloud accuracy while running offline.
2. Text‑Image Generation & Understanding Model Optimization – Describes continued pre‑training of Chinese large models using English foundations, adapter‑based multilingual alignment, LoRA‑style domain fine‑tuning with anti‑forgetting (LWF), and efficient negative‑sample construction for fine‑grained attribute learning, achieving strong results with minimal data and training time.
3. On‑Device Lightweighting of Text‑Image Generation Models – Explains diffusion‑model compression (U‑Net layer pruning, trade‑off models, distillation to small Chinese models), sampling acceleration via progressive distillation and classifier‑free guidance (CFG) distillation, and an effect comparison showing inference time reduced from over 10 s to roughly 2.5 s without hardware‑specific operator optimizations.
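The contrastive‑loss distillation mentioned in topic 1 can be sketched as matching the student's in‑batch image‑text similarity distribution to the teacher's. This is a minimal NumPy illustration of the general technique, not OPPO's actual implementation; the function name, temperature value, and bidirectional averaging are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contrastive_distill_loss(img_s, txt_s, img_t, txt_t, tau=0.07):
    """KL between teacher and student in-batch image-text similarity distributions."""
    # L2-normalize embeddings so dot products are cosine similarities.
    norm = lambda m: m / np.linalg.norm(m, axis=-1, keepdims=True)
    img_s, txt_s, img_t, txt_t = map(norm, (img_s, txt_s, img_t, txt_t))
    logits_s = img_s @ txt_s.T / tau   # student similarities (batch x batch)
    logits_t = img_t @ txt_t.T / tau   # teacher similarities (batch x batch)
    # KL(teacher || student), with a small epsilon for numerical safety.
    kl = lambda p, q: np.mean(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9)), axis=-1))
    # Distill both retrieval directions: image->text and text->image.
    return 0.5 * (kl(softmax(logits_t), softmax(logits_s))
                  + kl(softmax(logits_t.T), softmax(logits_s.T)))
```

Unlike plain contrastive training against one‑hot labels, the soft teacher distribution carries the teacher's relative similarity judgments, which is what lets a compressed student approach cloud‑model retrieval accuracy.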
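The LoRA‑style fine‑tuning in topic 2 adds a trainable low‑rank update beside a frozen pretrained weight, so domain adaptation touches only a tiny fraction of parameters. A minimal sketch of the standard LoRA formulation follows; the class name and hyperparameter defaults are illustrative assumptions, not OPPO's code.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a trainable low-rank update: y = W x + (alpha/r) * B A x."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                  # frozen pretrained weight
        self.A = rng.normal(0, 0.01, (r, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, r))             # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Zero-initializing B makes the update a no-op at the start of training,
        # so fine-tuning begins exactly from the pretrained behavior.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```

Because the base weights stay frozen, the pretrained model's general knowledge is preserved, which complements the LWF‑style anti‑forgetting regularization mentioned in the article.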
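The CFG distillation in topic 3 targets a specific cost: standard classifier‑free guidance runs the denoiser twice per sampling step (conditional and unconditional) and mixes the outputs, while a distilled student learns to reproduce the guided output in a single pass. A toy sketch of the general idea, with illustrative function names and guidance scale (not OPPO's implementation):

```python
import numpy as np

def cfg_teacher_eps(eps_uncond, eps_cond, w):
    # Classifier-free guidance: combine two denoiser predictions per step,
    # pushing the sample toward the condition with guidance weight w.
    return eps_uncond + w * (eps_cond - eps_uncond)

def distill_loss(student_eps, eps_uncond, eps_cond, w):
    # The student, conditioned on w, should match the guided teacher output
    # with one forward pass, halving the per-step U-Net cost.
    target = cfg_teacher_eps(eps_uncond, eps_cond, w)
    return float(np.mean((student_eps - target) ** 2))
```

Combined with progressive distillation, which halves the number of sampling steps per round, this is how the article's reported reduction from over 10 s to roughly 2.5 s becomes plausible without touching hardware‑specific operators.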
Real‑World Applications – Provides examples such as AI‑generated wallpapers, lock‑screen ads, personalized product advertising, portrait vertical domain optimization, and pipelines for generating travel journals, demonstrating the practicality of on‑device AIGC.
Future Directions – Highlights remaining challenges in memory management, module scheduling, and full end‑to‑end on‑device generation pipelines, and outlines plans for combined algorithmic and operator optimizations to achieve imperceptible latency.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.