Deploying OPPO Multi‑Modal Pretrained Models in Edge‑Cloud Scenarios: Techniques and Optimizations
This article presents OPPO's practical research on deploying multi‑modal pre‑training models across mobile devices and the cloud, covering on‑device image‑text retrieval, optimizations for text‑image generation and understanding, and lightweight diffusion‑model techniques, with detailed algorithmic improvements, performance results, and real‑world application cases.
Introduction
The article shares OPPO's deployment of multi‑modal pre‑training models in edge‑cloud scenarios, focusing on achieving low‑cost training and inference on mobile devices despite limited resources.
Three Main Topics
1. Edge Image‑Text Retrieval Research – Discusses the shift from tag‑based photo search to CLIP‑based natural language search, challenges in on‑device performance, speed, and security, and presents a compression‑friendly CLIP architecture with ALBEF‑style single‑stream fusion, projector separation, and contrastive loss distillation to obtain a lightweight student model that matches cloud accuracy while running offline.
2. Text‑Image Generation & Understanding Model Optimization – Describes continued pre‑training of Chinese large models using English foundations, adapter‑based multilingual alignment, LoRA‑style domain fine‑tuning with anti‑forgetting (LWF), and efficient negative‑sample construction for fine‑grained attribute learning, achieving strong results with minimal data and training time.
3. On‑Device Lightweighting of Text‑Image Generation Models – Explains diffusion‑model compression (U‑Net layer pruning, trade‑off models, distillation to small Chinese models), sampling acceleration via progressive distillation and classifier‑free guidance (CFG) distillation, and an effect comparison showing inference time reduced from over 10 s to roughly 2.5 s without hardware‑specific operator optimizations.
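The contrastive‑loss distillation mentioned in topic 1 can be sketched as matching the student's in‑batch image‑text similarity distribution to the teacher's. This is a minimal NumPy illustration of the general technique, not OPPO's actual implementation; the function name, temperature value, and bidirectional averaging are assumptions for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def contrastive_distill_loss(img_s, txt_s, img_t, txt_t, tau=0.07):
    """KL between teacher and student in-batch image-text similarity distributions."""
    # L2-normalize embeddings so dot products are cosine similarities.
    norm = lambda m: m / np.linalg.norm(m, axis=-1, keepdims=True)
    img_s, txt_s, img_t, txt_t = map(norm, (img_s, txt_s, img_t, txt_t))
    logits_s = img_s @ txt_s.T / tau   # student similarities (batch x batch)
    logits_t = img_t @ txt_t.T / tau   # teacher similarities (batch x batch)
    # KL(teacher || student), with a small epsilon for numerical safety.
    kl = lambda p, q: np.mean(np.sum(p * (np.log(p + 1e-9) - np.log(q + 1e-9)), axis=-1))
    # Distill both retrieval directions: image->text and text->image.
    return 0.5 * (kl(softmax(logits_t), softmax(logits_s))
                  + kl(softmax(logits_t.T), softmax(logits_s.T)))
```

Unlike plain contrastive training against one‑hot labels, the soft teacher distribution carries the teacher's relative similarity judgments, which is what lets a compressed student approach cloud‑model retrieval accuracy.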
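The LoRA‑style fine‑tuning in topic 2 adds a trainable low‑rank update beside a frozen pretrained weight, so domain adaptation touches only a tiny fraction of parameters. A minimal sketch of the standard LoRA formulation follows; the class name and hyperparameter defaults are illustrative assumptions, not OPPO's code.

```python
import numpy as np

class LoRALinear:
    """Frozen base weight plus a trainable low-rank update: y = W x + (alpha/r) * B A x."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = W.shape
        self.W = W                                  # frozen pretrained weight
        self.A = rng.normal(0, 0.01, (r, in_dim))   # trainable down-projection
        self.B = np.zeros((out_dim, r))             # trainable up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x):
        # Zero-initializing B makes the update a no-op at the start of training,
        # so fine-tuning begins exactly from the pretrained behavior.
        return self.W @ x + self.scale * (self.B @ (self.A @ x))
```

Because the base weights stay frozen, the pretrained model's general knowledge is preserved, which complements the LWF‑style anti‑forgetting regularization mentioned in the article.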
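The CFG distillation in topic 3 targets a specific cost: standard classifier‑free guidance runs the denoiser twice per sampling step (conditional and unconditional) and mixes the outputs, while a distilled student learns to reproduce the guided output in a single pass. A toy sketch of the general idea, with illustrative function names and guidance scale (not OPPO's implementation):

```python
import numpy as np

def cfg_teacher_eps(eps_uncond, eps_cond, w):
    # Classifier-free guidance: combine two denoiser predictions per step,
    # pushing the sample toward the condition with guidance weight w.
    return eps_uncond + w * (eps_cond - eps_uncond)

def distill_loss(student_eps, eps_uncond, eps_cond, w):
    # The student, conditioned on w, should match the guided teacher output
    # with one forward pass, halving the per-step U-Net cost.
    target = cfg_teacher_eps(eps_uncond, eps_cond, w)
    return float(np.mean((student_eps - target) ** 2))
```

Combined with progressive distillation, which halves the number of sampling steps per round, this is how the article's reported reduction from over 10 s to roughly 2.5 s becomes plausible without touching hardware‑specific operators.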
Real‑World Applications – Provides examples such as AI‑generated wallpapers, lock‑screen ads, personalized product advertising, portrait vertical domain optimization, and pipelines for generating travel journals, demonstrating the practicality of on‑device AIGC.
Future Directions – Highlights remaining challenges in memory management, module scheduling, and full end‑to‑end on‑device generation pipelines, and outlines plans for combined algorithmic and operator optimizations to achieve imperceptible latency.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.