Tag

mobile AI

DaTaobao Tech
Nov 20, 2024 · Mobile Development

MNN-Transformer: Efficient On‑Device Large Language and Diffusion Model Deployment

MNN‑Transformer is an end‑to‑end framework for running large language and diffusion models efficiently on modern smartphones. It exports and quantizes models (including dynamic int4/int8 quantization and KV cache compression) and executes them through a plugin‑engine runtime, achieving up to 35 tokens/s decoding and 2‑3× faster image generation than existing on‑device solutions.

LLM · MNN · diffusion
15 min read

Sohu Tech Products
Mar 6, 2024 · Mobile Development

On‑Device Deployment of Large Language Models Using Sohu’s Hybrid AI Engine and GPT‑2

The article outlines how Sohu’s Hybrid AI Engine enables on‑device deployment of a distilled GPT‑2 model by converting it to TensorFlow Lite, detailing the setup, customization with Keras, inference workflow, and core SDK calls, and argues that this approach offers fast, private, and cost‑effective AI for mobile devices despite typical LLM constraints.

GPT-2 · Hybrid AI · Keras
9 min read

DataFunSummit
Sep 11, 2023 · Artificial Intelligence

Challenges and Insights for Deploying Large Models on Edge with MNN

The talk presents an overview of the MNN inference engine, outlines the end‑to‑end workflow for deploying large language models on mobile devices, discusses technical challenges and practical solutions, and concludes with future directions for edge AI deployment.

AI · Inference Engine · Large Models
2 min read

HelloTech
Aug 9, 2023 · Artificial Intelligence

Device Intelligence: Concepts, Architecture, and Applications

Device intelligence brings on-device reasoning and real-time inference to smartphones and IoT gateways, delivering low-latency, privacy-preserving, personalized services such as AR/VR enhancements and recommendation re-ranking, while confronting challenges of hardware fragmentation and model size, and complementing cloud AI through architectures like Hello’s MNN-based pipeline.

Device Intelligence · Edge Computing · Model Inference
10 min read

HelloTech
May 8, 2023 · Artificial Intelligence

One‑Stop AI Platform for Cloud, Edge, Mobile, Flink, and Application Intelligence: Architecture, Challenges, and Solutions

The article presents a comprehensive one‑stop AI platform that unifies training, model, feature, and decision services across cloud, edge, mobile, Flink, and application environments, detailing its architecture, the limitations of cloud‑centric inference, the advantages of localized inference, and the challenges and solutions for model and feature localization, SDK design, and future AutoML enhancements.

AI Platform · Edge Computing · Model Inference
17 min read

DataFunSummit
Feb 4, 2023 · Artificial Intelligence

Walle: An End‑to‑End, General‑Purpose, Scalable Edge‑Cloud Collaborative Machine Learning System

The article introduces Walle, Alibaba's four‑year‑old edge‑cloud collaborative machine‑learning platform that unifies compute containers, data pipelines, and a deployment platform to enable low‑latency, privacy‑preserving, and high‑throughput AI services across billions of mobile devices, and presents its architecture, design challenges, and evaluation results.

Cloud Computing · Edge Computing · System Architecture
25 min read

DaTaobao Tech
Nov 18, 2022 · Artificial Intelligence

ARMv8.6 Instruction Set Optimization for MNN: Accelerating Int8 and BF16 Matrix Multiplication

The article explains how ARMv8.6’s new SMMLA and BFMMLA GEMM instructions are integrated into MNN to accelerate INT8 and BF16 matrix multiplication, delivering up to 90% speedup over ARMv8.2’s SDOT and FP16‑FMLA kernels through optimized kernels, tiling, and compatibility handling.

ARMv8.6 · MNN · Neural Network Inference
15 min read

OPPO Kernel Craftsman
Oct 28, 2022 · Artificial Intelligence

ShaderNN: A GPU Shader‑Based Lightweight Inference Engine for Mobile AI Applications

ShaderNN is an open‑source, sub‑2 MB GPU‑shader inference engine that runs TensorFlow, PyTorch and ONNX models directly on mobile graphics textures via OpenGL fragment and compute shaders, delivering real‑time, low‑power AI for image‑heavy tasks while eliminating third‑party dependencies and achieving up to 90% speed gains.

GPU · Inference Engine · Shader
11 min read

ByteDance Terminal Technology
Jul 29, 2022 · Artificial Intelligence

Pitaya: ByteDance’s End‑Side AI Engineering Platform Overview

Pitaya, built by ByteDance’s Client AI and MLX teams, is a comprehensive end‑side AI engineering platform that provides a full workflow from model development and data preparation to deployment, monitoring, and federated learning, supporting large‑scale commercial scenarios across multiple apps.

AI Platform · Federated Learning · Inference Engine
14 min read

DaTaobao Tech
Jul 13, 2022 · Artificial Intelligence

MNN 2.0: A Unified Edge‑Cloud Deep Learning Framework Overview

MNN 2.0 transforms Alibaba’s lightweight deep‑learning engine into a unified edge‑cloud framework, delivering ultra‑small binaries, broad model‑format support, and aggressive CPU/GPU/DSP/NPU optimizations—including SIMD, Winograd, quantization, and sparse computation—while providing Python‑style APIs for preprocessing, inference, and on‑device training.

Edge Computing · MNN · deep learning
18 min read

Kuaishou Large Model
May 27, 2022 · Mobile Development

How Kuaishou Optimizes Mobile AI Effects with Dynamic Device Grading

To ensure a consistent user experience across the wide range of Android and iOS devices, Kuaishou’s Y‑Tech team designed a dynamic model‑grading framework that evaluates CPU, GPU, NPU, memory and other hardware metrics, then dispatches appropriately sized AI effect models and configurations in real time.

Android · Kuaishou · device optimization
12 min read

Kuaishou Tech
Apr 11, 2022 · Artificial Intelligence

Kuaishou's Custom Video Matting Solution: Interactive Object Segmentation for Mobile Creators

Kuaishou's audio‑video technology team presents a self‑developed custom video matting system that combines foreground, interactive, and video object segmentation to let creators extract arbitrary subjects without green screens, featuring adaptive cropping, multi‑stage training, and deployment across Android and iOS devices.

Computer Vision · Kuaishou · deep learning
15 min read

Kuaishou Tech
Mar 3, 2022 · Artificial Intelligence

Optimization Techniques for Image Cropping in Kuaishou YKit AI SDK

This article details the engineering optimizations applied to the image cropping stage of Kuaishou's YKit AI SDK, covering instruction-level fixes, SIMD acceleration, I/O cache improvements, algorithmic refinements, parallel processing, and device‑tier strategies to achieve up to 4.6× speedup on mobile devices.

AI SDK · Image Processing · Neon
12 min read

Kuaishou Large Model
Nov 26, 2021 · Artificial Intelligence

How Kuaishou’s ‘All‑Things AR’ Turns Real Objects into Interactive 3D Characters

‘All‑Things AR’ (万物AR) is a Kuaishou Y‑Tech solution that lets users capture any real‑world object with a phone, automatically segments it using a custom AI model, and renders an animated 3D avatar via a lightweight SLAM‑based pipeline, enabling low‑cost, high‑quality AR experiences.

AR · Computer Vision · SLAM
16 min read

Baidu App Technology
Nov 25, 2021 · Game Development

Building an AI-Powered Object Hunt Game with Paddle.js and PaddleClas

The article details how to create the AI‑driven “Object Hunt Battle” game by processing data, designing and training a PP‑LCNet model with PaddleClas, converting it for Paddle.js, and integrating real‑time WebGL inference on mobile devices, achieving sub‑50 ms latency and encouraging developers to explore further.

AI game development · Paddle.js · PaddleClas
9 min read

Kuaishou Tech
Jul 23, 2021 · Artificial Intelligence

Real-time Single-image 3D Photo Generation on Mobile Devices Using Deep Learning

The article presents a mobile‑first solution that converts a single RGB photograph into an interactive 3D photo by combining learning‑based monocular depth estimation, multi‑task image‑and‑depth restoration, face‑specific refinement, and a custom KwaiNN inference engine to achieve real‑time rendering on all smartphone models without requiring depth sensors.

3D Photo · AR · KwaiNN
16 min read

Kuaishou Large Model
Jul 23, 2021 · Artificial Intelligence

How Kuaishou’s Y‑Tech Achieved Real‑Time 3D Photo Rendering on Any Smartphone

The article details Kuaishou Y‑Tech’s end‑to‑end solution for converting a single RGB image into an interactive 3D photo on mobile devices, covering depth estimation, image‑inpainting, custom KwaiNN inference, and real‑time 3D rendering techniques that run on all smartphone models without depth sensors.

3D Photo · Image Inpainting · KwaiNN
17 min read

DataFunTalk
Jun 3, 2021 · Artificial Intelligence

Compression Techniques for BERT: Analysis, Quantization, Pruning, Distillation, and Structure-Preserving Methods

This article examines the internal structure of BERT and systematically presents various model‑compression strategies—including quantization, pruning, knowledge distillation, and structure‑preserving techniques—highlighting their impact on storage, computational cost, and inference speed for deployment on resource‑constrained mobile devices.

BERT · knowledge distillation · mobile AI
16 min read

Sohu Tech Products
Feb 24, 2021 · Artificial Intelligence

EdgeRec: Edge Computing in Recommendation Systems

EdgeRec explores how moving recommendation system components to the edge—leveraging real‑time user behavior, heterogeneous action modeling, on‑device reranking, mixed‑ranking, and personalized “thousand‑person‑one‑model” training—can reduce latency, improve relevance, and boost business metrics compared to traditional cloud‑centric pipelines.

Edge Computing · Recommendation systems · meta-learning
19 min read

DataFunTalk
Feb 11, 2021 · Artificial Intelligence

How to Build Successful AI Products: Insights on AI Development, NLP, and Product Strategies

This article explores the current state of AI, the evolution of NLP and voice assistants, common pitfalls in AI product development, and practical product‑management methods—including user segmentation, metric design, and lifecycle planning—to help engineers and product managers deliver effective AI‑driven solutions.

AI · NLP · Voice Assistant
19 min read