DaTaobao Tech
Jul 12, 2023 · Artificial Intelligence
Optimizing ChatGLM-6B Deployment with MNN: Model Conversion, Quantization, and Edge Inference
This article details a workflow that converts the PyTorch ChatGLM‑6B model to MNN: it splits and compresses the embeddings, applies int4/int8 weight quantization, supports dynamic input shapes, and loads layers across GPU and CPU (or CPU only) to enable low‑memory edge inference on PCs and mobile devices at competitive tokens‑per‑second throughput.
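To make the int4/int8 step concrete, here is a minimal sketch of symmetric weight quantization in NumPy. This is illustrative only: MNN's actual quantization schemes (per-channel scales, block-wise int4 packing) differ in detail, and the function names here are hypothetical.

```python
import numpy as np

np.random.seed(0)

def quantize_weights(w: np.ndarray, bits: int = 8):
    """Symmetric per-tensor quantization of float weights to `bits` levels.

    Returns the integer codes and the float scale needed to reconstruct
    the weights. Illustrative sketch, not MNN's implementation.
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for int8, 7 for int4
    scale = np.abs(w).max() / qmax        # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map integer codes back to approximate float weights."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 8).astype(np.float32)
q8, s8 = quantize_weights(w, bits=8)
q4, s4 = quantize_weights(w, bits=4)

# Rounding error is bounded by half the quantization step, so the
# coarser int4 grid (larger scale) admits larger reconstruction error.
err8 = np.abs(dequantize(q8, s8) - w).max()
err4 = np.abs(dequantize(q4, s4) - w).max()
```

Int4 halves the weight storage again relative to int8, which is what makes a 6B-parameter model fit in edge-device memory, at the cost of coarser weight reconstruction.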
ChatGLM · LLM · MNN
16 min read