Taobao Tech
Oct 16, 2024 · Artificial Intelligence
Dynamic Quantization and Matrix Multiplication Optimization in MNN CPU Backend
This article details dynamic quantization in MNN's CPU backend for Transformer-style models. It covers runtime conversion of activations to int8, block-wise matrix-multiply optimizations built on ARM SMMLA/SDOT and AVX-512 VNNI instructions, and weight-group and batch-wise quantization techniques, and reports up to three-fold speed-ups on Snapdragon 8 Gen 3.
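Not part of the original abstract, but as a rough illustration of what runtime ("dynamic") int8 conversion means, here is a minimal per-tensor symmetric quantization sketch in NumPy. Function names are my own and this is not MNN's actual implementation, which quantizes per block/group:

```python
import numpy as np

def dynamic_quantize_int8(x: np.ndarray):
    """Symmetric per-tensor dynamic quantization.

    The scale is computed at runtime from the observed max |x|,
    which is the defining property of *dynamic* (vs. static) quantization.
    """
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:          # all-zero input: avoid division by zero
        scale = 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

# Quantize a random activation tensor and measure the round-trip error.
x = np.random.randn(4, 8).astype(np.float32)
q, s = dynamic_quantize_int8(x)
x_hat = dequantize(q, s)
err = float(np.max(np.abs(x - x_hat)))   # bounded by scale / 2
```

The int8 tensor `q` is what would feed the SMMLA/SDOT or VNNI integer matmul kernels; the scale is carried alongside and applied when converting accumulator results back to float.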
CPU Optimization · Dynamic Quantization · Int8
19 min read