Tag: Sdot

DaTaobao Tech
Nov 24, 2023 · Artificial Intelligence

Performance Optimization of Depthwise Conv Int8 on ARM CPUs

By converting the input to a C16 layout and using the ARMv8.2 Sdot instruction, the Int8 depthwise-convolution operator on ARM CPUs drops from 4.46 ms to 1.75 ms, roughly a 2.5× speedup. The data rearrangement that Sdot requires adds enough overhead, however, that the Int8 kernel still does not outperform the FP16 implementation.
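Why depthwise convolution needs extra rearrangement for Sdot can be seen from the instruction's semantics. The vector form of SDOT takes two 16-byte registers and adds, into each of four int32 lanes, the dot product of the four int8 values in that lane. Depthwise convolution never sums across channels, so each lane's four bytes must hold four kernel taps of the *same* channel. The sketch below is a portable scalar model under stated assumptions, not the article's kernel. `sdot_4s_16b` emulates the instruction (a real kernel would call the ACLE intrinsic `vdotq_s32` from `arm_neon.h` on a CPU with the dot-product extension). `chw_to_c16` assumes "C16" means channels packed in blocks of 16 as `[ceil(C/16)][H][W][16]`. `dw3x3_4ch_sdot` shows one way to regroup a 3x3 kernel's 9 taps into 3 groups of 4, zero-padding the last group. All three function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scalar model of SDOT Vd.4S, Vn.16B, Vm.16B: lane i accumulates
 * a[4i..4i+3] . b[4i..4i+3] into a 32-bit sum. */
static void sdot_4s_16b(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int lane = 0; lane < 4; ++lane) {
        int32_t sum = 0;
        for (int k = 0; k < 4; ++k)
            sum += (int32_t)a[4 * lane + k] * (int32_t)b[4 * lane + k];
        acc[lane] += sum;
    }
}

/* Assumed C16 layout: planar CHW (batch 1) -> [ceil(C/16)][H][W][16],
 * with channels past C zero-padded. dst must hold ceil(C/16)*H*W*16 bytes. */
static void chw_to_c16(int8_t *dst, const int8_t *src, int c, int h, int w) {
    assert(c > 0 && h > 0 && w > 0);
    int blocks = (c + 15) / 16;
    memset(dst, 0, (size_t)blocks * h * w * 16);
    for (int ch = 0; ch < c; ++ch) {
        int blk = ch / 16, lane = ch % 16;
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                dst[(((size_t)blk * h + y) * w + x) * 16 + lane] =
                    src[((size_t)ch * h + y) * w + x];
    }
}

/* One output position for 4 channels of a 3x3 depthwise conv.
 * taps[ch*9 + t] / wts[ch*9 + t] are the 9 inputs and weights of channel ch.
 * The 9 taps are split into 3 groups of 4 (last group zero-padded) so every
 * int32 lane of each SDOT only ever sees taps from a single channel. */
static void dw3x3_4ch_sdot(int32_t out[4], const int8_t *taps, const int8_t *wts) {
    out[0] = out[1] = out[2] = out[3] = 0;
    for (int g = 0; g < 3; ++g) {
        int8_t a[16], b[16];
        for (int lane = 0; lane < 4; ++lane) {
            for (int k = 0; k < 4; ++k) {
                int t = 4 * g + k;
                a[4 * lane + k] = t < 9 ? taps[lane * 9 + t] : 0;
                b[4 * lane + k] = t < 9 ? wts[lane * 9 + t] : 0;
            }
        }
        sdot_4s_16b(out, a, b);
    }
}
```

In this model the gather-and-interleave loop inside `dw3x3_4ch_sdot`, plus the 3 wasted zero slots out of 12, illustrates the kind of rearrangement cost that a plain FP16 multiply-accumulate kernel does not pay.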

ARM · Depthwise Convolution · Int8