Machine Heart Column: Kuaishou MMU's New Dialect Identification Method
Researchers from Kuaishou MMU and Tsinghua University introduced a dynamic multi-scale convolution network for dialect identification that delivers significant gains over state-of-the-art systems.
Researchers from Kuaishou MMU (Multimedia Understanding) and Tsinghua University proposed a novel dynamic multi-scale convolution network for dialect identification. The method introduces dynamic convolution kernels, local multi-scale learning, and global multi-scale pooling to capture both global and local context information. On the AP20-OLR dialect identification task, the approach achieved an average cost (Cavg) of 0.067 and an equal error rate (EER) of 6.52%, a relative improvement of 9% in Cavg and 45% in EER over the previous state-of-the-art system, while reducing model parameters by 91%. The paper was accepted by Interspeech 2021.
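To make the dynamic-kernel idea concrete, the sketch below shows one common way such a layer can be built: K parallel convolution kernels are mixed per utterance with Softmax attention weights computed from the input. This is a minimal PyTorch illustration under assumed sizes and layer names, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Illustrative dynamic convolution: Softmax attention over K
    candidate kernels, mixed per example (assumed structure)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4):
        super().__init__()
        self.kernel_size = kernel_size
        # K candidate kernels, each of shape (out_ch, in_ch, kernel_size)
        self.kernels = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size) * 0.02)
        # Maps a pooled context vector to K attention logits
        self.attn = nn.Linear(in_ch, num_kernels)

    def forward(self, x):  # x: (batch, in_ch, time)
        # Global average pooling over time -> Softmax attention weights
        ctx = x.mean(dim=2)                       # (batch, in_ch)
        w = F.softmax(self.attn(ctx), dim=1)      # (batch, K)
        # Mix the K kernels per example: (batch, out_ch, in_ch, k)
        mixed = torch.einsum('bk,koit->boit', w, self.kernels)
        b, o, i, t = mixed.shape
        # Apply one per-example kernel via a grouped convolution trick
        out = F.conv1d(x.reshape(1, b * i, -1),
                       mixed.reshape(b * o, i, t),
                       groups=b, padding=self.kernel_size // 2)
        return out.reshape(b, o, -1)              # (batch, out_ch, time)
```

Because the attention weights depend on the input, the effective kernel adapts to each utterance, which is what allows the layer to capture context-dependent features with far fewer parameters than stacking many static convolutions.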
The method's components include dynamic convolution kernels for adaptive feature capture, local multi-scale learning for fine-grained representation, and global multi-scale pooling for feature aggregation. Experimental results demonstrated the effectiveness of the approach, with the dynamic multi-scale convolution method showing superior performance on all evaluated metrics.
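Local multi-scale learning is typically realized in the Res2Net style: channels are split into groups, each group is convolved with its own small kernel, and each group's output feeds into the next group before convolution, giving a range of effective receptive fields within a single block. The sketch below illustrates that pattern; the group count and kernel sizes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LocalMultiScale(nn.Module):
    """Illustrative Res2Net-style block: split channels into groups,
    convolve hierarchically, then concatenate (assumed structure)."""
    def __init__(self, channels, scales=4, kernel_size=3):
        super().__init__()
        assert channels % scales == 0, "channels must divide evenly"
        self.scales = scales
        width = channels // scales
        # One small convolution per group after the first
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width, kernel_size, padding=kernel_size // 2)
            for _ in range(scales - 1))

    def forward(self, x):  # x: (batch, channels, time)
        chunks = torch.chunk(x, self.scales, dim=1)
        out = [chunks[0]]            # first group passes through unchanged
        prev = chunks[0]
        for conv, c in zip(self.convs, chunks[1:]):
            prev = conv(c + prev)    # fuse previous scale before convolving
            out.append(prev)
        return torch.cat(out, dim=1)  # same shape as the input
```

Each successive group sees the accumulated output of earlier groups, so later groups effectively cover larger temporal contexts, which is what produces the fine-grained multi-scale representation.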
Code implementation details are provided in the original paper: dynamic convolution kernels are implemented with Softmax attention mechanisms, and local multi-scale learning uses feature concatenation and convolution operations. The global multi-scale pooling layer aggregates features from different bottleneck layers using standard deviation vectors.
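The pooling step can be sketched as follows: frame-level feature maps taken from several bottleneck layers are each reduced to a per-channel standard deviation over time, and the resulting vectors are concatenated into one utterance-level embedding. The function below is a minimal sketch of that aggregation under an assumed interface, not the paper's exact code.

```python
import torch

def global_multiscale_pooling(features):
    """Aggregate frame-level features from multiple bottleneck layers
    into one utterance-level vector via per-layer standard deviation
    over the time axis (illustrative sketch, assumed interface).

    features: list of tensors, each of shape (batch, channels, time)
    returns:  tensor of shape (batch, sum of all channels)
    """
    stats = []
    for f in features:
        stats.append(f.std(dim=2))   # (batch, channels) per layer
    return torch.cat(stats, dim=1)
```

Using the standard deviation rather than (or alongside) the mean captures how much each channel's activation varies across the utterance, which tends to be discriminative for language and dialect identity.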
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.