Machine Heart Column: Kuaishou MMU's New Dialect Identification Method
Researchers from Kuaishou MMU and Tsinghua University introduced a dynamic multi-scale convolution network for dialect identification that delivers significant gains over state-of-the-art systems.
Researchers from Kuaishou MMU (Multimedia Understanding) and Tsinghua University proposed a novel dynamic multi-scale convolution network for dialect identification. The method introduces dynamic convolution kernels, local multi-scale learning, and global multi-scale pooling to capture both global and local context information. On the AP20-OLR dialect identification task, the approach achieved an average cost (Cavg) of 0.067 and an equal error rate (EER) of 6.52%, a relative improvement of 9% in Cavg and 45% in EER over the previous state-of-the-art system, while reducing model parameters by 91%. The paper was accepted by Interspeech 2021.
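To make the dynamic-kernel idea concrete, the sketch below shows one common way such a layer can be built: K parallel convolution kernels are mixed per utterance with Softmax attention weights computed from the input. This is a minimal PyTorch illustration under assumed sizes and layer names, not the paper's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicConv1d(nn.Module):
    """Illustrative dynamic convolution: Softmax attention over K
    candidate kernels, mixed per example (assumed structure)."""
    def __init__(self, in_ch, out_ch, kernel_size=3, num_kernels=4):
        super().__init__()
        self.kernel_size = kernel_size
        # K candidate kernels, each of shape (out_ch, in_ch, kernel_size)
        self.kernels = nn.Parameter(
            torch.randn(num_kernels, out_ch, in_ch, kernel_size) * 0.02)
        # Maps a pooled context vector to K attention logits
        self.attn = nn.Linear(in_ch, num_kernels)

    def forward(self, x):  # x: (batch, in_ch, time)
        # Global average pooling over time -> Softmax attention weights
        ctx = x.mean(dim=2)                       # (batch, in_ch)
        w = F.softmax(self.attn(ctx), dim=1)      # (batch, K)
        # Mix the K kernels per example: (batch, out_ch, in_ch, k)
        mixed = torch.einsum('bk,koit->boit', w, self.kernels)
        b, o, i, t = mixed.shape
        # Apply one per-example kernel via a grouped convolution trick
        out = F.conv1d(x.reshape(1, b * i, -1),
                       mixed.reshape(b * o, i, t),
                       groups=b, padding=self.kernel_size // 2)
        return out.reshape(b, o, -1)              # (batch, out_ch, time)
```

Because the attention weights depend on the input, the effective kernel adapts to each utterance, which is what allows the layer to capture context-dependent features with far fewer parameters than stacking many static convolutions.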
The method's components include dynamic convolution kernels for adaptive feature capture, local multi-scale learning for fine-grained representation, and global multi-scale pooling for feature aggregation. Experimental results demonstrated the effectiveness of the approach, with the dynamic multi-scale convolution method showing superior performance on all evaluated metrics.
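Local multi-scale learning is typically realized in the Res2Net style: channels are split into groups, each group is convolved with its own small kernel, and each group's output feeds into the next group before convolution, giving a range of effective receptive fields within a single block. The sketch below illustrates that pattern; the group count and kernel sizes are assumptions for illustration, not the paper's configuration.

```python
import torch
import torch.nn as nn

class LocalMultiScale(nn.Module):
    """Illustrative Res2Net-style block: split channels into groups,
    convolve hierarchically, then concatenate (assumed structure)."""
    def __init__(self, channels, scales=4, kernel_size=3):
        super().__init__()
        assert channels % scales == 0, "channels must divide evenly"
        self.scales = scales
        width = channels // scales
        # One small convolution per group after the first
        self.convs = nn.ModuleList(
            nn.Conv1d(width, width, kernel_size, padding=kernel_size // 2)
            for _ in range(scales - 1))

    def forward(self, x):  # x: (batch, channels, time)
        chunks = torch.chunk(x, self.scales, dim=1)
        out = [chunks[0]]            # first group passes through unchanged
        prev = chunks[0]
        for conv, c in zip(self.convs, chunks[1:]):
            prev = conv(c + prev)    # fuse previous scale before convolving
            out.append(prev)
        return torch.cat(out, dim=1)  # same shape as the input
```

Each successive group sees the accumulated output of earlier groups, so later groups effectively cover larger temporal contexts, which is what produces the fine-grained multi-scale representation.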
Code implementation details are provided in the original paper: dynamic convolution kernels are implemented with Softmax attention mechanisms, and local multi-scale learning uses feature concatenation and convolution operations. The global multi-scale pooling layer aggregates features from different bottleneck layers using standard deviation vectors.
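The pooling step can be sketched as follows: frame-level feature maps taken from several bottleneck layers are each reduced to a per-channel standard deviation over time, and the resulting vectors are concatenated into one utterance-level embedding. The function below is a minimal sketch of that aggregation under an assumed interface, not the paper's exact code.

```python
import torch

def global_multiscale_pooling(features):
    """Aggregate frame-level features from multiple bottleneck layers
    into one utterance-level vector via per-layer standard deviation
    over the time axis (illustrative sketch, assumed interface).

    features: list of tensors, each of shape (batch, channels, time)
    returns:  tensor of shape (batch, sum of all channels)
    """
    stats = []
    for f in features:
        stats.append(f.std(dim=2))   # (batch, channels) per layer
    return torch.cat(stats, dim=1)
```

Using the standard deviation rather than (or alongside) the mean captures how much each channel's activation varies across the utterance, which tends to be discriminative for language and dialect identity.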
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.