Voice Robot Sound Classification: Feature Extraction, VGGish Model, and Optimization Experiments
This article describes the end‑to‑end pipeline of a voice robot, covering speech framing, feature extraction (FBank, MFCC), the VGGish embedding network, various model architectures, experimental results on accuracy and recall, and future directions for improving sound‑type classification.