SwishNet: A Fast Convolutional Neural Network for Speech, Music and Noise Classification and Segmentation
By Md. Shamim Hussain et al.
Table of Contents
I. Introduction
II. Proposed Network Architecture
III. Experimental Methods
A. Description of the Corpora
B. Evaluation Strategy
C. Optimization Strategy
Summary
Speech, music and noise classification and segmentation is an important preprocessing step in audio processing. SwishNet is a 1D convolutional neural network (CNN) that operates on MFCC features and achieves high accuracy in both clip classification and frame-wise segmentation. The architecture is fast, lightweight, and memory-efficient. Its performance is further improved by distilling knowledge from a 2D CNN pretrained on ImageNet, without requiring any additional feature engineering. Experimental evaluations on the MUSAN and GTZAN corpora demonstrate SwishNet's effectiveness for both audio classification and segmentation.
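The summary names two mechanisms: a 1D CNN running over MFCC frames, and knowledge distillation from a pretrained 2D CNN. The standard-library Python sketch below illustrates both in miniature. It is not SwishNet's actual architecture or training setup. The single convolution layer, the swish activation (x * sigmoid(x), assumed from the network's name), global average pooling, the speech/music/noise output order, the random weights, the synthetic "MFCC" frames, and the Hinton-style temperature-scaled distillation loss are all illustrative assumptions. All function names are hypothetical, and the paper's exact distillation recipe may differ.

```python
import math
import random

# Hypothetical class order for the three-way task.
CLASSES = ["speech", "music", "noise"]


def swish(x):
    """Swish activation, x * sigmoid(x), written to avoid exp overflow."""
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)


def conv1d(frames, weights, bias):
    """Zero-padded 'same' 1D convolution along time.

    frames: T x C_in list, weights: C_out x K x C_in, bias: C_out (K odd).
    """
    t_len, k = len(frames), len(weights[0])
    pad = k // 2
    out = []
    for t in range(t_len):
        row = []
        for w_out, b in zip(weights, bias):
            acc = b
            for j in range(k):
                src = t + j - pad
                if 0 <= src < t_len:  # positions outside the clip count as zeros
                    acc += sum(w * x for w, x in zip(w_out[j], frames[src]))
            row.append(acc)
        out.append(row)
    return out


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_clip(mfcc_frames, conv_w, conv_b, dense_w, dense_b):
    """Conv + swish, global average pooling over time, then a dense layer to logits."""
    hidden = [[swish(v) for v in row] for row in conv1d(mfcc_frames, conv_w, conv_b)]
    n = len(hidden)
    pooled = [sum(row[c] for row in hidden) / n for c in range(len(hidden[0]))]
    return [b + sum(w * x for w, x in zip(w_row, pooled))
            for w_row, b in zip(dense_w, dense_b)]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, alpha=0.5):
    """Generic Hinton-style loss: hard-label cross-entropy plus softened KL(teacher || student)."""
    hard = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    soft = sum(pt * math.log(pt / max(ps, 1e-12)) for pt, ps in zip(p_t, p_s) if pt > 0)
    # The T**2 factor keeps the soft term's gradient scale comparable across temperatures.
    return alpha * hard + (1 - alpha) * temperature ** 2 * soft


# Usage with random weights and synthetic "MFCC" frames (placeholders, not real audio).
random.seed(0)
n_mfcc, n_filters, kernel, n_frames = 20, 8, 3, 50
clip = [[random.gauss(0, 1) for _ in range(n_mfcc)] for _ in range(n_frames)]
conv_w = [[[random.gauss(0, 0.1) for _ in range(n_mfcc)] for _ in range(kernel)]
          for _ in range(n_filters)]
conv_b = [0.0] * n_filters
dense_w = [[random.gauss(0, 0.1) for _ in range(n_filters)] for _ in range(len(CLASSES))]
dense_b = [0.0] * len(CLASSES)

logits = classify_clip(clip, conv_w, conv_b, dense_w, dense_b)
probs = softmax(logits)
print(CLASSES[probs.index(max(probs))], [round(p, 3) for p in probs])

teacher_logits = [2.0, 0.5, -1.0]  # hypothetical teacher output for this clip
print(distillation_loss(logits, teacher_logits, label=0))
```

Global average pooling makes the clip classifier independent of clip length, so clips of different durations map to a fixed-size vector. For frame-wise segmentation, one option would be to apply the dense layer to each row of `hidden` instead of the pooled vector. This is an assumption about how a segmentation head could look, not a description of the paper's design.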