Deep Neural Network Based Speech Recognition Systems under Noise Perturbations

Yifang Liu
Smule Inc


Automatic speech recognition, which plays an important role in human-computer interaction, is the cornerstone of communication between humans and smart devices. In the past few years, deep neural networks (DNNs) have been deployed in automatic speech recognition with great success. However, recent research has discovered that DNNs are not robust against small perturbations. In this work, we investigate the noise immunity of various neural network models in the speech recognition task. Our experimental results demonstrate that, when noise is added to the original speech audio, the phonemic error rate (PER) increases as the signal-to-noise ratio (SNR) decreases across all evaluated neural network models. On the other hand, when noise is added to the Mel-frequency cepstral coefficient (MFCC) features, the multilayer perceptron (MLP) network model outperforms all evaluated recurrent neural network (RNN) models.
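
To make the SNR-controlled perturbation concrete, the following is a minimal sketch of adding noise to a waveform at a target SNR, using SNR_dB = 10 log10(P_signal / P_noise). It assumes white Gaussian noise and a non-silent input; the noise type, the function names (add_noise_at_snr, measured_snr_db), and the 440 Hz test tone are illustrative assumptions, not details taken from this work.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=None):
    """Return signal plus white Gaussian noise scaled to the target SNR (dB).

    Assumption: white Gaussian noise. The noise type is chosen here only
    for illustration.
    """
    rng = random.Random(seed)
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)
    # => P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Rescale so the empirical noise power matches the target power.
    raw_power = sum(n * n for n in noise) / len(noise)
    scale = math.sqrt(noise_power / raw_power)
    return [x + scale * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Empirical SNR in dB between a clean signal and its noisy version."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10.0 * math.log10(signal_power / noise_power)


# Hypothetical input: one second of a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_noise_at_snr(clean, snr_db=10.0, seed=0)
```

Calling measured_snr_db(clean, noisy) recovers the requested SNR up to floating-point error, because the noise is rescaled to the exact target power rather than drawn with only an expected variance. The same scaling could be applied to MFCC feature vectors instead of raw samples.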