Neural networks combined with Hidden Markov Models (the hybrid NN/HMM approach) have long been used in speech recognition. Deep neural networks, with their many hidden layers and multiple levels of non-linearity, can learn higher-level abstractions than their single-hidden-layer counterparts. Because speech is an inherently dynamic process, a recurrent architecture (RNN) is a natural choice for temporal modeling.
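To make the temporal-modeling idea concrete, here is a minimal sketch (not from any of the papers below) of a single-layer Elman RNN running forward over a sequence of acoustic feature frames; the dimensions, weight names, and the use of MFCC-like features are illustrative assumptions.

```python
import numpy as np

def rnn_forward(x_seq, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a single-layer Elman RNN over a sequence of feature frames.

    x_seq: (T, input_dim) acoustic feature frames (e.g. MFCCs)
    Returns per-frame output scores of shape (T, num_classes).
    """
    h = np.zeros(W_hh.shape[0])  # hidden state carries context across frames
    outputs = []
    for t in range(x_seq.shape[0]):
        # the new hidden state mixes the current frame with the previous state
        h = np.tanh(x_seq[t] @ W_xh + h @ W_hh + b_h)
        outputs.append(h @ W_hy + b_y)
    return np.stack(outputs)

# Toy dimensions (assumed): 13 MFCC features, 8 hidden units, 4 output classes.
rng = np.random.default_rng(0)
input_dim, hidden_dim, num_classes, T = 13, 8, 4, 5
W_xh = 0.1 * rng.normal(size=(input_dim, hidden_dim))
W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
W_hy = 0.1 * rng.normal(size=(hidden_dim, num_classes))
b_h, b_y = np.zeros(hidden_dim), np.zeros(num_classes)

scores = rnn_forward(rng.normal(size=(T, input_dim)),
                     W_xh, W_hh, b_h, W_hy, b_y)
print(scores.shape)  # one score vector per frame: (5, 4)
```

The recurrence `h @ W_hh` is the point: each frame's output depends on all earlier frames through the hidden state, which is what a feed-forward DNN over fixed context windows cannot do.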

Papers:

Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton, "Speech recognition with deep recurrent neural networks," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 6645–6649.

Yajie Miao, Mohammad Gowayyed, and Florian Metze, "EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding," in Proc. IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2015, pp. 167–174.

Mirco Ravanelli, Philemon Brakel, Maurizio Omologo, and Yoshua Bengio, "Batch-normalized joint training for DNN-based distant speech recognition," in Proc. IEEE Spoken Language Technology Workshop (SLT), 2016, pp. 28–34.
