I am pursuing my master's in data science. I need some good research papers on speech recognition that I can refer to for my research on building neural networks, for example a recurrent neural network for speech-to-text.
Speech recognition, speech-to-text, and voice recognition require different techniques and preprocessing. Could you specify which neural network type you are using and the particular features you are trying to process? This will help in giving you more accurate bibliography suggestions.
Some recent papers are listed below, including a good review of speech recognition (SR) and a recently developed technique (SpecAugment) that should help you while developing your SR model.
If you are looking into doing research on alignment, please take a look at [1]. The paper assumes that you are fluent in speech recognition. If you are not familiar with speech recognition, I highly encourage you to get the book [2]; you can then read the draft of the 3rd edition, which the authors have made available online. For background on speech recognition using deep NNs, please refer to [3] and the work by Deng and his group (he is one of the authors of [3]). Do a forward citation search on [3] to get more up-to-date references.
Also, note that the literature is divided on which type of deep NN to use. You must choose whether to use convolutional NNs or restricted Boltzmann machines.
I took a look at the two datasets, and they do not appear to have transcriptions, which can mean additional work for alignment. Do verify this.
If alignment is not the focus of your work and is just an intermediate step, let me know so that I can provide more focused references.
1 and 3 have both audio files and their transcriptions, so I guess for these two I don't have to do extra work.
Out of these 4, I have to select 1 and proceed with my research. Which of these do you think would be best for my research? (Note that I have until the start of April 2021 to submit my thesis.)
Also, thanks for the materials that you have shared. I will start going through them.
I am new to deep learning, and as you said, the literature is divided on which type of deep NN to use, so I must choose between convolutional NNs and restricted Boltzmann machines.
I am clueless as of now and would appreciate any suggestions from you. Given the two datasets listed above, could you help me narrow down my research area, something like:
"go with this xyz technique, since this xyz paper describes something similar."
If that's possible, then maybe I will master only, let's say, convolutional NNs or only LSTMs, based on the area of my research.
Again, thanks for your replies.
I have an Excel sheet listing papers that align with these datasets. I am sharing it with you.
I would go very conservative and use the TIMIT dataset for the following reasons:
It has been beaten to death, so you can find lots of references and benchmarks on this dataset.
The transcription is very good in terms of boundary demarcations.
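To illustrate what those boundary demarcations look like in practice: TIMIT provides a .PHN file per utterance, with one phone per line given as start sample, end sample, and phone label, and the audio is sampled at 16 kHz. Below is a minimal sketch for reading such a file into time-stamped segments. The function name `parse_phn` and the example lines are made up for illustration (they are not copied from a real TIMIT file), so do check them against your copy of the corpus.

```python
from typing import List, Tuple

SAMPLE_RATE_HZ = 16_000  # TIMIT audio is sampled at 16 kHz


def parse_phn(text: str, sample_rate: int = SAMPLE_RATE_HZ) -> List[Tuple[float, float, str]]:
    """Turn .PHN-style lines into (start_sec, end_sec, phone) tuples.

    Hypothetical helper: assumes each line is "<start_sample> <end_sample> <phone>".
    """
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        start, end, phone = int(parts[0]), int(parts[1]), parts[2]
        segments.append((start / sample_rate, end / sample_rate, phone))
    return segments


# Illustrative lines in the .PHN layout (made up, not real TIMIT data).
example = "0 3200 h#\n3200 4800 sh\n4800 6400 iy\n"
segments = parse_phn(example)
for start_sec, end_sec, phone in segments:
    print(f"{phone}: {start_sec:.3f}s to {end_sec:.3f}s")
```

For a real utterance you would pass in the contents of the .PHN file, e.g. `parse_phn(open(path).read())`, and use the end times of the segments as your reference boundaries.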
Now, to answer the other question: the type of NN will depend on your literature review. For example, assuming that you are still doing speech alignment only, my suggestion is that once you finish your literature review, you find a baseline paper against which to compare your results. One such paper is [1] (you need to verify for yourself whether it is among the latest). The procedure would be to reproduce the results of that paper to verify that you have the appropriate settings and that the tool is "calibrated". Once you have done this, you use this tool as the baseline against your proposed NN architecture. To summarise, the literature review will dictate how much leeway you have in choosing the architecture. We can discuss it once you have finished your literature review.
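To make the "baseline against your proposed architecture" step a bit more concrete for alignment: a common way to score an aligner is the share of predicted phone boundaries that fall within a small tolerance of the reference boundaries. The sketch below assumes forced alignment (so the reference and predicted boundaries match one-to-one) and a 20 ms tolerance. The function name, the tolerance default, and all the numbers are assumptions for illustration only; they are not taken from [1], so use whatever metric your baseline paper reports.

```python
from typing import Sequence


def boundary_accuracy(
    reference: Sequence[float],
    predicted: Sequence[float],
    tolerance_sec: float = 0.020,
) -> float:
    """Fraction of boundaries predicted within `tolerance_sec` of the reference.

    Hypothetical helper: assumes both sequences list the same boundaries
    in the same order, with times in seconds.
    """
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    if not reference:
        return 0.0
    hits = sum(abs(r - p) <= tolerance_sec for r, p in zip(reference, predicted))
    return hits / len(reference)


# Made-up boundary times in seconds, purely for illustration.
reference = [0.20, 0.30, 0.40, 0.55]
baseline = [0.21, 0.33, 0.40, 0.56]  # e.g. output of the reproduced baseline tool
proposed = [0.20, 0.31, 0.41, 0.56]  # e.g. output of your own architecture

baseline_score = boundary_accuracy(reference, baseline)
proposed_score = boundary_accuracy(reference, proposed)
print(f"baseline: {baseline_score:.2f}, proposed: {proposed_score:.2f}")
```

The point of scoring the reproduced baseline first is that if its number does not match what the paper reports, you know your setup is off before you start evaluating your own model.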
Note also that you should consult with your advisor on the feedback posted on this forum.