Can anyone suggest good research papers on speech recognition, speech-to-text, and voice recognition?

Yash Vairagade,

Speech recognition, speech-to-text, and voice recognition require different techniques and preprocessing. Could you specify which type of neural network you are using and which features you are trying to process? That would help in suggesting a more accurate bibliography.

Regards

Suraj Kumar Mallick

Dear Yash Vairagade,

Please follow the link below. Hopefully it meets your requirements.

https://www.researchgate.net/publication/308812610_Convolutional_Neural_Networks-based_continuous_speech_recognition_using_raw_speech_signal

Best wishes and regards

Suraj

Arkadiy Prodeus

1) Machine learning course: https://www.coursera.org/learn/machine-learning

A more practical machine learning course: https://www.fast.ai/

2) Book on deep learning with Python: http://faculty.neu.edu.cn/yury/AAI/Textbook/Deep%20Learning%20with%20Python.pdf

3) Modern book on speech technology: https://web.stanford.edu/~jurafsky/slp3/edbook_oct162019.pdf

Abdulaziz Saleh Ba Wazir

Recent papers are listed below, including a good review of speech recognition (SR) and a recently developed technique (SpecAugment) that will surely help you while developing your SR model:

https://ieeexplore.ieee.org/abstract/document/8632885

Preprint: SpecAugment: A Simple Data Augmentation Method for Automatic...
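SpecAugment augments training data by warping the spectrogram in time and by masking blocks of consecutive frequency channels and time steps. The sketch below covers only the two masking steps (time warping is omitted). It assumes the spectrogram is a plain list of frames (time x frequency); the function name, parameter names, and default widths are illustrative assumptions, not the paper's augmentation policies.

```python
import random


def spec_augment(spec, num_freq_masks=2, max_freq_width=8,
                 num_time_masks=2, max_time_width=20, mask_value=0.0, seed=None):
    """Return a masked copy of `spec`, a list of frames (time x frequency).

    Hypothetical helper: applies SpecAugment-style frequency and time masking
    only; time warping is not implemented here.
    """
    rng = random.Random(seed)
    out = [list(frame) for frame in spec]  # copy so the input is left untouched
    n_time = len(out)
    n_freq = len(out[0]) if n_time else 0

    # Frequency masking: set f consecutive channels [f0, f0 + f) to mask_value
    for _ in range(num_freq_masks):
        f = rng.randint(0, min(max_freq_width, n_freq))
        f0 = rng.randint(0, n_freq - f)
        for frame in out:
            for k in range(f0, f0 + f):
                frame[k] = mask_value

    # Time masking: set t consecutive frames [t0, t0 + t) to mask_value
    for _ in range(num_time_masks):
        t = rng.randint(0, min(max_time_width, n_time))
        t0 = rng.randint(0, n_time - t)
        for i in range(t0, t0 + t):
            out[i] = [mask_value] * n_freq
    return out
```

For example, `spec_augment([[1.0] * 40 for _ in range(100)], seed=0)` masks a dummy 100-frame, 40-channel spectrogram. Masks are redrawn every time the function is called, so each training epoch sees a differently corrupted copy of the same utterance.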

Yash Vairagade

I am fairly new to deep learning, and most probably an RNN will be used. I am planning to use one of these two datasets:

One:

https://www.kaggle.com/paultimothymooney/medical-speech-transcription-and-intent

Two:

https://tspace.library.utoronto.ca/handle/1807/24487 https://www.kaggle.com/ejlok1/toronto-emotional-speech-set-tess/

Any suggestions or research papers aligned with these datasets would be very helpful.

Thanks a ton to all who commented; I am finding this forum very helpful.

Arturo Geigel

Yash Vairagade,

If you are looking into doing research on alignment, please take a look at [1]. The paper assumes that you are fluent in speech recognition. If you are not familiar with speech recognition, I highly encourage you to get the book [2] and then read the draft of the 3rd edition, which the authors have put online. For background on speech recognition using deep NNs, please refer to [3] and to the work by Deng and his group (he is one of the authors of [3]). Do a forward citation search on [3] to get more up-to-date references.

Also, note that the literature is divided on which type of deep NN to use. You must choose whether to use convolutional NNs or restricted Boltzmann machines.

I took a look at the two datasets, and they do not appear to have transcriptions, which can mean additional work for alignment. Do verify this.
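To see why transcriptions matter for alignment: forced alignment takes a known phone sequence and finds where each phone starts and ends in the audio. A minimal sketch, assuming you already have per-frame phone log-probabilities from some acoustic model (the `log_probs` input and the `forced_align` name are hypothetical), is a Viterbi search constrained to visit the transcript's phones in order, each for at least one frame. This is a generic textbook formulation, not the method of [1].

```python
import math


def forced_align(log_probs, phones):
    """Viterbi forced alignment sketch (hypothetical helper).

    log_probs[t][p]: log-probability of phone id p at frame t, assumed to come
    from some acoustic model. phones: the transcript as a list of phone ids.
    Returns (phone_id, start_frame, end_frame_exclusive) segments.
    """
    T, N = len(log_probs), len(phones)
    if N == 0 or T < N:
        raise ValueError("need at least one frame per transcript phone")
    NEG = -math.inf
    score = [[NEG] * N for _ in range(T)]  # best path score ending in phone j at frame t
    back = [[0] * N for _ in range(T)]     # 1 if that path advanced from phone j - 1
    score[0][0] = log_probs[0][phones[0]]  # alignment must start in the first phone

    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]
            advance = score[t - 1][j - 1] if j > 0 else NEG
            if advance > stay:
                score[t][j], back[t][j] = advance, 1
            else:
                score[t][j], back[t][j] = stay, 0
            score[t][j] += log_probs[t][phones[j]]

    # Backtrack from the last frame, which must belong to the last phone
    labels = [0] * T
    j = N - 1
    for t in range(T - 1, -1, -1):
        labels[t] = j
        if t > 0:
            j -= back[t][j]

    # Collapse frame-level transcript positions into contiguous segments
    segments, start = [], 0
    for t in range(1, T + 1):
        if t == T or labels[t] != labels[start]:
            segments.append((phones[labels[start]], start, t))
            start = t
    return segments
```

Segment boundaries come out in frames; multiply by the frame shift of your features (for example, 10 ms if that is what your front end uses) to get times in seconds. Without a transcript there is no `phones` sequence to constrain the search, which is the extra work mentioned above.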

If alignment is not going to be the focus of your work but just an intermediate step, let me know so that I can provide more focused references.

Regards

[1] https://era.library.ualberta.ca/items/0fbd7532-1105-4641-9e12-ebb0e349e460/view/169d5d27-d3a6-411e-ad6d-320032130103/kelley_tucker_interspeech2018.pdf

[2] Jurafsky, D., & Martin, J. H. (2008). Speech and Language Processing (2nd ed.).

[3] https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/HintonDengYuEtAl-SPM2012.pdf

Usman Ahmed

I suggest looking at the relevant section of https://paperswithcode.com/, which lists papers along with details.

Yash Vairagade

Arturo Geigel,

The two datasets mentioned above were approved, along with this third one:

https://www.kaggle.com/mfekadu/darpa-timit-acousticphonetic-continuous-speech

Dataset 1:

https://www.kaggle.com/paultimothymooney/medical-speech-transcription-and-intent

Dataset 3:

https://www.kaggle.com/mfekadu/darpa-timit-acousticphonetic-continuous-speech

Datasets 1 and 3 have both audio files and their transcriptions, so I guess for these two I don't have to do extra work.

Out of these four, I have to select one and proceed with my research. Which of them do you think would be best for my research? (Note that I have until the start of April 2021 to submit my thesis.)

Also, thanks for the materials you have shared. I will start going through them.

I am new to deep learning, and as you said, the literature is divided on which type of deep NN to use, and I must choose between convolutional NNs and restricted Boltzmann machines.

I am clueless as of now and would welcome any suggestions from you. Given the datasets listed above, could you help me narrow down my research area, for example:

Go with technique XYZ, since paper XYZ describes something similar.

If that's possible, then I may master only, let's say, convolutional NNs or only LSTMs, depending on my research area.

Again, thanks for your replies.

I have an Excel sheet listing papers aligned with these datasets, and I am sharing it with you.

Arturo Geigel

Yash Vairagade,

I would be very conservative and use the TIMIT dataset, for the following reasons:

It has been beaten to death, so you can find lots of references and benchmarks on it.

The transcription is very good in terms of boundary demarcations.
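Those boundary demarcations are distributed as time-aligned phone transcriptions: each TIMIT `.PHN` file lists one segment per line as start sample, end sample, and phone label, with audio sampled at 16 kHz. A minimal sketch for reading one into seconds is below; the `parse_phn` name is hypothetical, and it takes the file contents as a string so that reading the file is left to you.

```python
SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


def parse_phn(text, sample_rate=SAMPLE_RATE):
    """Parse TIMIT .PHN contents into (phone, start_seconds, end_seconds) tuples.

    Hypothetical helper: each non-empty line is "<start_sample> <end_sample> <phone>".
    """
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        start, end, phone = line.split()
        segments.append((phone, int(start) / sample_rate, int(end) / sample_rate))
    return segments
```

For example, `parse_phn("0 1600 h#\n1600 4800 sh\n")` (illustrative values, not taken from a real TIMIT file) gives `[("h#", 0.0, 0.1), ("sh", 0.1, 0.3)]`.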

Now, to answer the other question on the type of NN: it will depend on your literature review. For example, assuming that you are still doing speech alignment only, my suggestion is that once you finish your literature review, you find a baseline paper against which to compare your results. One such paper (you need to do your own verification to see if it is one of the latest) is [1]. The procedure would be to reproduce the results of that paper to verify that you have the appropriate settings and that the tool is "calibrated". Once you have done this, you use this tool as a baseline against your proposed NN architecture. To summarise, the literature review will dictate the leeway in choosing the architecture. We can discuss it once you have finished your literature review.

Note also that you should consult your advisor about the feedback posted on this forum.
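When comparing a reproduced baseline with a proposed architecture on alignment, one commonly reported measure is the fraction of predicted phone boundaries that fall within a fixed tolerance (often 20 ms) of the reference boundaries. The sketch below assumes both alignments are lists of (phone, start_seconds, end_seconds) tuples over the same phone sequence, as forced alignment against a known transcript produces; the function name and default tolerance are assumptions, so check which metric and tolerance your baseline paper actually reports.

```python
def boundary_accuracy(reference, predicted, tolerance=0.020):
    """Fraction of phone boundaries predicted within `tolerance` seconds.

    Hypothetical helper. Both inputs are lists of (phone, start_s, end_s)
    for the same phone sequence; boundaries are every segment start plus
    the end of the last segment.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("expected one predicted segment per reference segment")

    def boundaries(segments):
        return [start for _, start, _ in segments] + [segments[-1][2]]

    ref_b, hyp_b = boundaries(reference), boundaries(predicted)
    hits = sum(1 for r, h in zip(ref_b, hyp_b) if abs(r - h) <= tolerance)
    return hits / len(ref_b)
```

Running the same function on the baseline's output and on your architecture's output, against the same reference transcriptions, keeps the comparison like-for-like.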

Regards

[1] https://www.research.ed.ac.uk/portal/files/20135664/master.pdf
