An approach that I've found to work quite well is to set yourself a fixed number of papers to read. For example, search for "speech to text" and read 10 papers. By the tenth paper, you'll probably have figured out who the most prolific authors on the subject are; then read their papers.
Start with L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE 77(2):257-286, February 1989.
To understand this paper, you may have to read other papers or background material first. It is worth the effort, though: the hidden Markov model approach it describes is the basis of all current commercial speech recognition systems. The newer approach uses deep neural networks, and the latest paper on this is:
Deep Speech: Scaling up end-to-end speech recognition
Awni Hannun, Carl Case, Jared Casper, Bryan Catanzaro, Greg Diamos, Erich Elsen, Ryan Prenger, Sanjeev Satheesh, Shubho Sengupta, Adam Coates, Andrew Y. Ng.