I have read several research papers on VoIP and on VoIP-related algorithms (mostly digital audio signal processing) such as VAD (voice activity detection), sound mixing, etc. I have noticed that most of these algorithms process audio in 10 ms frames. However, when recording 16-bit PCM audio on an Android smartphone, the smallest buffer we can get is 40 ms. So what is the significance of using 10 ms of data?
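For reference, here is the arithmetic relating the two sizes: a 40 ms capture buffer holds exactly four 10 ms frames, so it can be sliced before being handed to a 10 ms algorithm. This is a minimal sketch assuming 8 kHz, mono, 16-bit PCM. The sample rate and channel count are assumptions, and `frame_size_bytes` and `split_into_frames` are hypothetical helper names, not part of any Android API.

```python
# Assumed capture format (hypothetical): 8 kHz, mono, 16-bit PCM.
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 2  # 16-bit PCM = 2 bytes per sample
CHANNELS = 1


def frame_size_bytes(duration_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Return the number of bytes in a PCM frame of the given duration."""
    samples = sample_rate_hz * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def split_into_frames(buffer: bytes, frame_ms: int = 10,
                      sample_rate_hz: int = SAMPLE_RATE_HZ) -> list:
    """Slice a capture buffer into consecutive frames of frame_ms each."""
    size = frame_size_bytes(frame_ms, sample_rate_hz)
    if len(buffer) % size != 0:
        raise ValueError("buffer length is not a whole number of frames")
    return [buffer[i:i + size] for i in range(0, len(buffer), size)]


# 40 ms of silence at 8 kHz: 320 samples = 640 bytes.
capture = bytes(frame_size_bytes(40))
frames = split_into_frames(capture)
print(len(capture), len(frames), len(frames[0]))  # 640 4 160
```

At 16 kHz the same 10 ms frame would be 160 samples (320 bytes), so only the sample-rate parameter changes.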