Having participated in a few projects that aimed to build simple ASR solutions using CMU Sphinx and open-source models from VoxForge (which I adapted myself), I am quite disappointed in their performance. While they perform acceptably in the lab, open-source solutions often fail in a real-life environment. Even the simplest task, such as recognition against a small grammar, becomes a problem.
Sure, ASR is a very challenging task that requires huge amounts of dedicated acoustic data, powerful noise cancellation and voice activity detection, and careful design of domain-specific language models. But even when those conditions are met, the results fall far short of expectations, even for small-vocabulary tasks.
A friend of mine, who was trying to build an IVR system based on Asterisk + UniMRCP + Sphinx, told me that the main lesson he learned was 'not to use open-source technology'. The website of SpokenTech Inc. (http://spokentech.com/) looks abandoned, and their demos from speechapi.com do not work.
Can anyone share their experience developing real-life products on top of Sphinx or any other open-source engine? In which cases is it worth investing your time in that?