Having participated in a few projects that aimed to build simple ASR solutions using CMU Sphinx and open-source models from VoxForge (which I adapted myself), I am quite disappointed with their performance. While they perform acceptably in the lab, open-source solutions often fail in a real-life environment. Even the simplest task, such as recognition against a small grammar, becomes a problem.

Sure, ASR is a very challenging task that requires huge amounts of dedicated acoustic data, powerful noise cancellation and voice activity detection, as well as careful design of domain-specific language models. But even when those conditions are met, the results fall far short of expectations, even for small-vocabulary tasks.

A friend of mine, who was trying to build an IVR system based on Asterisk + UniMRCP + Sphinx, told me that the main lesson he had learned was 'not to use open-source technology'. The website of SpokenTech Inc. (http://spokentech.com/) looks abandoned, and their demos on speechapi.com do not work.

Can anyone share their experience developing real-life products on top of Sphinx or any other open-source engine? In which cases is it worth investing your time in that?
