What is your view on open source speech technology?

Alexander I. Rudnicky

I'm sorry to hear that you had difficulties in getting the software to work for you. One of the downsides of open source is that you don't have a support organization there to help out, like you would for a commercial product.

On the other hand there's usually a user community that's willing to help and that over time has accumulated a fair amount of practical advice in its archives. cmusphinx on sourceforge has a very active community and very helpful people, in particular Nickolay Shmyrev.

One thing that often happens is that people underestimate the amount of time it takes to get something working properly. People in the field claim that it takes about a person-year of effort to bring up a system in a new domain, and a background in ASR helps. I know that Sphinx is the basis for a number of successful commercial ventures, so it can be made to work.

If you're not altogether discouraged I would urge you to interact with the cmusphinx community and give it another try.

Dmytro Prylipko

First of all, I would like to thank you for your work: Sphinx is a great piece of software.

Also, without Nickolay's help I would not have been able to implement a lot of things.

Indeed, I am quite satisfied with the support of the community. Most questions are already addressed on forums, and my own expertise always helps me in solving the rest.

However, my point is rather about purely technical aspects. For instance, in telephone-based IVR systems the bottleneck is often VAD and noise cancellation, yet Sphinx and UniMRCP do not provide front-end tools to address this issue. Sphinx4 does have VAD (SpeechClassifier + SpeechMarker), but the pocketsphinx plugin for UniMRCP does not.
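For readers unfamiliar with VAD, here is a minimal sketch of an energy-threshold detector with a short hangover. It is only an illustration of the general idea, not how SpeechClassifier, SpeechMarker, or pocketsphinx implement it; the function names, the frame length, the threshold, and the hangover value are all assumptions chosen for the example.

```python
import math

def frame_energies_db(samples, frame_len):
    """Return the log energy (dB) of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(s * s for s in frame) / frame_len
        # Small floor avoids log10(0) on digital silence.
        energies.append(10.0 * math.log10(power + 1e-12))
    return energies

def simple_vad(samples, frame_len=160, threshold_db=-30.0, hangover=2):
    """Label each frame as speech (True) or non-speech (False).

    A frame is speech if its energy exceeds threshold_db, or if it falls
    within `hangover` frames after the last frame that did (hypothetical
    defaults; real systems usually adapt the threshold to the noise floor).
    """
    labels = []
    frames_left = 0
    for energy in frame_energies_db(samples, frame_len):
        if energy > threshold_db:
            frames_left = hangover
            labels.append(True)
        elif frames_left > 0:
            frames_left -= 1
            labels.append(True)
        else:
            labels.append(False)
    return labels
```

A fixed threshold like this is exactly what tends to fail on noisy telephone channels, which is why the front end matters so much in IVR systems.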

Also, the methods of acoustic and language modelling are pretty standard: MFCCs/PLPs, HMM-GMM, N-grams. While these are good enough for the majority of tasks, we know that e.g. PNCC features are more robust against noise and DNNs can significantly improve spontaneous speech recognition.
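To make the "standard" language-modelling part concrete, here is a minimal sketch of a bigram model with add-one smoothing. It is a toy illustration, not the Sphinx language-model format or training tool; the function names and the smoothing choice are assumptions made for the example.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over whitespace-tokenized sentences."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        # Every token except the last one serves as a history.
        history_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens[:-1], tokens[1:]))
    # Words that can be predicted: all seen words plus the end marker.
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return history_counts, bigram_counts, vocab

def bigram_prob(prev, word, history_counts, bigram_counts, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigram_counts[(prev, word)] + 1) / (history_counts[prev] + len(vocab))
```

Add-one smoothing is used here only because it fits in one line; practical toolkits typically rely on back-off or discounting schemes instead.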

I have a feeling that those questions are better addressed in commercial ASR solutions (like Nuance), while the open source community simply does not have enough resources to implement all the cutting-edge techniques from the speech recognition area.

Can you, as an expert, comment on this? Maybe I overestimate the industrial toolkits?

Horia Cucu

Dear Dmytro,

I have been using CMU Sphinx for a while and I'm very satisfied with its performance and flexibility. I agree that the project is not sufficiently mature to be used out-of-the-box for any application, but the fact that the community (and especially Nickolay) is so willing to help everyone with their application is a huge advantage.

Regarding PNCCs (called "denoise" features in Sphinx): they have been implemented and used by default since June 2013.

Regarding VAD: to the best of my knowledge, pocketsphinx currently benefits from a new, improved version of VAD, which will soon be implemented in Sphinx4 as well.

I found the Denoise class in the Sphinx package, but is it plugged in by default in any configuration? Or must I explicitly include it in the processing pipeline?

I can see it is some kind of adaptation of the PNCC idea to the existing pipeline. Do you know whether using Denoise + MFCC has the same effect as using PNCC features? I can find almost no information about Denoise on the internet...