Basic text-to-speech technique?

Krzysztof Wołk

Have you tried https://github.com/hcarver/phonemic or https://github.com/s-macke/SAM?

Lyes Demri

The way I did it was inspired by an algorithm I found in the book DAFX by Zölzer (2002). Basically, you calculate the cross-correlation between the end of the first phoneme and the beginning of the second and find the point where they correlate the most. This is where you want to concatenate the phonemes/diphones/units. At this point, you multiply the end of the first phoneme by a decreasing ramp and the beginning of the second phoneme by an increasing ramp, then add the two together (a crossfade). I hope that makes the idea clear.
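A minimal Python sketch of the cross-correlation and ramp steps described above, for illustration only. It is not the DAFX listing or the Matlab file mentioned in this thread. The function names, the `overlap` and `search` parameters, the normalised correlation score, and the linear ramps are all assumptions made for the example.

```python
import math

def best_offset(tail, head, overlap):
    """Offset into `head` whose next `overlap` samples best match `tail`.

    Uses a normalised cross-correlation score (an assumption; a plain
    dot product would also follow the description above).
    """
    best_k, best_score = 0, float("-inf")
    for k in range(len(head) - overlap + 1):
        seg = head[k:k + overlap]
        dot = sum(x * y for x, y in zip(tail, seg))
        norm = math.sqrt(sum(x * x for x in tail) * sum(y * y for y in seg))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best_k, best_score = k, score
    return best_k

def concatenate(a, b, overlap, search):
    """Join units `a` and `b` (lists of samples) with a crossfade.

    overlap: number of samples that are crossfaded (hypothetical parameter)
    search:  how far into `b` to look for the best join point (hypothetical)
    """
    if overlap < 1 or overlap > len(a) or overlap + search > len(b):
        raise ValueError("units are too short for this overlap/search")
    tail = a[-overlap:]
    k = best_offset(tail, b[:overlap + search], overlap)
    fade = []
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)  # increasing ramp for b; (1 - w) decreases for a
        fade.append(tail[i] * (1.0 - w) + b[k + i] * w)
    # Samples of b before offset k are discarded so the two units line up.
    return a[:-overlap] + fade + b[k + overlap:]
```

With two recorded units loaded as lists of floats at the same sampling rate, `concatenate(unit1, unit2, overlap=80, search=20)` would return the joined signal; suitable values for `overlap` and `search` depend on the sampling rate and the pitch period and would need tuning.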

Sukesh Peruri

@Lyes Demri

I understood your idea, but when I concatenate the phonemes of a word, it doesn't sound like the word being pronounced; rather, it sounds like the phonemes being read one after the other. I thought that if I could play the signal faster, it would sound closer to the natural word, so I tried increasing the sampling rate during playback, but the quality degrades. How should I proceed with this concatenation approach?

@Sukesh

Did you implement the method I suggested? In principle, if you use it and the phonemes are taken from relatively similar contexts, it should work reasonably well. Here are some sentences I produced by concatenating diphones exactly as I suggested:

http://voice-research.org/synth/synth.php

For a faster speaking rate, it would be better to use shorter units (phonemes in your case) or the PSOLA algorithm, which I think you are familiar with.
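To illustrate changing the speaking rate without changing the playback sampling rate, here is a rough Python sketch. Note that it is not PSOLA: PSOLA places its windows pitch-synchronously and therefore needs pitch marks, whereas this is plain overlap-add (OLA) with a fixed frame, a simpler relative that only shows the time-scaling step. The function name, the `speed` and `frame` parameters, and the Hann window are assumptions for the example, and on voiced speech the result is likely to sound rougher than a real PSOLA implementation.

```python
import math

def ola_time_scale(x, speed, frame=256):
    """Time-scale `x` (list of samples) by plain overlap-add; NOT PSOLA.

    speed > 1 shortens the signal (faster speech), speed < 1 lengthens it.
    Frames are read every `hop_in` samples and written every `hop_out`
    samples, so the sampling rate itself is left unchanged.
    """
    hop_out = frame // 2
    hop_in = max(1, round(hop_out * speed))
    # Periodic Hann window; overlapping copies at 50% hop sum to a constant.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_frames = max(1, (len(x) - frame) // hop_in + 1)
    out_len = (n_frames - 1) * hop_out + frame
    out = [0.0] * out_len
    norm = [0.0] * out_len
    for m in range(n_frames):
        src, dst = m * hop_in, m * hop_out
        for n in range(frame):
            if src + n < len(x):
                out[dst + n] += x[src + n] * window[n]
            norm[dst + n] += window[n]
    # Divide by the summed window so the edges are not attenuated.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out, norm)]
```

Calling `ola_time_scale(signal, speed=1.5)` would give a signal roughly two thirds as long that is played back at the original sampling rate.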

(I've sent you the Matlab file I use to concatenate diphones; you can read it to see exactly how it works.)