I have two speech signals coming from two different people. I want to find out whether or not both people are saying the same phrase. Is there anything that I can directly measure between the two signals to know how similar they are?
I think you need to learn a bit more about speech recognition. A similar question was asked on Stack Overflow and people gave several recommendations there; you may find it useful to refer to the following page:
The process of speech recognition is very complex and depends on many factors, for example:
1) Comparing a female speaker to a male speaker: there is usually a marked difference in frequency between genders.
2) The frequency range of an individual: some individuals have a higher frequency range than others, depending on many factors in their sound-production apparatus.
3) Whether there are anomalies in the speech: some anomalies occur due to heavy drinking or forcing the vocal cords.
4) Phoneme-to-word mapping, which depends on regional differences.
5) Background noise, which causes the Lombard effect.
6) Speed of utterance, which increases coarticulation.
These are some of the factors that come immediately to mind; they make a direct signal comparison useless without further processing (and even then).
There are many confounding factors that complicate this process. Here are some examples. Suppose you have a recording of your own voice, made in a soundproof room, saying "OPEN THE DOOR", and you would like to use that recording as the reference against which other voice commands are compared in order to trigger an action, for example opening the door.
Now, if you say the same phrase in a noisy environment, the two recordings are no longer the same.
If you change rooms and record in a reverberant room, the two signals are no longer the same.
If you say the same sentence at a different speed (speech rate) than the reference, the two signals are no longer the same.
If you utter the same sentence with a different rhythm than the reference, again, the two signals are no longer the same.
If some or all of the above-mentioned factors occur at the same time, again, the two signals are no longer the same.
Now imagine that you want to compare your reference signal with another person's recording of the same sentence. Even if both recordings are made under similar environmental conditions (same room, same equipment) and with the same rhythm and rate, the two recordings are still not the same.
Age, gender, and health condition are further confounding factors that influence the signal.
Extracting the formants of the two signals and comparing them with some similarity measure could be a very simple and quick solution, but unfortunately it does not give good results: for example, the similarity score between two completely different sentences recorded in the same acoustic environment can be higher than that between two roughly similar sentences recorded in different environments, or than that of a second recording in which the speaker says the same words as the reference but in a different order. A rough sketch of this kind of direct feature comparison is given below.
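A minimal sketch of such a direct comparison, assuming Python with the librosa library and two placeholder files `reference.wav` and `test.wav`: instead of formant tracks it uses MFCCs (a compact cepstral representation) and aligns the two feature sequences with dynamic time warping, so that a difference in speech rate does not dominate the distance. This only illustrates the idea; it inherits all the weaknesses discussed above (speaker, channel, and environment differences still swamp the score).

```python
import librosa

# Placeholder file names; any two mono recordings will do.
x, sr = librosa.load("reference.wav", sr=16000)
y, _ = librosa.load("test.wav", sr=16000)

# Cepstral features (MFCCs), one column per analysis frame.
mfcc_x = librosa.feature.mfcc(y=x, sr=sr, n_mfcc=13)
mfcc_y = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Dynamic time warping aligns the two feature sequences so that a
# difference in speech rate does not dominate the distance.
D, wp = librosa.sequence.dtw(X=mfcc_x, Y=mfcc_y, metric="euclidean")

# Accumulated cost of the optimal path, normalised by its length,
# serves as a crude dissimilarity score (lower = more similar).
score = D[-1, -1] / len(wp)
print(f"normalised DTW distance: {score:.2f}")
```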
To deal with these factors and variabilities, you typically need an acoustic model (such as a hidden Markov model or a Gaussian mixture model) to capture the acoustic characteristics of the signals in some relevant feature space (such as the cepstral or time-frequency domain) and to relate segments of the signal to linguistic units, and you also need a language model to link those units together and recognize the sentence. All of these procedures fall under the field of speech recognition; a toy sketch of the acoustic-modelling part alone is given below.
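As a toy illustration of only the acoustic-modelling step (no phone-level HMM and no language model, so it is far weaker than a real recognizer), one could fit a Gaussian mixture model to the cepstral frames of a reference recording and score a candidate recording against it. The file names and the number of mixture components below are arbitrary assumptions; the sketch uses librosa and scikit-learn.

```python
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(path, sr=16000, n_mfcc=13):
    """Load a recording and return its MFCC frames, one row per frame."""
    signal, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T

# Placeholder file names: a reference utterance and a candidate to test.
ref = mfcc_frames("reference.wav")
cand = mfcc_frames("candidate.wav")

# Fit a small Gaussian mixture to the reference's cepstral frames.
# This only captures the overall acoustic characteristics, not the
# phone sequence, so it is much weaker than an HMM-based recognizer.
gmm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
gmm.fit(ref)

# Average per-frame log-likelihood of the candidate under the model;
# higher means the candidate's acoustics resemble the reference.
print("average log-likelihood:", gmm.score(cand))
```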