Do you mean the technical means for recording a speaker in background noise, or voice activity detection in background noise?
In any case, if you know a priori the characteristics of the ambient noise, and assume it to be stationary, you can try subtracting the sampled noise signal from the total recorded signal (since the noise is statistically constant over time). The main drawback is that this also degrades the speech quality in the overlapping frequency bands, as it does not discriminate noise from speech.
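For concreteness, here is a minimal sketch of that idea as frame-by-frame spectral subtraction. All names (`recording`, `noise_sample`, `spectral_subtract`) and parameter choices are illustrative, and it assumes float mono arrays at the same sample rate, with a noise-only sample at least one frame long:

```python
import numpy as np

def spectral_subtract(recording, noise_sample, frame_len=512, hop=256):
    """Subtract an average noise magnitude spectrum, frame by frame."""
    window = np.hanning(frame_len)

    # Estimate the stationary noise magnitude spectrum from the noise-only sample.
    noise_frames = [noise_sample[i:i + frame_len] * window
                    for i in range(0, len(noise_sample) - frame_len, hop)]
    noise_mag = np.mean([np.abs(np.fft.rfft(f)) for f in noise_frames], axis=0)

    out = np.zeros(len(recording))
    for i in range(0, len(recording) - frame_len, hop):
        frame = recording[i:i + frame_len] * window
        spectrum = np.fft.rfft(frame)
        mag = np.abs(spectrum)
        # Subtract the noise estimate; floor at zero to avoid negative magnitudes.
        clean_mag = np.maximum(mag - noise_mag, 0.0)
        # Keep the noisy phase, as is standard in basic spectral subtraction.
        clean = clean_mag * np.exp(1j * np.angle(spectrum))
        out[i:i + frame_len] += np.fft.irfft(clean, frame_len)
    return out
```

Note how the drawback mentioned above shows up directly in the code: the subtraction is applied to every frame, so speech energy sitting in the same bins as the noise estimate is attenuated too.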
Alternative and more sophisticated solutions require a specific microphone design. The most common microphone for rejecting diffuse noise is the cardioid microphone, which is basically a directional microphone whose sensitivity is highest when the speaker talks directly into the front of the microphone. The trade-off is a strong boost of low frequencies when the speaker gets closer to the microphone (the proximity effect), which limits the recording performance in terms of intelligibility.
Another technique employs an array of microphones, so as to increase the angular selectivity of the recording and to filter out contributions arriving from other directions (for which a significant amount of diffuse energy can be rejected). An even more sophisticated solution uses a 2D arrangement of linear microphone sub-arrays: one linear array to capture the speaker's voice (endfire configuration), and another to capture the diffuse noise, which is afterwards subtracted from the total array output.
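To illustrate the basic array idea, here is a minimal delay-and-sum beamformer sketch. It assumes a uniform linear array whose channel signals are the rows of `mics` (shape: n_mics x n_samples), a look direction `theta` in radians measured from broadside, microphone spacing `d` in metres, sample rate `fs`, and speed of sound c = 343 m/s; all names are illustrative:

```python
import numpy as np

def delay_and_sum(mics, theta, d, fs, c=343.0):
    n_mics, n_samples = mics.shape
    freqs = np.fft.rfftfreq(n_samples, 1.0 / fs)
    out_spectrum = np.zeros(len(freqs), dtype=complex)
    for m in range(n_mics):
        # Geometric delay of microphone m relative to the array origin
        # for a plane wave arriving from direction theta.
        tau = m * d * np.sin(theta) / c
        # Compensate the delay as a phase shift in the frequency domain.
        out_spectrum += np.fft.rfft(mics[m]) * np.exp(2j * np.pi * freqs * tau)
    # Average the aligned channels; signals from the look direction add
    # coherently, while diffuse noise adds incoherently and is attenuated.
    return np.fft.irfft(out_spectrum / n_mics, n_samples)
```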
I hope this answers (at least partially) your question.
Ron, it is not very clear what you really want to do, so I am answering based on what I understand.
Acoustically, speech is quite different from noise, and that difference can help you detect when the speaker begins to speak in ambient noise. As Herve has pointed out, microphones are transducers, so they capture sound waves without discriminating between noise and speech. If the speech has already been recorded, you can inspect it with an oscilloscope (or any spectral analyzer) that shows you the waveform of the recording. Assuming the recording starts with the ambient noise alone, the display shows the noise as a band of frequencies without any well-defined envelope. When the speaker begins to speak, the spectral structure of the signal changes: you will then find well-defined envelopes representing vowel sounds and voiced consonants that are clearly distinguishable from the ambient noise. So oscillograms should be right for the purpose.
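The same envelope idea can be automated as a simple energy-based onset detector. This is a minimal sketch assuming the recording opens with noise only, which is used to calibrate the noise floor; the names, the 20 ms frames, and the 3x threshold are illustrative choices, not canonical values:

```python
import numpy as np

def find_speech_onset(signal, fs, frame_ms=20, noise_ms=500, factor=3.0):
    """Return the onset time in seconds, or None if no onset is found."""
    frame_len = int(fs * frame_ms / 1000)
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len, frame_len)]
    # Short-time energy per frame (assumes a float signal array).
    energies = np.array([np.mean(f ** 2) for f in frames])

    # Calibrate the noise floor on the assumed noise-only opening segment.
    n_noise_frames = max(1, int(noise_ms / frame_ms))
    noise_floor = np.mean(energies[:n_noise_frames])

    # Speech onset: first frame whose energy rises well above the floor.
    above = np.where(energies > factor * noise_floor)[0]
    if len(above) == 0:
        return None
    return above[0] * frame_len / fs
```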
However, I think you will find it difficult to remove the background noise from your recording, because the speech signals are "mixed up" with the noise. But because speech sounds are very resistant to distortion (which I cannot go into here), they retain their distinctive traits even in noise. You might try filtering: if you know the frequency range of the ambient noise, you can apply a band-pass filter and remove all the noise that lies outside the frequency band of your recorded speech. If the ambient noise falls within the frequency range of the speech signal, you will struggle, because you cannot remove the noise without also removing the speech.
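As one possible realization of that band-pass suggestion, here is a short sketch using scipy.signal. The 300-3400 Hz telephone band is just one conventional choice that preserves intelligible speech; it only helps if the noise energy lies mostly outside that band:

```python
from scipy.signal import butter, sosfiltfilt

def bandpass_speech(signal, fs, low_hz=300.0, high_hz=3400.0, order=4):
    # Design a Butterworth band-pass as second-order sections for stability.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # Zero-phase filtering, so the speech envelope is not shifted in time.
    return sosfiltfilt(sos, signal)
```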
When you say that the ambient noise is known in advance to the algorithm, I tend to think that you are superposing recordings. The ambient noise may be recorded on one track and the speech on a second track. In that case you can keep them separate, but if you mix the two tracks into a monaural recording, you will have the same problem described above when trying to separate them.
Thanks Herve. Your comments help. I was mostly thinking about an acoustic signal processing approach like your first paragraph, but your suggestions about microphones are also helpful.
For now, more elaboration on the first approach would help. I only want to know how to detect the start of speech; there is no need to actually remove the ambient noise from the speech (that would not hurt, but it is not my main goal). Does that simplify the problem? I was thinking of subtracting some model of the noise. Is this better done in the frequency domain? What are the computational difficulties?
In another case, the background sound is more predictable: for example, it might be music played by the same system, so the system knows the exact signal as originally produced; it just doesn't know exactly what it sounds like at the microphone after volume changes, echoes, and any other effects introduced by the loudspeaker, the environment, or the microphone. Does that make the problem easier?
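(For what it's worth, that known-reference case is essentially what acoustic echo cancellers handle: an adaptive filter can learn the loudspeaker-room-microphone path from the reference and subtract the predicted music from the microphone signal. A minimal NLMS sketch, with all names, the filter length, and the step size as illustrative choices:

```python
import numpy as np

def nlms_cancel(mic, ref, n_taps=256, mu=0.5, eps=1e-8):
    """mic: microphone signal (speech + filtered music); ref: music as played out."""
    w = np.zeros(n_taps)      # adaptive estimate of the echo path
    buf = np.zeros(n_taps)    # most recent reference samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = ref[n]
        echo_estimate = np.dot(w, buf)
        e = mic[n] - echo_estimate            # residual: ideally speech only
        # Normalised LMS update, scaled by the reference power in the buffer.
        w += mu * e * buf / (np.dot(buf, buf) + eps)
        out[n] = e
    return out
```

In practice the adaptation is usually frozen while the speaker talks (double-talk detection), but even this basic form suggests why knowing the reference signal makes the problem considerably more tractable than the unknown-noise case.)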