A segmentation process gives phonetic segments of variable length, and then these segments are each labeled as one out of many phonemes (classification). I would just like to know how this (segmentation + classification) task differs from the recognition process.
"Acoustic decoding" ( aka "acoustic recognition") is precisely this segmentation + classification process. But ASR (automatic speech recognition) at present involves much more than this. Which amounts, more or less, to seeing whether the labelled segments you obtained make sense within a particular language (or rather a sample of a particular language).
I completely differ from Klaus Schuricht. This has been the traditional view for almost 30 years, and speech recognition has now come to a dead end. Evidence? Nuance sells separate ASR systems for American English, British English and Indian English. If people do not attempt a radically different approach, Nuance will need to come out with versions for children, old people and so on, even just for English, and this will never end.
The only possible solution is language-independent, vocabulary-independent segmentation of phones, followed by the application of the phonological constraints of the particular language or languages, the vocabulary, semantic analysis, etc.
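A minimal sketch of this two-stage idea, under my own assumptions (the single phonotactic rule below is just one illustrative constraint, not a complete model): a language-independent phone segmenter proposes candidate phone strings, and a language-specific check then prunes candidates the language does not permit.

```python
# Stage 2 of the proposed pipeline: apply language-specific phonotactic constraints
# to language-independent phone-segmentation candidates.

def allowed_in_english(phones):
    """Reject phone strings violating a simple English constraint: no word-initial /ng/."""
    return not (phones and phones[0] == "ng")

# Hypothetical candidate phone strings from a language-independent segmenter.
candidates = [["ng", "ae", "t"], ["k", "ae", "t"]]

survivors = [p for p in candidates if allowed_in_english(p)]
print(survivors)  # [['k', 'ae', 't']] -- only the phonotactically legal candidate remains
```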
The current ASR systems start with a huge vocabulary and match the sequence of feature vectors derived from the input speech against sequences of words drawn from this stored vocabulary, maximizing the posterior probability and using n-gram statistics of the words, etc. But they are generally BLIND to the sentence structure, syntax and semantics. Thus, this technology is anything but scalable.
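For reference, this is the standard noisy-channel formulation being described (the notation here is mine): the decoder picks the word sequence $\hat{W}$ that maximizes the posterior probability given the observed feature-vector sequence $O$, with the language model reduced to n-gram statistics,

$$\hat{W} \;=\; \arg\max_{W} P(W \mid O) \;=\; \arg\max_{W} P(O \mid W)\,P(W), \qquad
P(W) \;\approx\; \prod_{i} P\!\left(w_i \mid w_{i-n+1},\ldots,w_{i-1}\right).$$

Note that nothing in this objective looks beyond the n-gram window, which is exactly the blindness to syntax and semantics mentioned above.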
It is left to you and me to do a great job and provide all these additional capabilities to make it a REAL recognition system. Unlike human beings, current ASR systems do not "recognize" the speech; they simply produce a best-approximation transcription.
Further, most Indian speech is NOT monolingual: we mix two or more languages in a single utterance. To handle such multilingual utterances, the only way out is to do phonetic segmentation first and then apply higher-level knowledge.
In ASR, the acoustic signal is segmented into short frames (commonly 20-25 ms long, advanced in steps of about 10 ms so that neighboring frames overlap), and features such as MFCCs are extracted from each frame.
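A short sketch of that framing + MFCC front end, assuming the librosa library is available; the file name is hypothetical, and the 25 ms / 10 ms window and hop sizes are typical values rather than fixed standards.

```python
import librosa

# Load an utterance at 16 kHz, a common ASR sampling rate ("utterance.wav" is a placeholder).
signal, sr = librosa.load("utterance.wav", sr=16000)

# 25 ms analysis windows shifted by 10 ms, so neighboring frames overlap.
frame_length = int(0.025 * sr)   # 400 samples
hop_length = int(0.010 * sr)     # 160 samples

# 13 MFCCs per frame, the classic front-end feature vector.
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13,
                            n_fft=frame_length, hop_length=hop_length)

print(mfcc.shape)  # (13, number_of_frames)
```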