01 February 2015

I’m working on an automated frog call detection problem. I’m trying to determine a good way to estimate the number of true negatives within a given sample. The true positives, false positives, and false negatives are easy to count, but to perform some of the more sophisticated analyses one needs a value for the true negatives. True negatives in this context can be defined as the portion of the sample space within which the automated classifier could have made an incorrect classification but did not.

My thought was to sum the time taken up by the true and false positives, then add the time taken by the false negatives (estimated by multiplying the number of false negatives by the mean duration of a true positive). Subtracting that total from the overall recording time leaves the time that was “at risk” of incorrect classification. Dividing the remaining time by the mean duration of a false negative (on the logic that if the classifier had made a hit there, it would have been incorrect) gives the number of true negatives.
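Here is a minimal sketch of that calculation, assuming detections are stored as durations in seconds. The function and variable names (e.g. `estimate_true_negatives`, `mean_event_duration`) are illustrative, not part of any existing tool.

```python
def estimate_true_negatives(total_time, tp_durations, fp_durations,
                            n_false_negatives, mean_event_duration):
    """Rough estimate of true-negative count for a recording.

    total_time          -- length of the recording (seconds)
    tp_durations        -- durations of true-positive detections
    fp_durations        -- durations of false-positive detections
    n_false_negatives   -- count of missed calls
    mean_event_duration -- assumed duration of a hypothetical (incorrect)
                           detection in the remaining "at risk" time
    """
    mean_tp = sum(tp_durations) / len(tp_durations)

    # Time accounted for by actual detections, plus an estimate
    # of the time occupied by the missed calls.
    accounted = (sum(tp_durations) + sum(fp_durations)
                 + n_false_negatives * mean_tp)

    # Time that was "at risk" of an incorrect classification.
    at_risk = total_time - accounted
    if at_risk <= 0:
        # The failure mode described below: with many calls the summed
        # detection time can exceed the recording length.
        return 0

    return at_risk / mean_event_duration
```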

The problem with this method is that it breaks down when there are lots of calls: the summed detection time frequently exceeds the recording time, which makes sense. I am going to explore the use of the median instead of the mean, but I was curious whether anyone else has gone down this particular path with any degree of success?

Cheers,

Paul
