How do I test the variations in bird count data effectively?

Santanu -

If you have two different means, with standard errors and a distributional approximation (such as beta or lognormal), you might try to get a confidence interval around the difference in the means, though I do not recall working with count data like this for many years, if ever. As for nonparametric (distribution-free) statistics, you could fall back on Chebyshev's Inequality.

At any rate, if a confidence interval is appropriate, it can be interpreted far better than a hypothesis test. But if you go with a hypothesis test, remember the need for a power (type II error) analysis, not just a p-value.
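As a rough illustration of the Chebyshev fallback, here is a minimal Python sketch. The means and standard errors are made-up numbers, the two estimates are assumed independent, and the standard error of the difference is treated as if it were known exactly, which is optimistic for small samples. The function name and its arguments are only for illustration.

```python
import math
from statistics import NormalDist

def diff_interval(mean_a, se_a, mean_b, se_b, coverage=0.95, method="chebyshev"):
    """Interval for mean_a - mean_b, assuming the two estimates are independent."""
    diff = mean_a - mean_b
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    if method == "chebyshev":
        # Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2, so choose k with 1/k^2 = 1 - coverage
        k = 1.0 / math.sqrt(1.0 - coverage)
    elif method == "normal":
        # Normal approximation, shown only for comparison
        k = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    else:
        raise ValueError("method must be 'chebyshev' or 'normal'")
    return diff - k * se_diff, diff + k * se_diff

# Hypothetical inputs: core zone mean 12.0 (SE 1.5), buffer zone mean 8.0 (SE 2.0)
cheb = diff_interval(12.0, 1.5, 8.0, 2.0, method="chebyshev")
norm = diff_interval(12.0, 1.5, 8.0, 2.0, method="normal")
print("Chebyshev interval:", cheb)
print("Normal interval:   ", norm)
```

The Chebyshev interval is much wider than the normal one. That is the price of making no distributional assumption.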

https://www.researchgate.net/publication/262971440_Practical_Interpretation_of_Hypothesis_Tests_-_letter_to_the_editor_-_TAS

Jim


Santanu Gupta

Sir, thank you for your response. Before my field work I formulated the null hypothesis as "There is no variation in the abundance of bird species in the core and buffer regions of a certain protected area". I have collected the bird count data (Fig. attached, showing part of the full data set) and want to show that there is significant variation in the abundance of bird species, especially in the core zone, as it is less disturbed. The variable NPZB is the bird count data for the buffer zone and NPZC is for the core zone. NPZ gives the zones where the birds were observed (Z1 = buffer, 19 species; Z2 = core, 13 species; Z3 = core + buffer, 13 species). Note that some species overlapped between zones, which is why Z3 is tabulated. Thus the full data set is structured as 45 rows (species) x 3 columns (NPZB, NPZC and NPZ). Which test should I prefer for testing significance: a) paired t-test, b) two-way ANOVA, c) Kolmogorov-Smirnov test, or something else? Kindly suggest. Thank you again for your kind response.

Best, Santanu

James R Knaub

Santanu -

I am not completely clear about your data, but to be able to look at variation implies multiple observations for a given situation. For example, if you had two areas to compare and had the number of sightings in each on only one occasion, that is only one pair of such observations, and you can tell nothing other than "that happened on that day." If you did this for multiple days, then you would have some information to check mean differences and variances, but it would be affected by the fact that obtaining enough data may take you across a season, and studying that effect would be more complicated and require years of daily data, I think. Still, if you can just caveat that fact, and ignore season, and you have daily comparisons to use for data, for a large enough sample, then you could compare those means and standard errors. ("Large enough" depends upon the variance.)

Jim

James R Knaub

Santanu -

If you only have one datum (count) for each case - each zone and species - then you only have anecdotal evidence. That would mean that you collected only one number (which might be called an "observation," though here that could be confusing because one "observation" here is the number [count] of bird observations [same English word used differently]). So here we will talk about one datum as one count among your data.

From your anecdote (if I understand correctly that that is what you have), you can report the counts that you found on that one occasion. That is, you may say that on this occasion, these are the count data collected, though they could vary substantially on other occasions; there is no information on variance. (There is also no information on bias. You could find a problem in how you collected your data. It is best to keep a detailed record of your procedures. You might also want to research the terms "metadata" and "paradata.")

But in addition, let's look at your question as to the "...variations regarding abundance of bird species especially in the core zone as it is less disturbed." Does that mean you might want to compare the distribution of the 13 counts you found in the core zone (is that correct?) with the distribution found for these species in other, unprotected areas? You might at least be able to do histograms here. These might be paired bars for each species, with the number you saw on one bar compared to the number expected in unprotected areas of the same size.

Also, if you are going to compare a buffer zone to a core zone, are they the same size? I mean, is the geographic area in each case (i.e., for each zone) the same number of square kilometers? Otherwise, comparing the counts would not be meaningful. If you wanted to compare buffer and core zones of the same size by comparing count in one to count in the other by species, that might be of interest, especially if the count of a given species in the core zone, minus the count of the same species in the buffer zone, for equal size zones, were always positive. That might indicate that the core zone concept is working. You could consider this a yes/no question. Is the count higher in the core zone for a given species or not? That would mean looking at a proportion. This would be somewhat crude, but you could get an overall proportion for such successes, with a standard error for that proportion. So, you could research proportions and standard errors of proportions.
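To make the proportion idea concrete, here is a small Python sketch. The counts are invented for illustration, the two zones are assumed to be equal in area, and a tie is counted as "not higher" (that is one arbitrary choice; ties could also be dropped). The standard error uses the usual binomial formula sqrt(p(1 - p)/n), which treats the species as independent trials and is crude for a handful of species.

```python
import math

# Hypothetical counts per species (same species order in both lists),
# for core and buffer zones assumed to be of equal area
core_counts = [14, 9, 22, 5, 7, 11, 3, 18, 6, 10]
buffer_counts = [10, 9, 15, 6, 4, 8, 1, 12, 7, 5]

def proportion_higher(core, buffer_zone):
    """Proportion of species with a strictly higher count in the core zone."""
    n = len(core)
    successes = sum(c > b for c, b in zip(core, buffer_zone))
    p = successes / n
    se = math.sqrt(p * (1.0 - p) / n)  # binomial standard error of a proportion
    return p, se

p, se = proportion_higher(core_counts, buffer_counts)
print(f"proportion higher in core: {p:.2f} (SE {se:.3f})")
```

A proportion well above 0.5, relative to its standard error, would point in the direction of the core zone holding more birds per species, with all the caveats above about having only one count per case.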

A lot of this information might be used as if this were a partial pilot test/study to see how you might want to conduct a more thorough statistical study in the future.

Best wishes - Jim

Devcharan Jathanna

Dear Santanu,

The most straightforward way to answer your question would be to specify a generalised linear model, with Poisson error and log link, where the mean or expected count is a function of whether the trail is in the core or buffer (i.e. a binary predictor variable; you can code core as 1 and buffer as 0, or vice versa). The input data will thus be your counts per trail and the binary predictor. Further, in case your trails are of different lengths, do include log(length) as an offset term. This can be easily implemented in R using the glm function.
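For a model this simple the fit can be written out by hand, which may help show what the GLM is estimating. The following Python sketch is not the R glm call itself. It uses the fact that, with a single binary predictor and a log(length) offset, the maximum likelihood estimates reduce to the observed rates (total count divided by total trail length) in each zone. The trail data are hypothetical, and the sketch assumes both zones have non-zero total counts.

```python
import math

# Hypothetical data: (bird count, trail length in km, 1 = core / 0 = buffer)
trails = [
    (30, 2.0, 1), (45, 3.0, 1), (25, 1.5, 1),
    (20, 2.0, 0), (28, 3.5, 0), (12, 1.5, 0),
]

def fit_zone_poisson(data):
    """Closed-form ML fit of log(mu) = log(length) + intercept + beta * is_core."""
    totals = {0: [0, 0.0], 1: [0, 0.0]}  # zone -> [total count, total length]
    for count, length, is_core in data:
        totals[is_core][0] += count
        totals[is_core][1] += length
    (y_buf, len_buf), (y_core, len_core) = totals[0], totals[1]
    intercept = math.log(y_buf / len_buf)           # log rate per km in the buffer
    beta = math.log(y_core / len_core) - intercept  # log of the core/buffer rate ratio
    se_beta = math.sqrt(1.0 / y_core + 1.0 / y_buf)  # Wald standard error of beta
    return intercept, beta, se_beta

intercept, beta, se_beta = fit_zone_poisson(trails)
print(f"rate ratio core/buffer: {math.exp(beta):.2f}")
print(f"beta = {beta:.3f}, SE = {se_beta:.3f}")
```

Here exp(beta) is the estimated ratio of birds per km in the core relative to the buffer. With more predictors there is no closed form, and an iterative fit such as the one R's glm performs would be needed.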

Whether you choose to use hypothesis testing after this (you could look at the test statistic associated with the estimated beta coefficient) or model selection (e.g. run another model where count is a constant term...an intercept only model, or where count is a function of other predictor variables, and then compare AIC values among models) is up to you, but I would recommend against hypothesis testing for purely observational (as opposed to experimental) studies.
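The model selection route can be sketched in the same spirit. The Python below compares an intercept-only Poisson model with the core/buffer model by AIC = 2k - 2 log L, using hypothetical trail data and the closed-form rate estimates that apply to these two simple models (with log(length) as offset). It is a sketch of the comparison, not a substitute for fitting the models in R.

```python
import math

# Hypothetical data: (bird count, trail length in km, 1 = core / 0 = buffer)
trails = [
    (30, 2.0, 1), (45, 3.0, 1), (25, 1.5, 1),
    (20, 2.0, 0), (28, 3.5, 0), (12, 1.5, 0),
]
counts = [t[0] for t in trails]
lengths = [t[1] for t in trails]
zones = [t[2] for t in trails]

def poisson_loglik(ys, mus):
    """Poisson log-likelihood of counts ys given expected counts mus."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y, mu in zip(ys, mus))

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

# Intercept-only model: one common rate per km for all trails
overall_rate = sum(counts) / sum(lengths)
mu_null = [overall_rate * length for length in lengths]

# Zone model: a separate rate per km for core and for buffer
rate = {}
for z in (0, 1):
    zone_count = sum(c for c, zz in zip(counts, zones) if zz == z)
    zone_length = sum(l for l, zz in zip(lengths, zones) if zz == z)
    rate[z] = zone_count / zone_length
mu_zone = [rate[z] * length for z, length in zip(zones, lengths)]

ll_null = poisson_loglik(counts, mu_null)
ll_zone = poisson_loglik(counts, mu_zone)
aic_null = aic(ll_null, 1)  # one parameter: intercept
aic_zone = aic(ll_zone, 2)  # two parameters: intercept and zone effect
print(f"AIC intercept-only: {aic_null:.2f}")
print(f"AIC core/buffer:    {aic_zone:.2f}")
```

The model with the lower AIC is preferred. The zone model always fits at least as well as the intercept-only one, so the question AIC answers is whether the improvement is worth the extra parameter.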

Hope this helps!
