I have diurnal, non-normal bird count data from the core (one trail) and buffer (five trails) regions of a protected area (PA) and want to test the variation between them. Which kind of test would be best, parametric or non-parametric?
I am not familiar with your subject/application. Are you trying to compare two different distributions?
As for normality, that is usually not expected, unless you are looking at the distribution of errors or estimated regression residuals (really random factors of those estimated residuals for WLS), or are making use of the Central Limit Theorem.
If you have two different means, with standard errors and distributional approximations (such as beta or lognormal), you might try to get a confidence interval around the difference in the means, though I have not worked with count data like this for many years, if ever. As for nonparametric (distribution-free) statistics, you could fall back on Chebyshev's Inequality.
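As a rough illustration of the interval idea above, here is a minimal Python sketch. It assumes you have repeated counts for each zone (for example, one count per survey day); the numbers and the function name `diff_in_means_interval` are hypothetical and not taken from the data in this thread. It computes the difference in means with its standard error, then builds either a normal-approximation interval or a wider, distribution-free one from Chebyshev's Inequality, which bounds P(|X - mu| >= k*sigma) by 1/k^2. The Chebyshev bound strictly applies when the standard error is known, so with an estimated standard error the interval is only approximate.

```python
import math
import statistics

def diff_in_means_interval(a, b, coverage=0.95, method="normal"):
    """Interval for mean(a) - mean(b) from two independent samples."""
    diff = statistics.mean(a) - statistics.mean(b)
    # Standard error of the difference, using the sample variances.
    se = math.sqrt(statistics.variance(a) / len(a)
                   + statistics.variance(b) / len(b))
    if method == "chebyshev":
        # Chebyshev: 1/k^2 = 1 - coverage, so k = 1 / sqrt(1 - coverage).
        k = 1.0 / math.sqrt(1.0 - coverage)
    else:
        # Normal approximation (leans on the Central Limit Theorem).
        k = statistics.NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return diff, se, (diff - k * se, diff + k * se)

# Hypothetical repeated counts, one value per survey day.
core_counts = [12, 15, 9, 14, 11, 13]
buffer_counts = [8, 10, 7, 9, 12, 6]

normal_result = diff_in_means_interval(core_counts, buffer_counts)
chebyshev_result = diff_in_means_interval(core_counts, buffer_counts,
                                          method="chebyshev")
print("normal approximation:", normal_result)
print("Chebyshev bound:     ", chebyshev_result)
```

The Chebyshev interval is always the wider of the two at the same coverage, which is the price of making no distributional assumption.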
At any rate, if a confidence interval is appropriate, it can be far better interpreted than using an hypothesis test. But if you go with an hypothesis test, remember the need for a power (type II error) analysis, not just a p-value.
Sir, thank you for your response. Prior to my field work I formulated the null hypothesis as "There is no variation in the abundance of bird species in the core and buffer regions of a certain protected area". I have collected the bird count data (figure attached, showing part of the full data set) and want to show that there is significant variation in the abundance of bird species, especially in the core zone, as it is less disturbed. The variable NPZB is the bird count for the buffer zone and NPZC is the bird count for the core zone. NPZ records the zone where the birds were observed (Z1 = buffer, 19 species; Z2 = core, 13 species; Z3 = core + buffer, 13 species). Note that some species were seen in both zones, which is why Z3 is tabulated. The full data set is therefore 45 rows (species) x 3 columns (NPZB, NPZC and NPZ). Which test should I prefer for testing significance: a) paired t-test, b) two-way ANOVA, c) Kolmogorov-Smirnov test, or something else? Kindly suggest. Thank you again for your kind response.
I am not completely clear about your data, but to be able to look at variation implies multiple observations for a given situation. For example, if you had two areas to compare and had number of sightings to compare in one case, that is only one pair of such observations, and you can tell nothing other than "that happened on that day." If you did this for multiple days, then you would have some information to check mean differences and variances, but it would be impacted by the fact that obtaining enough data may take you across a season, and studying that effect would be more complicated and require years of daily data, I think. Still, if you can just caveat that fact, and ignore season, and you have daily comparisons to use for data, for a large enough sample, then you could compare those means and standard errors. ("Large enough" depends upon the variance.)
If you only have one datum (count) for each case - each zone and species - then you only have anecdotal evidence. That would mean that you collected only one number (which might be called an "observation," though here that could be confusing because one "observation" here is the number [count] of bird observations [same English word used differently]). So here we will talk about one datum as one count among your data.
From your anecdote (if I understand correctly that that is what you have), you can report the counts that you found on that one occasion. That is, you may say that on this occasion these are the count data collected, though they could vary substantially on other occasions; there is no information on variance. (There is also no information on bias. You could find a problem in how you collected your data. It is best to keep a detailed record of your procedures. You might also want to research the terms "metadata" and "paradata.")
But in addition, let's look at your question as to the "...variations regarding abundance of bird species especially in the core zone as it is less disturbed." Does that mean you might want to compare the distribution of the 13 counts you found in the core zone (is that correct?) with the distribution found for these species in other unprotected areas? You might at least be able to do histograms here. These might be paired bars for each species with the number you saw on one bar compared to the number expected in unprotected areas of the same size.
Also, if you are going to compare a buffer zone to a core zone, are they the same size? I mean, is the geographic area of each zone the same number of square kilometers? Otherwise, comparing the counts would not be meaningful. If you wanted to compare buffer and core zones of the same size by comparing the count in one to the count in the other by species, that might be of interest, especially if the count of a given species in the core zone, minus the count of the same species in the buffer zone, for equal size zones, were always positive. That might indicate that the core zone concept is working. You could consider this a yes/no question: is the count higher in the core zone for a given species or not? That would mean looking at a proportion. This would be somewhat crude, but you could get an overall proportion for such successes, with a standard error for that proportion. So, you could research proportions and standard errors of proportions.
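The yes/no idea above can be sketched in a few lines of Python. The per-species count pairs below are hypothetical, as is the function name `proportion_core_higher`; the sketch also assumes the two zones have been made comparable in area and that species are treated as independent. It reports the proportion p of species with a higher count in the core zone and the usual standard error sqrt(p(1 - p)/n). Ties are counted as "not higher" here, which is a choice you may want to revisit.

```python
import math

def proportion_core_higher(pairs):
    """pairs: list of (core_count, buffer_count), one pair per species."""
    n = len(pairs)
    # A "success" is a species counted more often in the core zone.
    successes = sum(1 for core, buffer in pairs if core > buffer)
    p = successes / n
    # Standard error of a simple binomial proportion.
    se = math.sqrt(p * (1.0 - p) / n)
    return successes, n, p, se

# Hypothetical (core, buffer) counts for species seen in both zones.
species_pairs = [(5, 3), (2, 4), (7, 1), (3, 3), (6, 2),
                 (4, 5), (8, 3), (1, 0), (2, 6), (9, 4)]

successes, n, p, se = proportion_core_higher(species_pairs)
print(f"{successes} of {n} species higher in core: p = {p:.2f}, SE = {se:.3f}")
```

With so few species the standard error is large, so a proportion near one half would not say much either way.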
A lot of this information might be used as if this were a partial pilot test/study to see how you might want to conduct a more thorough statistical study in the future.
The most straightforward way to answer your question would be to specify a generalised linear model with Poisson error and log link, where the mean or expected count is a function of whether the trail is in the core or the buffer (i.e. a binary predictor variable; you can code core as 1 and buffer as 0, or vice versa). The input data will thus be your counts per trail and the binary predictor. Further, if your trails differ in length, do include log(length) as an offset term. This can be easily implemented in R using the glm function.
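In R, the model described above would look something like glm(count ~ core + offset(log(length)), family = poisson, data = trails), where the data frame and column names are hypothetical. To show what that fit estimates, here is a minimal Python sketch using only the standard library. It relies on the fact that, with a single binary predictor and a log(length) offset, the Poisson maximum-likelihood estimates have a closed form: each zone's fitted rate is total count divided by total trail length, and the coefficient is the log of the rate ratio. The counts and lengths are invented for illustration (one core trail, five buffer trails), and the sketch assumes each zone has a non-zero total count.

```python
import math

# Hypothetical data: (count, trail length in km, core indicator: 1 = core).
trails = [
    (42, 2.0, 1),
    (30, 2.0, 0),
    (25, 1.5, 0),
    (28, 2.5, 0),
    (35, 2.0, 0),
    (22, 2.0, 0),
]

def fit_poisson_binary(trails):
    """Closed-form Poisson GLM fit: log(mean) = b0 + b1*core + log(length)."""
    totals = {0: [0, 0.0], 1: [0, 0.0]}  # zone -> [total count, total length]
    for count, length, core in trails:
        totals[core][0] += count
        totals[core][1] += length
    rate_buffer = totals[0][0] / totals[0][1]  # birds per km, buffer
    rate_core = totals[1][0] / totals[1][1]    # birds per km, core
    intercept = math.log(rate_buffer)
    beta = math.log(rate_core / rate_buffer)   # log rate ratio, core vs buffer
    # Wald standard error of the log rate ratio.
    se_beta = math.sqrt(1.0 / totals[1][0] + 1.0 / totals[0][0])
    return intercept, beta, se_beta

intercept, beta, se_beta = fit_poisson_binary(trails)
print(f"rate ratio (core/buffer) = {math.exp(beta):.3f}")
print(f"beta = {beta:.3f}, SE = {se_beta:.3f}")
```

Note that with only one core trail, the core rate rests on a single trail, and the Poisson standard error does not account for any extra trail-to-trail variation.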
Whether you choose to use hypothesis testing after this (you could look at the test statistic associated with the estimated beta coefficient) or model selection (e.g. run another model where count is a constant term...an intercept only model, or where count is a function of other predictor variables, and then compare AIC values among models) is up to you, but I would recommend against hypothesis testing for purely observational (as opposed to experimental) studies.
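The model-selection route mentioned above can be sketched as follows, again in plain Python and with hypothetical trail data. It compares an intercept-only Poisson model (one common rate per km for all trails) against the core/buffer model (one rate per zone), both with log(length) as an offset, using AIC = 2k - 2 log L. The closed-form rate estimates are used in place of a fitted glm, which is valid only for this simple two-group case.

```python
import math

# Hypothetical data: (count, trail length in km, core indicator: 1 = core).
trails = [
    (42, 2.0, 1),
    (30, 2.0, 0),
    (25, 1.5, 0),
    (28, 2.5, 0),
    (35, 2.0, 0),
    (22, 2.0, 0),
]

def poisson_loglik(counts, means):
    """Poisson log-likelihood: sum of y*log(mu) - mu - log(y!)."""
    return sum(y * math.log(m) - m - math.lgamma(y + 1)
               for y, m in zip(counts, means))

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

counts = [c for c, _, _ in trails]
lengths = [length for _, length, _ in trails]

# Intercept-only model: a single rate per km shared by all trails.
overall_rate = sum(counts) / sum(lengths)
mu_null = [overall_rate * length for length in lengths]

# Zone model: a separate rate per km for core and for buffer trails.
def zone_rate(flag):
    zone = [(c, length) for c, length, core in trails if core == flag]
    return sum(c for c, _ in zone) / sum(length for _, length in zone)

mu_zone = [zone_rate(core) * length for _, length, core in trails]

aic_null = aic(poisson_loglik(counts, mu_null), n_params=1)
aic_zone = aic(poisson_loglik(counts, mu_zone), n_params=2)
print(f"AIC intercept-only: {aic_null:.2f}")
print(f"AIC core/buffer:    {aic_zone:.2f}")
```

The model with the lower AIC is preferred; a difference of only a couple of units is usually read as weak support rather than a clear verdict.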