Scenario: I have two datasets. The first is a sample of "observed" values of a spatial variable (e.g. elevation) extracted at the n = 30 sites being studied. The second is an "expected" dataset of values of the same variable across the entire study area: about n = 1000 randomly sampled points that reflect the natural distribution of the variable.

I wish to test whether the "observed" values are what would be expected if the sites were distributed randomly across the study area with respect to the variable. That is, I wish to assess whether the sites exhibit a preference for values of the variable that differ from those expected under its natural distribution, and whether any such preference is statistically significant. Essentially: at what values of the variable do the sites differ significantly from the natural distribution?

My desired output is a graph similar to the one attached below. The histogram bars represent the percentage of sites at each value of the variable; the red line represents the natural distribution together with its confidence interval. Wherever a histogram bar lies outside the confidence interval, the difference can be considered significant. I expect the confidence interval to change depending on the size and distribution of the two input datasets ("observed" and "expected").

What would be the best way to go about producing results like this from my data? I feel a bootstrap/non-parametric method might be the direction I need to head in, but I am currently unsure which specific method to implement. Any assistance, ideas or suggestions would be greatly appreciated!
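
To make the resampling idea concrete, here is a minimal sketch of one possible approach, not a definitive method. It repeatedly draws n = 30 values from the "expected" dataset, bins each draw, and takes per-bin percentiles as a pointwise 95% envelope against which the observed histogram can be compared. Everything in it is an assumption made for illustration: the function name `resampling_envelope`, the choice of 10 equal-width bins, the 2000 resamples, and the synthetic normal data standing in for the real elevation values. The envelope is pointwise per bin, so it does not correct for testing several bins at once.

```python
import numpy as np

def resampling_envelope(expected, n_sites, bin_edges,
                        n_resamples=2000, alpha=0.05, seed=0):
    """Per-bin percentage envelope for random samples of size n_sites.

    Hypothetical helper: draws n_sites values from `expected` (with
    replacement) n_resamples times and summarises the per-bin percentages.
    """
    rng = np.random.default_rng(seed)
    pct = np.empty((n_resamples, len(bin_edges) - 1))
    for i in range(n_resamples):
        draw = rng.choice(expected, size=n_sites, replace=True)
        counts, _ = np.histogram(draw, bins=bin_edges)
        pct[i] = 100.0 * counts / n_sites
    lower = np.percentile(pct, 100 * alpha / 2, axis=0)
    upper = np.percentile(pct, 100 * (1 - alpha / 2), axis=0)
    return pct.mean(axis=0), lower, upper

# Synthetic stand-ins for the real data (assumed values, illustration only).
rng = np.random.default_rng(42)
expected = rng.normal(500, 100, size=1000)  # "natural" distribution
observed = rng.normal(650, 50, size=30)     # values at the study sites

# Shared bins spanning both datasets, so no value falls outside the range.
lo = min(expected.min(), observed.min())
hi = max(expected.max(), observed.max())
bin_edges = np.linspace(lo, hi, 11)

mean_pct, lower, upper = resampling_envelope(expected, len(observed), bin_edges)

obs_counts, _ = np.histogram(observed, bins=bin_edges)
obs_pct = 100.0 * obs_counts / len(observed)

# Bins where the observed percentage falls outside the pointwise envelope.
outside = (obs_pct < lower) | (obs_pct > upper)
for left, right, o, l, u, flag in zip(bin_edges[:-1], bin_edges[1:],
                                      obs_pct, lower, upper, outside):
    mark = "*" if flag else " "
    print(f"{left:7.1f} to {right:7.1f}: observed {o:5.1f}%  "
          f"envelope [{l:5.1f}, {u:5.1f}] {mark}")
```

Plotting `obs_pct` as bars with `mean_pct`, `lower` and `upper` as lines would give a figure of the kind described above. Because the envelope is built from draws of the same size as the observed sample, it widens for fewer sites and narrows for more.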
