Dear all,
I am faced with a question on selecting species occurrences as input
data for a biomod run. Specifically, I have a number of beaver (/Castor
fiber/) observations which I want to use to model future beaver range
expansion. The data amount to about 1,800 occurrences, representing 72
territories. The number of occurrences per territory varies between 1
and 65 (mean: 25). There are two obvious choices I can make: 1) use all
1,800 occurrences, maximising sample size -- but running the risk that
the results will be too strongly influenced by the territories with a
large number of occurrences; 2) use only 1 occurrence per territory,
giving each territory 'equal weight' -- but reducing sample size (i.e.
reducing how well each territory is sampled).
An alternative would be to do multiple model runs in which I randomly
select 25 occurrences (the mean number) from each territory with more
than 25 observations, while using all available occurrences for the
other territories. Another way would be to weight or scale the
occurrence data, for example by downweighting occurrences that belong
to a territory with a large number of occurrences.
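
To make these two ideas concrete, here is a minimal sketch of the logic,
written in Python purely for illustration (my own scripts would be in R).
The function names `cap_per_territory` and `territory_weights` and the
`(territory_id, occurrence)` record layout are assumptions made for this
example, and weighting each occurrence by 1/n (n = number of occurrences in
its territory) is only one possible scheme. Nothing here is biomod-specific.

```python
import random
from collections import Counter, defaultdict


def cap_per_territory(records, cap=25, seed=None):
    """Randomly keep at most `cap` occurrences per territory.

    `records` is a list of (territory_id, occurrence) tuples (hypothetical
    layout). Territories with `cap` or fewer occurrences are kept in full.
    """
    rng = random.Random(seed)

    # Group occurrences by the territory they belong to.
    by_territory = defaultdict(list)
    for territory_id, occurrence in records:
        by_territory[territory_id].append(occurrence)

    subsample = []
    for territory_id, occurrences in by_territory.items():
        if len(occurrences) > cap:
            # Draw `cap` occurrences without replacement.
            occurrences = rng.sample(occurrences, cap)
        subsample.extend((territory_id, occ) for occ in occurrences)
    return subsample


def territory_weights(records):
    """Return one weight per record: 1 / (occurrences in its territory).

    With this scheme the weights within each territory sum to 1, so every
    territory carries the same total weight.
    """
    counts = Counter(territory_id for territory_id, _ in records)
    return [1.0 / counts[territory_id] for territory_id, _ in records]
```

Calling `cap_per_territory` with a different `seed` for each model run would
give the repeated random subsets, while `territory_weights` returns a vector
in the same order as the input records that could serve as case weights.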
While I can write standalone R scripts to subsample or downweight the
data, I am not sure how to feed the result into the biomod workflow. Any
suggestions on how to tackle this are much appreciated!
Best wishes and thanks in advance,
Diederik
-- Dr. Diederik Strubbe, Evolutionary Ecology Group, Department of Biology