How can I assess SDM performance under different conditions?

My suggestion: the selection of predictors (covariates/environmental variables) may be as important as the choice of algorithm.

My prediction: several algorithms will do very well with this subject (beech forest in mountainous terrain). It would be interesting to test the transferability of an SDM from the Alps to the Apennines and vice versa.

Looking forward to your findings.

Maurizio Marchi

Hi, I agree with your suggestion about the selection of the predictors/covariates/environmental variables. Indeed, I will be testing this too. I'm currently working with 38 predictors (37 climatic factors + soil), testing the use of PCA components versus a subset of predictors selected after a collinearity test. I think PCA could be a very interesting choice, with a lower degree of arbitrary selection (99% of the total variance can be compressed into just 8 components). Even though the importance of individual predictors will be much more difficult to assess, my goal is to have a good model, and I'm not interested in variable importance.

I will keep you updated.

Giorgio Vacchiano

For the comparison, you can try equivalence testing (R package: https://cran.r-project.org/web/packages/equivalence/equivalence.pdf), in which the null hypothesis is that the models are NOT equivalent.
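As a rough illustration of the idea (not the R package's own implementation), here is a minimal Python sketch of a paired two one-sided tests (TOST) procedure on per-fold scores of two models. The function name `paired_tost`, the equivalence margin of 0.02, and the AUC values are all hypothetical, and the sketch uses a normal approximation instead of the t distribution, so it is only reasonable when there are many paired folds.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def paired_tost(scores_a, scores_b, margin, alpha=0.05):
    """Paired TOST: H0 is 'the models are NOT equivalent'.

    Equivalence means the mean paired difference lies inside
    (-margin, +margin). Normal approximation (assumption): use
    only with a reasonably large number of pairs.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    d = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))
    # One-sided test against H0: d <= -margin
    p_lower = 1.0 - NormalDist().cdf((d + margin) / se)
    # One-sided test against H0: d >= +margin
    p_upper = NormalDist().cdf((d - margin) / se)
    p_value = max(p_lower, p_upper)
    return d, p_value, p_value < alpha


# Made-up AUC values for 50 cross-validation folds (illustration only).
auc_a = [0.90 + 0.001 * (i % 5) for i in range(50)]
auc_b = [a + (0.01 if i % 2 == 0 else -0.01) for i, a in enumerate(auc_a)]
auc_c = [b - 0.05 for b in auc_b]

print(paired_tost(auc_a, auc_b, margin=0.02)[2])  # True: equivalent
print(paired_tost(auc_a, auc_c, margin=0.02)[2])  # False: not shown equivalent
```

The margin is the key choice here: it has to be set in advance as the largest difference in performance you would still regard as practically irrelevant.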

Hein Van Gils

Seasonal incoming solar radiation + elevation + forest/non-forest may be the "winning" parsimonious set of predictors; the first two can be calculated from the DEM.

Oliver Gutiérrez Hernández

First of all, I would define what a good model is. I guess it relates to your aims. Personally, I do not like the use of PCA in species distribution models, but I recognize that PCA is very useful for making the most of the available predictors. For me, as an ecologist, the algorithm is the least important thing, because I am not a computer science developer. There is one major aspect influencing the results: each algorithm interacts with the predictors in different ways, depending on the kind of predictors and the sampling data. At this point, the discussion about the importance of variables becomes critical, because the spatial prediction is closely tied to it. Finally, if you are interested in spatial prediction for natural resource management under a climate change scenario, the challenge may be the discussion about the influence of changing factors. This discussion is more complicated to address when using PCA. This is just my own view to begin with.

References:

Mod, H. K., Scherrer, D., Luoto, M., & Guisan, A. (2016). What we use is not what we know: Environmental predictors in plant distribution models. Journal of Vegetation Science, 1–15. http://doi.org/10.1111/jvs.12444

Qiao, H., Soberón, J., & Peterson, A. T. (2015). No silver bullets in correlative ecological niche modelling: Insights from testing among many potential algorithms for niche estimation. Methods in Ecology and Evolution, 6(10), 1126–1136. http://doi.org/10.1111/2041-210X.12397

Thorson, J. T., Scheuerell, M. D., Shelton, A. O., See, K. E., Skaug, H. J., & Kristensen, K. (2015). Spatial factor analysis: A new tool for estimating joint species distributions and correlations in species range. Methods in Ecology and Evolution, 6(6), 627–637. http://doi.org/10.1111/2041-210X.12359

Desalegn Chala

It is interesting that you are using PCA and also considering a collinearity test (I usually retain only one variable from any set of variables correlated above 0.7). I am also curious about the 9 algorithms and the model settings you are using. For example, the default MaxEnt is very complex and sub-optimal in performance (Halvorsen, 2013; Halvorsen, Mazzoni, Bryn, & Bakkestuen, 2014).

Moreover, models are highly data dependent. How representative are your data in terms of altitude and aspect (if your study area is mountainous)? Does your study area cover a large range of latitude, in which case your data should be representative in terms of latitude too? Do you have independently collected presence-absence test data for your target species? Rather than splitting your data into test and training sets (though you are using a cross-validation of 50), having independently collected presence-absence test data is important for comparing algorithms. Several authors have carried out such studies before; how will your study be different? So far, MaxEnt and BRT are reported to perform better than others, GAM to perform better than GLM, and random forest to perform better than classification trees.

My last point is that, for a species in equilibrium with its niche requirements within the study area, if the data are perfectly collected, model algorithms and model settings matter less (see section 4.3 of Chala et al 2016).
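The "retain one variable above 0.7" rule can be sketched in a few lines of Python. This is one possible greedy reading of it, not a standard routine: the names `pearson` and `drop_collinear` are hypothetical, the absolute value of Pearson's r is an assumption (so strong negative correlations are filtered too), and the predictor values are invented. Predictors are visited in the order given, so the order encodes which variable you prefer to keep.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def drop_collinear(predictors, threshold=0.7):
    """Keep a predictor only if |r| <= threshold with every kept one.

    `predictors` maps names to value lists, in priority order.
    """
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Invented values at six sites (illustration only).
predictors = {
    "elevation": [200, 400, 600, 800, 1000, 1200],
    "mean_temp": [14.1, 12.8, 11.9, 10.2, 9.1, 7.8],
    "solar_radiation": [5, 9, 4, 8, 3, 7],
}

print(drop_collinear(predictors))  # ['elevation', 'solar_radiation']
```

Here mean temperature is dropped because it is almost perfectly (negatively) correlated with elevation, which was listed first.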

Andrew Townsend Peterson

Some good comments so far, but here is a slightly different perspective. You should assess your models in terms of statistical significance (i.e., does, or how often does, a particular method achieve predictions of independent data subsets that are better than random expectations). The approach most appropriate for niche modeling is partial ROC (see pub attached). Then, separately, you should include a performance measure that responds directly to the uses to which you wish to put the model. Different uses will demand different metrics of performance (e.g., omission rate, commission rate, correct classification rate, etc.). Hope these general thoughts help. ATP
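For the second step, the threshold-dependent metrics named above can be computed from a simple confusion matrix. The Python sketch below is only an illustration: the function name `threshold_metrics`, the 0.5 threshold, and the test data are hypothetical, and it assumes independent presence-absence test data (commission rate is not meaningful without true absences). It does not implement partial ROC.

```python
def threshold_metrics(observed, predicted, threshold=0.5):
    """Omission, commission and correct classification rates.

    observed: 1 = presence, 0 = absence (independent test data).
    predicted: model suitability scores in [0, 1].
    """
    tp = fp = tn = fn = 0
    for obs, score in zip(observed, predicted):
        predicted_present = score >= threshold
        if obs == 1 and predicted_present:
            tp += 1
        elif obs == 1:
            fn += 1  # presence predicted absent (omission error)
        elif predicted_present:
            fp += 1  # absence predicted present (commission error)
        else:
            tn += 1
    return {
        "omission_rate": fn / (tp + fn),
        "commission_rate": fp / (fp + tn),
        "correct_classification_rate": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented test data: four presences followed by four absences.
observed = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]

print(threshold_metrics(observed, predicted))
# {'omission_rate': 0.25, 'commission_rate': 0.25, 'correct_classification_rate': 0.75}
```

Which of these numbers matters most depends on the intended use: a model used to find new populations may tolerate commission errors but not omission errors, for example.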
