What are the best practices for selecting environmental variables in ecological niche modeling for coral reef fish species?

A priori selection based on a mechanistic explanation of a species' niche "preferences" is advised. Selection procedures based on p-values (backward and/or forward selection), dropping "insignificant" variables, or AIC (or BIC) are not proper statistical approaches for variable selection. They are all closely related and therefore lead to similarly problematic conclusions; see these sources: https://royalsocietypublishing.org/doi/10.1098/rspb.2023.1261, https://www.tandfonline.com/doi/full/10.1080/09332480.2018.1549817, or F. Harrell's book (https://link.springer.com/book/10.1007/978-3-319-19425-7).
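To illustrate why data-driven screening can mislead, here is a minimal simulation sketch in Python. Both the response and all candidate predictors are pure random noise, yet picking the "best" variable after the fact tends to produce a correlation that looks meaningful. Everything here (function names, 20 sites, 50 variables, the seed) is made up for illustration and is not taken from the cited sources.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def best_noise_predictor(n_sites=20, n_vars=50, seed=1):
    """Screen pure-noise predictors against a pure-noise response.

    Returns (index, r) of the predictor with the largest |r|.
    No predictor has any real relation to the response.
    """
    rng = random.Random(seed)
    response = [rng.gauss(0, 1) for _ in range(n_sites)]
    predictors = [
        [rng.gauss(0, 1) for _ in range(n_sites)] for _ in range(n_vars)
    ]
    correlations = [pearson(p, response) for p in predictors]
    best = max(range(n_vars), key=lambda j: abs(correlations[j]))
    return best, correlations[best]


if __name__ == "__main__":
    for k in (1, 5, 50):
        idx, r = best_noise_predictor(n_vars=k)
        print(f"best of {k:2d} noise variables: #{idx}, r = {r:.2f}")
```

Running it with a growing number of candidate variables shows the strongest |r| creeping upward even though nothing real is there. The same mechanism operates when variables are screened by p-values or information criteria.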

A niche could be defined as an abstract multivariate space constructed from the univariate vectors under which a taxon was observed. I see no need to select variables per se, because a niche does not imply "causality", unless defined otherwise of course. If you have too many variables, you could start by asking why a particular variable would be of interest for a species' occupation of this space. Of course, in ecology we often deal with indirect relations, so one could "explain away" every relation; with automated model selection this becomes tricky, and we likely end up with a spurious relation (e.g., https://www.sciencedirect.com/science/article/pii/S0022316622012196).

I would keep it straightforward and focus on variables with "direct" relations. I have never read anything about reef fish, but I assume reef structure, algal biomass, specific food sources, or oxygen might be of interest. I am much more interested in your knowledge and ideas of how such variables shape the niche, given a good qualitative introduction and mechanistic explanation, than in why a "mindless" model would select them.

Best,

Rainer Ferdinand Wunderlich

1. Knowledge of species ecology

2. Knowledge of available data

3. Consider everything that could be meaningful, i.e., the overlap of 1. & 2.

4. usdm package in R: Remove variables from highly correlated pairs, retaining the one with the lower VIF (variance inflation factor). You can (and should, if you are really sure!) whitelist variables that have to be in the model. Don't drop too many!

5. Calibrate preliminary models and look at the marginal variable contributions, but also at those of interactions (if any are included).

6. Reflect on your results and drop non-informative variables from the final models; or revisit step 4 onwards, this time keeping the alternatives to the variables that end up being dropped later anyway.

7. Always aim to include direct effects and limit those that are likely only effective via correlations with the true variables. I totally agree with Wim Kaijser here! There are papers by Austin on this topic. That being said, landscape metrics do work very well in many cases, and distance variables (e.g., to water) are still underutilized :)
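Step 4 above could be sketched roughly as follows. This is a plain-Python illustration of the logic only (find the most correlated pair, drop the member with the higher VIF, repeat), not the usdm package's own code. The function names, the 0.7 threshold and the `keep` whitelist argument are assumptions made for the example. The VIF is taken from the diagonal of the inverse correlation matrix, which is equivalent to 1 / (1 - R²) from regressing each variable on the others.

```python
def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corr_matrix(cols):
    """Correlation matrix of a list of columns."""
    k = len(cols)
    return [[1.0 if i == j else pearson(cols[i], cols[j]) for j in range(k)]
            for i in range(k)]


def invert(m):
    """Gauss-Jordan inverse with partial pivoting.

    Assumes m is non-singular, i.e. no perfectly collinear variables.
    """
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda row: abs(aug[row][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for row in range(n):
            if row != c:
                f = aug[row][c]
                aug[row] = [a - f * b for a, b in zip(aug[row], aug[c])]
    return [row[n:] for row in aug]


def vif(cols):
    """VIF per column: diagonal of the inverse correlation matrix."""
    inv = invert(corr_matrix(cols))
    return [inv[j][j] for j in range(len(cols))]


def filter_correlated(data, threshold=0.7, keep=()):
    """Drop one variable from each highly correlated pair.

    data: dict mapping variable name -> list of values.
    keep: hypothetical whitelist of names that are never dropped.
    Returns the names of the retained variables.
    """
    names = list(data)
    while len(names) > 1:
        cols = [data[n] for n in names]
        r = corr_matrix(cols)
        # Candidate pairs: at least one member must be droppable.
        pairs = [(abs(r[i][j]), i, j)
                 for i in range(len(names))
                 for j in range(i + 1, len(names))
                 if not (names[i] in keep and names[j] in keep)]
        if not pairs:
            break
        top, i, j = max(pairs)
        if top <= threshold:
            break
        v = vif(cols)
        # Prefer dropping the higher-VIF member, unless it is whitelisted.
        order = sorted((i, j), key=lambda k: v[k], reverse=True)
        drop = next(k for k in order if names[k] not in keep)
        del names[drop]
    return names
```

A call might look like `filter_correlated({"sst": sst, "sst_max": sst_max, "depth": depth}, keep=("sst",))`, where the variable names are placeholders. The whitelist is where the ecological knowledge from steps 1 to 3 enters: the filter only breaks ties between variables you have no strong opinion about.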

Please note that there are almost infinite ways to do this, and no method will be superior in all cases. Also, what is "superior" depends on your objectives (ecological explanation or predictive accuracy) and the modeling approach (classical statistics or machine learning).

