A priori selection based on a mechanistic explanation of a species' niche "preferences" is advised. Selection procedures based on p-values (backward and/or forward selection), dropping "insignificant" variables, or AIC (or BIC) are not proper statistical approaches for variable selection. They are all closely related, so they lead to the same problematic conclusions; see the sources (https://royalsocietypublishing.org/doi/10.1098/rspb.2023.1261, https://www.tandfonline.com/doi/full/10.1080/09332480.2018.1549817) or F. Harrell's book.
A niche could be defined as an abstract multivariate space, constructed from the univariate vectors (the environmental variables) under which a taxon was observed. I see no need to select a variable per se, because a niche does not imply "causality", unless defined otherwise of course. If you have too many variables, you could start by addressing why a particular variable would be of interest for a species' occupation of this space. Of course, in ecology we often deal with indirect relations, so one could "explain away" every relation; with model selection this becomes tricky and we likely end up with a spurious relation (e.g., https://www.sciencedirect.com/science/article/pii/S0022316622012196).
I would keep it straightforward and focus on the "direct" relations. I have never read anything about reef fish, but I assume reef structure, algae biomass, specific food sources, or oxygen might be of interest. I am much more interested in your knowledge and ideas of how such variables shape the niche, given a good qualitative introduction and mechanistic explanation, than in why a "mindless" model would select them.
3. Consider everything that could be meaningful, i.e., the overlap of 1. & 3.
4. usdm package in R: Remove one variable from each highly correlated pair, retaining the one with the lower VIF. You can (and should, if you really know it!) whitelist variables that have to be in the model. Don't drop too many!
5. Calibrate preliminary models and look at the marginal variable contributions but also at those of interactions (if any included).
6. Reflect on your results and drop non-informative variables in the final models; or revisit steps 4. and onwards keeping the alternatives to what is dropped later anyway.
7. Always aim to include direct effects and limit those that are likely only effective via correlations with the true variables. Totally agree with Wim Kaijser here! There are papers by Austin on this topic! That being said, landscape metrics do work very well in many cases and distance variables (e.g., to water) are still underutilized :)
Please note that there are almost infinitely many ways to do this, and no method will be superior in all cases. Also, what counts as "superior" depends on your objectives (ecological explanation or predictive accuracy) and the modeling approach (classic stats or machine learning).
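To make step 4 a bit more concrete, here is a minimal sketch of correlation-based pruning with a whitelist. It is a standard-library Python re-implementation of the general idea (find the most strongly correlated pair, drop the member with the higher VIF, never drop a whitelisted variable), not the usdm code itself. The function names, the 0.7 threshold, the variable names, and the simulated data are all assumptions for illustration only. VIF is taken as the diagonal of the inverse correlation matrix, which equals 1 / (1 - R²) for each predictor regressed on the others.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(m):
    """Gauss-Jordan inversion with partial pivoting.
    Fails (division by zero) if predictors are perfectly collinear."""
    n = len(m)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(data, names):
    """VIF per variable: diagonal of the inverse correlation matrix."""
    corr = [[pearson(data[a], data[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


def prune_correlated(data, threshold=0.7, keep=()):
    """Repeatedly take the most correlated pair with |r| > threshold and
    drop the member with the higher VIF; names in `keep` are never dropped."""
    names = list(data)
    while True:
        best = None
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                if a in keep and b in keep:
                    continue  # both whitelisted: nothing can be dropped
                r = abs(pearson(data[a], data[b]))
                if r > threshold and (best is None or r > best[0]):
                    best = (r, a, b)
        if best is None:
            return names
        _, a, b = best
        scores = vif(data, names)
        droppable = [v for v in (a, b) if v not in keep]
        names.remove(max(droppable, key=lambda v: scores[v]))


# Simulated (made-up) predictors: depth is only a proxy for oxygen here.
random.seed(1)
n = 200
depth = [random.gauss(0, 1) for _ in range(n)]
predictors = {
    "depth": depth,
    "oxygen": [-0.95 * d + 0.2 * random.gauss(0, 1) for d in depth],
    "reef_structure": [random.gauss(0, 1) for _ in range(n)],
    "algae_biomass": [random.gauss(0, 1) for _ in range(n)],
}

# Whitelist the variable with a direct, mechanistic effect.
selected = prune_correlated(predictors, threshold=0.7, keep={"oxygen"})
print(selected)
```

In this toy setup depth and oxygen are almost interchangeable statistically, so the whitelist is what decides which of the two stays. That is the point of steps 4 and 7: the correlation filter only tells you that one of a pair is redundant, while your ecological knowledge should decide which one is the direct effect worth keeping.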