Best approach for model selection?

14 April 2022

I am doing a study where I am trying to model how different factors affect polar bear movement. I would like to conduct model selection using AIC. So far, I believe I have two options:

1) Put every variable I am studying into one giant model, and then conduct stepwise model selection. What I would design is a multivariable model like this:

Movement variable = x1*(length of the ice season) + x2*(mean temperature) + x3*(year) + etc.
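Option 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, the variable names (`ice_season_length`, `mean_temperature`, `year`, `bear_age`) are hypothetical, the model is an ordinary least-squares fit with Gaussian errors, and the helper functions `aicc` and `forward_stepwise` are written for this example rather than taken from any library. It uses the small-sample corrected AIC (AICc) because the sample is only about 25 bears.

```python
import numpy as np

def aicc(y, X):
    """AICc of a least-squares fit, up to an additive constant.

    X must already contain the intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1  # regression coefficients plus the residual variance
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

def forward_stepwise(y, predictors):
    """Add one predictor at a time for as long as doing so lowers AICc."""
    n = len(y)

    def design(names):
        return np.column_stack([np.ones(n)] + [predictors[name] for name in names])

    selected = []
    best = aicc(y, design(selected))  # start from the intercept-only model
    while True:
        remaining = [name for name in predictors if name not in selected]
        if not remaining:
            break
        scores = {name: aicc(y, design(selected + [name])) for name in remaining}
        candidate = min(scores, key=scores.get)
        if scores[candidate] >= best:
            break  # no remaining predictor improves AICc
        selected.append(candidate)
        best = scores[candidate]
    return selected, best

# Synthetic data (assumption): 25 bears, only ice season length drives movement.
rng = np.random.default_rng(0)
n_bears = 25
predictors = {
    "ice_season_length": rng.normal(size=n_bears),
    "mean_temperature": rng.normal(size=n_bears),
    "year": rng.normal(size=n_bears),
    "bear_age": rng.normal(size=n_bears),
}
movement = 2.0 * predictors["ice_season_length"] + rng.normal(size=n_bears)

selected, score = forward_stepwise(movement, predictors)
print("selected predictors:", selected)
print("AICc of selected model:", round(score, 2))
```

With real data, stepwise searches over many variables and few observations can easily pick up noise variables, so the selected set should be read with caution.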

2) Construct a few different models with different “themes” like so:

- “Temporal” model: movement = x1*(length of ice season) + x2*(year) + x3*(start of ice season)

- “Biological” model: movement = x1*(age of bear) + x2*(number of cubs) + x3*(Sex of bear)

- “Environmental” model: movement = x1*(temperature) + x2*(ice concentration)

Then, conduct AIC selection for each model individually. After that, take the variables that came out as significant, and then combine them into one final model. So for example, if length of the ice season is significant for the temporal model, age is significant in the biological model and temperature is significant for the environmental model, my final model would be:

Movement = x1*(length of ice season) + x2*(bear age) + x3*(temperature)
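One way to look at option 2 is as a small a priori candidate set that is ranked by AICc in a single table. The sketch below assumes synthetic data, hypothetical column names, a Gaussian least-squares model, and a hand-written `aicc` helper; it also adds an intercept-only "null" model and Akaike weights, neither of which appears in the question.

```python
import numpy as np

def aicc(y, X):
    """AICc of a least-squares fit, up to an additive constant.

    X must already contain the intercept column.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    k = p + 1  # regression coefficients plus the residual variance
    aic = n * np.log(rss / n) + 2 * k
    return aic + 2 * k * (k + 1) / (n - k - 1)

# Synthetic data (assumption): 25 bears, only ice season length has an effect.
rng = np.random.default_rng(1)
n_bears = 25
names = ["ice_season_length", "year", "ice_season_start",
         "bear_age", "n_cubs", "temperature", "ice_concentration"]
data = {name: rng.normal(size=n_bears) for name in names}
data["sex"] = rng.integers(0, 2, size=n_bears).astype(float)  # 0/1 dummy
movement = 1.5 * data["ice_season_length"] + rng.normal(size=n_bears)

# The themed models from the question, plus an intercept-only baseline.
candidates = {
    "temporal": ["ice_season_length", "year", "ice_season_start"],
    "biological": ["bear_age", "n_cubs", "sex"],
    "environmental": ["temperature", "ice_concentration"],
    "null": [],
}

scores = {}
for label, columns in candidates.items():
    X = np.column_stack([np.ones(n_bears)] + [data[c] for c in columns])
    scores[label] = aicc(movement, X)

# Delta AICc relative to the best model, and Akaike weights.
best_score = min(scores.values())
delta = {label: s - best_score for label, s in scores.items()}
raw = {label: np.exp(-0.5 * d) for label, d in delta.items()}
total = sum(raw.values())
weights = {label: r / total for label, r in raw.items()}

for label in sorted(delta, key=delta.get):
    print(f"{label:14s} dAICc = {delta[label]:6.2f}  weight = {weights[label]:.3f}")
```

All candidate models are fitted to the same response values, which is what makes their AICc scores comparable.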

My current thought is that option 1 makes more sense and has less steps. However, I am testing for a LOT of variables (over 10) and have a small sample size (about 25 bears), and I have seen other students in my lab do this.

Which approach makes the most sense statistically?

Koen Van de Moortel

Length of ice season and temperature are correlated, so you can't just treat them as independent. I guess you had better use the length of the ice season: the difference between -5°C and -10°C may not mean much, but having ice or no ice might be more important. And why "+", that is, why assume the effects simply add up? I would first study the variables separately to see how they behave, and then try to build a total model.
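A quick way to check this kind of collinearity before fitting anything is to look at pairwise correlations and variance inflation factors (VIF). The sketch below is a minimal illustration: the data are synthetic and deliberately constructed so that temperature tracks ice season length, and the `vif` helper is hand-written for the example.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j of X (predictors only, no intercept)."""
    n = X.shape[0]
    target = X[:, j]
    others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
    beta, *_ = np.linalg.lstsq(others, target, rcond=None)
    rss = np.sum((target - others @ beta) ** 2)
    tss = np.sum((target - target.mean()) ** 2)
    r_squared = 1.0 - rss / tss
    return 1.0 / (1.0 - r_squared)

# Synthetic data (assumption): longer ice seasons go with colder temperatures.
rng = np.random.default_rng(2)
n_bears = 25
ice_season_length = rng.normal(150.0, 15.0, size=n_bears)  # days
temperature = -10.0 - 0.3 * (ice_season_length - 150.0) + rng.normal(0.0, 1.5, size=n_bears)
year = np.arange(n_bears, dtype=float)  # years since the start of the study

r = np.corrcoef(ice_season_length, temperature)[0, 1]
X = np.column_stack([ice_season_length, temperature, year])

print("correlation(ice season length, temperature):", round(r, 2))
for j, name in enumerate(["ice_season_length", "temperature", "year"]):
    print(f"VIF {name}: {vif(X, j):.2f}")
```

A common rule of thumb treats VIF values well above 5 to 10 as a sign that two predictors carry largely the same information, in which case keeping only one of them is a reasonable choice.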

Interesting problem! If you could send me the data, I could take a look and maybe get inspired...

Firdos Khan

Dear Jodouin,

You can use option 1 for model selection, as you said. However, I suggest looking into variable selection in regression models using Bayesian Model Averaging (BMA). For this purpose, see the following paper:

Article Evaluation of CMIP5 Models and Ensemble Climate Projections ...

I hope this will help you.

Good Luck!

David Eugene Booth

I'm going to suggest the approach in the first attached paper. Your regression method depends on the dependent variable (DV), and it is probably not logistic regression; everything else should work for you. The second attachment is a program that is easier to use: cut and paste it into a text file and you should be able to run it in R. To determine the regression type, see attachments 3 and 4. The Rosner biostatistics reference is available from the z-library. If you want a reference for R, see attachment 5, also available from the z-library. Best wishes, David Booth

Enrique González-Núñez

You could try different machine learning approaches (such as neural networks, decision trees, support vector machines, and random forests, among others) and compare the results. Of course, you'll need to split your data into training and test sets for the comparison. That way, instead of deciding on "the best model" a priori, you'll choose it based on the data.
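With only about 25 bears, a single train/test split leaves very few observations on either side, so the sketch below swaps in leave-one-out cross-validation as a stand-in for the split. It is an illustration only: the data are synthetic, plain least-squares regression stands in for the machine learning models, and `loo_mse` is a hypothetical helper written for this example.

```python
import numpy as np

def loo_mse(y, X):
    """Leave-one-out mean squared prediction error of a least-squares fit."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # hold out observation i
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        errors[i] = y[i] - X[i] @ beta
    return float(np.mean(errors ** 2))

# Synthetic data (assumption): 25 bears, 10 predictors, only the first matters.
rng = np.random.default_rng(3)
n_bears, n_predictors = 25, 10
predictors = rng.normal(size=(n_bears, n_predictors))
movement = 2.0 * predictors[:, 0] + rng.normal(size=n_bears)

intercept = np.ones((n_bears, 1))
X_simple = np.hstack([intercept, predictors[:, :1]])  # one predictor
X_full = np.hstack([intercept, predictors])           # all ten predictors

mse_simple = loo_mse(movement, X_simple)
mse_full = loo_mse(movement, X_full)
print("LOO MSE, one predictor: ", round(mse_simple, 3))
print("LOO MSE, ten predictors:", round(mse_full, 3))
```

The same loop works for any model that can be refitted on the training rows, so candidate models of different types can be compared on the same held-out predictions.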

Dorchelle Atonzong Guedia

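AIC is often used to compare nested models, but nesting is not strictly required: it is the likelihood-ratio test that needs nested models. What AIC does require is that all candidate models are fitted to the same response data. I agree with @Enrique that you can also use machine learning approaches to select an appropriate model.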
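For reference, the standard definitions of AIC and its small-sample correction AICc are given below, where $k$ is the number of estimated parameters, $\hat{L}$ is the maximized likelihood, and $n$ is the sample size. The worked number assumes a model with 10 slopes, an intercept, and a residual variance ($k = 12$) fitted to $n = 25$ bears; those counts are taken from the question as an illustration, not from a fitted model.

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{AICc} = \mathrm{AIC} + \frac{2k(k+1)}{n-k-1}

% Example with k = 12, n = 25:
\frac{2k(k+1)}{n-k-1} = \frac{2 \cdot 12 \cdot 13}{25 - 12 - 1} = \frac{312}{12} = 26
```

A correction of 26 units is large compared with the differences of a few units usually used to separate models, which suggests that AICc rather than plain AIC is the safer choice at this sample size.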
