How to perform mixed model logistic regression with random forest?

Adam Kania

3. Random Forest algorithm will handle correlated variables without issues.

You typically obtain 'variable importance' estimates showing the relative importance of features (e.g. which variable is the best predictor). But due care is needed when interpreting those 'variable importance' values: they are not designed to automatically answer "which is the best predictor", though they are often ranked in the correct order, from the most important to the least important.

Interpreting the variable importance, especially when multiple heavily correlated variables exist, is not trivial, but there are already numerous studies on the subject.

Thanh-Tung Nguyen

You can use RF in the caret package to fit a regression model and compute variable importance.

The R code below may help you produce the results:

library(caret)
TrainY <- trainingData[[1]]   # the 1st column is the continuous response variable
TrainX <- trainingData[, -1]  # the remaining columns are the predictors
TestY <- testingData[[1]]
TestX <- testingData[, -1]

indx

Hanan Sela

Thank you, Thanh-Tung Nguyen, for the detailed script. I wonder whether random forest in caret or another R package can understand a (1|id) "random variable" term on the right-hand side of the formula?

You mean: y~x1+x2+..+xn?

If yes, that kind of formula is easy to use with the randomForest R package.

Baptiste Gregorutti

Hi Hanan,

As Adam said before, the interpretation of the variable importance measures (especially Breiman's 'permutation importance measure') is not so simple. Some studies show that the importance decreases when the number of correlated variables and the level of correlation increase. So the first variable retained by the ranking computed by the random forests is not necessarily the most relevant if it belongs to a group of correlated variables. Actually I wrote a paper about it with some theoretical results (see attached files).

Article Classification with correlated features: Unreliability of fe...

Article Correlation and variable importance in random forests
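
To make the point about correlated predictors concrete, here is a minimal simulation sketch in R. It is not taken from the attached papers; the data, the helper name importance_of_x1, and all settings are made up for illustration, and it assumes the randomForest package is installed. It compares the raw permutation importance of one truly relevant predictor when it stands alone and when several noisy copies of it are added.

```r
library(randomForest)

set.seed(1)
n  <- 300
x1 <- rnorm(n)
y  <- 3 * x1 + rnorm(n)          # only x1 drives the response

# Hypothetical helper: raw (unscaled) permutation importance of x1
# when k noisy copies of x1 are added to the predictor set.
importance_of_x1 <- function(k) {
  X <- data.frame(x1 = x1, noise = rnorm(n))
  for (j in seq_len(k)) {
    X[[paste0("copy", j)]] <- x1 + rnorm(n, sd = 0.3)  # strongly correlated with x1
  }
  fit <- randomForest(x = X, y = y, ntree = 300, importance = TRUE)
  # type = 1 is the permutation (mean decrease in accuracy) measure
  importance(fit, type = 1, scale = FALSE)["x1", 1]
}

imp_alone  <- importance_of_x1(0)
imp_shared <- importance_of_x1(5)
print(c(alone = imp_alone, with_5_copies = imp_shared))
```

In a setup like this the importance credited to x1 should drop once the copies are present, because the forest can split on any member of the correlated group. The exact numbers depend on the seed and the settings.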

Yolande Tra

I think you are better off with the mixed modeling approach. As far as I know, the predictors used in Random Forest are fixed, not random.
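
For reference, a mixed-model logistic regression with a (1|id) random intercept could be sketched as follows. This is an illustration only, assuming the lme4 package is installed; the simulated data and the variable names (id, x1, x2, y) are invented and do not come from this thread.

```r
library(lme4)

set.seed(42)
n_id  <- 40    # number of subjects (clusters)
n_obs <- 10    # observations per subject
d <- data.frame(
  id = factor(rep(seq_len(n_id), each = n_obs)),
  x1 = rnorm(n_id * n_obs),
  x2 = rnorm(n_id * n_obs)
)

# Subject-level random intercepts plus a fixed effect of x1 (x2 has no effect)
u   <- rnorm(n_id, sd = 1)
eta <- -0.5 + 1.0 * d$x1 + u[as.integer(d$id)]
d$y <- rbinom(nrow(d), size = 1, prob = plogis(eta))

# Logistic mixed model: fixed effects for x1 and x2, random intercept per id
fit <- glmer(y ~ x1 + x2 + (1 | id), data = d, family = binomial)
print(fixef(fit))    # fixed-effect estimates on the log-odds scale
print(VarCorr(fit))  # estimated variance of the random intercept
```

The (1 | id) term tells glmer that observations sharing an id share an intercept drawn from a common normal distribution.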

You might also want to read about all-relevant feature selection algorithms that are designed with exactly the purpose of determining all features relevant to a classification. Due to the complexities involved they use quite advanced, heuristic approaches to do the work.

I have used the Boruta R package for that; you can read the associated publications by Kursa & Rudnicki, maybe also some general info in addition to papers attached by Baptiste above (e.g. by Tuv et al).

Article The All Relevant Feature Selection using Random Forest

Article Feature Selection with Ensembles, Artificial Variables, and ...
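
A minimal sketch of an all-relevant selection run with Boruta, assuming the Boruta package (and the ranger package it uses by default for importance) is installed. The data are simulated purely for illustration: x1 and x2 are relevant, the noise columns are not.

```r
library(Boruta)

set.seed(7)
n <- 300
X <- data.frame(
  x1 = rnorm(n), x2 = rnorm(n),                              # relevant predictors
  noise1 = rnorm(n), noise2 = rnorm(n), noise3 = rnorm(n)    # irrelevant predictors
)
y <- factor(rbinom(n, size = 1, prob = plogis(2 * X$x1 - 2 * X$x2)))

# Boruta compares each feature's importance with that of shuffled "shadow" copies
bor <- Boruta(x = X, y = y, maxRuns = 100)
print(attStats(bor))   # per-feature importance statistics and decision

selected <- getSelectedAttributes(bor, withTentative = FALSE)
print(selected)
```

Features that Boruta leaves as "Tentative" are excluded here by withTentative = FALSE; whether to keep them is a judgment call.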

Thank you all for your suggestions. I have tried the Boruta package, the caret package, and the party package, and they all selected more or less the same important features. However, varImp{party} throws the most important feature selected by Boruta and caret to the end of the list. So, as Adam said, "it is not so simple".

Hanan

Steven L. Van Wilgenburg

I am just looking into similar issues myself and came across the following, which may (or may not) prove useful...

Sela, R. J., & Simonoff, J. S. (2012). RE-EM trees: a data mining approach for longitudinal and clustered data. Machine learning, 86(2), 169-207. (coincidence re: name)

Eo, S. H., & Cho, H. (2014). Tree-structured mixed-effects regression modeling for longitudinal data. Journal of Computational and Graphical Statistics, 23(3), 740-760.

Bürgin, R., & Ritschard, G. (2015). Tree-based varying coefficient regression for longitudinal ordinal responses. Computational Statistics & Data Analysis, 86, 65-80.

Another tree based approach...

Fokkema, M., Smits, N., Zeileis, A., Hothorn, T., & Kelderman, H. (2015). Detecting treatment-subgroup interactions in clustered data with generalized linear mixed-effects model trees (No. 2015-10).

install.packages("glmertree", repos="http://R-Forge.R-project.org")

-Steve
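
A rough sketch of what a glmertree call for a binary outcome can look like, assuming the glmertree package is installed. The data are simulated and every variable name (id, treatment, z1, z2, y) is hypothetical. The three-part formula reads: response ~ node-level predictors | cluster (random intercept) | candidate partitioning variables.

```r
library(glmertree)

set.seed(3)
n <- 500
d <- data.frame(
  id        = factor(sample(1:25, n, replace = TRUE)),        # cluster indicator
  treatment = factor(sample(c("A", "B"), n, replace = TRUE)),
  z1        = rnorm(n),                                       # true effect modifier
  z2        = rnorm(n)                                        # irrelevant covariate
)

# Treatment B helps when z1 > 0 and harms otherwise; clusters shift the intercept
u   <- rnorm(25, sd = 0.7)
eff <- ifelse(d$z1 > 0, 1.5, -1.5)
eta <- u[as.integer(d$id)] + eff * (d$treatment == "B")
d$y <- factor(rbinom(n, size = 1, prob = plogis(eta)))

# Tree of logistic models for treatment, with a random intercept per id
fit <- glmertree(y ~ treatment | id | z1 + z2, data = d, family = binomial)
print(coef(fit))   # node-specific coefficients (one row per terminal node)
```

If the tree finds the subgroups, plot(fit) shows the splits and the fitted model in each terminal node.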