Are there R packages for Mixed Effects Random Forests (MERFs) that can also handle missing values and stratify the data?

01 October 2020

Clustered/random data are very common in data analysis. For example, if I want to model the occurrence (presence/absence) of a species over multiple countries, I could treat the countries as clustered/random effects. In theory I could use a binomial GLMM. However, given the structure of the dataset, the performance of this model and the information it returns are not satisfactory and mostly do not fit my questions. The non-linear responses, high variability of the data, (randomly) missing values of the predictors, categorical predictors, and unbalanced dataset make it more challenging. Because of this I often use Random Forest models. Although it is sometimes suggested that RF models are black-box models, this is hardly the case. The return of variable importance, the display of partial dependence plots, and the extraction of split points at the root node, the depth, and the number of splits of the predictor variables make it a complex white-box model. These results also fit most of the questions I ask. One could suggest using a GAMM, but there are so many settings to tweak in these models that I do not feel confident or comfortable using them.

To handle the missing values, categorical data, and unbalanced datasets I used the randomForest package for R (Liaw and Wiener, 2002). The randomForest package can impute the median for missing values and stratify (downsample) the data in unbalanced datasets, which makes it well suited for the data I work with. The stratification of the data is key, as is the imputation of the median. However, a drawback is that the randomForest package cannot take clustered/random effects into account. This then ends up as a discussion point in basically every analysis.
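For readers unfamiliar with this workflow, a minimal sketch of it follows. It uses `na.roughfix()` (median for numeric columns, most frequent level for factors) and the `strata`/`sampsize` arguments of `randomForest()`. The simulated data frame and the column names `x1`, `x2`, `country`, and `y` are assumptions made up for illustration, not my actual data; note that `country` is simply left out of the model, because the package has no way to treat it as a random effect.

```r
library(randomForest)

set.seed(1)

# Hypothetical toy data: one numeric and one categorical predictor,
# a clustering variable (country), and an unbalanced binary response.
n <- 300
d <- data.frame(
  x1      = rnorm(n),
  x2      = factor(sample(c("a", "b", "c"), n, replace = TRUE)),
  country = factor(sample(paste0("C", 1:5), n, replace = TRUE))
)
d$y <- factor(ifelse(d$x1 + rnorm(n) > 1.3, "present", "absent"))

# Introduce missing values in the numeric predictor.
d$x1[sample(n, 30)] <- NA

# Rough imputation: median for numeric columns, mode for factors.
d_imp <- na.roughfix(d)

# Stratified sampling per tree: draw the same number of cases from each
# class, equal to the size of the smallest class (downsampling).
n_min <- min(table(d_imp$y))
fit <- randomForest(
  y ~ x1 + x2,
  data     = d_imp,
  strata   = d_imp$y,
  sampsize = c(n_min, n_min),
  ntree    = 200
)

print(fit)
importance(fit)
```

Because the downsampling happens per tree rather than once up front, every observation can still end up in some trees, which is why this approach does not discard data the way balancing the dataset beforehand would.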

There are some scientific publications on MERFs (e.g. Hajjem et al., 2014) and R packages are available (e.g. MixRF). However, from the descriptions in the manuals of these packages it does not seem that they can impute the median and stratify the data. I do not want to lose a lot of my data by balancing my datasets before analysis, and I do not want to lose information by removing incomplete samples.

Is there any news on an R package that implements RF models that can handle all these things? Or is there a suggestion for other types of models in R that can return similar information to the RF models and are (so to speak) as user friendly as the randomForest package?

Thank you in advance,

Liaw, A., Wiener, M., 2002. Classification and Regression by randomForest. R News 2, 18–22.

Hajjem, A., Bellavance, F., Larocque, D., 2014. Mixed-effects random forest for clustered data. Journal of Statistical Computation and Simulation 84, 1313–1328. DOI: 10.1080/00949655.2012.741599

Sandhya Avasthi

You can refer to these links; they might be helpful:

https://rdrr.io/cran/MixRF/man/MixRF.html

https://cran.r-project.org/web/packages/vcrpart/vcrpart.pdf

Wim Kaijser

Sandhya Avasthi, as far as I know (after reading the manual of MixRF) there is no option to impute the median for missing values or to stratify the dataset. I also read online that the MixRF package (and the functions therein) cannot work with categorical values. All three of these are critical, since the data I work with contain missing values, are unbalanced, and often have categorical values. Therefore, the MixRF package is (for now) not an option, since I value these three points more than the incorporation of clustered/random effects. The vcrpart package can handle categorical predictors (to my knowledge), but cannot impute the median or stratify the dataset.
