Multicollinearity can affect the accuracy of prediction models; regression models, in particular, are usually affected by multicollinearity among the variables considered. I want to know whether random forests are also affected by multicollinearity between features.
In such cases, the partial least squares (PLS) method can be adopted. When many variables are interrelated, principal component analysis will filter the predictors. Otherwise, the variables can be grouped by clustering first and the prediction methods applied afterwards.
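As a rough illustration of the PLS idea, here is a minimal sketch using scikit-learn's PLSRegression on synthetic, nearly collinear predictors; the data and the single-component choice are assumptions made for the example, not part of the answer above:

```python
# Minimal sketch: compressing collinear predictors with partial least squares.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)      # nearly collinear with x1
X = np.column_stack([x1, x2])
y = 2.0 * x1 + rng.normal(scale=0.1, size=n)

# PLS projects the correlated predictors onto a few latent components
# chosen to covary with the response, sidestepping the collinearity.
pls = PLSRegression(n_components=1)
pls.fit(X, y)
print("R^2 on training data:", pls.score(X, y))
```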
The short answer is no. It does not affect prediction accuracy.
Multicollinearity does not affect the accuracy of predictive models, including regression models. Take the attached image as an example. The features on the x and y axes are clearly correlated; however, you need both of them to create an accurate classifier. If you discard one of them for being highly correlated with the other, the performance of your model will decrease.
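To make this concrete, here is a small self-contained sketch (my own synthetic data, not the attached image) in which two strongly correlated features are jointly informative but individually weak, so dropping either one hurts accuracy:

```python
# Two correlated features that are jointly, but not individually, informative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n = 1000
common = 2.0 * rng.normal(size=n)            # shared signal -> high correlation
labels = rng.integers(0, 2, size=n)
offset = np.where(labels == 1, 0.5, -0.5)    # classes separated along x - y
x = common + offset + 0.1 * rng.normal(size=n)
y = common - offset + 0.1 * rng.normal(size=n)
X = np.column_stack([x, y])

print("corr(x, y)    :", np.corrcoef(x, y)[0, 1])
clf = LogisticRegression()
print("both features :", cross_val_score(clf, X, labels, cv=5).mean())
print("x only        :", cross_val_score(clf, X[:, [0]], labels, cv=5).mean())
```

With both features the classifier can learn the direction x - y and separate the classes almost perfectly; with either feature alone, the class signal is buried in the shared variance and accuracy drops toward chance.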
If you want to remove the collinearity, you can always use PCA to project the data into a new space where the 'new features' will be orthogonal to each other. You can then train your model with the new features, but you will find that the performance is the same; you have simply rotated your original decision boundary.
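A quick sketch of the rotation argument, again on assumed synthetic data: projecting onto all principal components decorrelates the features, yet the cross-validated accuracy is essentially unchanged, because an orthogonal rotation does not add or remove information:

```python
# PCA onto all components is an orthogonal rotation: accuracy is preserved.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
common = 2.0 * rng.normal(size=n)
labels = rng.integers(0, 2, size=n)
offset = np.where(labels == 1, 0.5, -0.5)
X = np.column_stack([common + offset, common - offset])
X = X + 0.1 * rng.normal(size=(n, 2))

X_rot = PCA(n_components=2).fit_transform(X)   # decorrelated 'new features'
clf = LogisticRegression()
print("original:", cross_val_score(clf, X, labels, cv=5).mean())
print("rotated :", cross_val_score(clf, X_rot, labels, cv=5).mean())
# The two scores match up to CV noise: the decision boundary was rotated.
```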
Now, where multicollinearity becomes 'an issue' is when you want to 'interpret' the parameters learned by your model. In other words, you cannot say that the feature with the 'biggest weight' is 'the most important' when the features are correlated. Note that this is independent of the accuracy of the model; it concerns only the interpretation part, which, in my opinion, you should not be doing anyway. To see why, you can read: https://robertoivega.com/association-prediction-studies/#more-188
Toloşi, Laura, and Thomas Lengauer. "Classification with correlated features: unreliability of feature ranking and solutions." Bioinformatics 27.14 (2011): 1986-1994.
Strobl, Carolin, et al. "Conditional variable importance for random forests." BMC Bioinformatics 9.1 (2008): 307.
It certainly has an effect on the interpretability of the variable importance measures.
Multicollinearity is the rule rather than the exception when we deal with ecosystems. For example, the often-used WorldClim and meteorological data generally show multicollinearity. In mountains, meteorological data correlate with elevation, which in turn correlates with vegetation cover categories, and so on.
There are several issues to consider; some have been addressed above. Not all correlated geodata have the same spatial resolution and accuracy: WorldClim and meteorological data are not primary data but crude interpolations at a coarse spatial resolution (5 km), compared with a DEM (e.g. 90 m) or NDVI as a cover proxy. Furthermore, multicollinearity may have an impact on model transferability.
You are welcome to have a look at our pertinent articles. Keywords: Majella bear, Majella krummholz, transferability Australia/Spain.
Although the predictive power and reliability of machine learning algorithms are generally not affected by multicollinearity among the variables, the importance of highly collinear variables is divided among them, thereby affecting the overall interpretability of the predictor variables.
Therefore, if you only care about the prediction or classification performance of the random forest classifier, the multicollinearity between variables can be ignored; if the relative importance of these variables needs to be calculated and interpreted, the multicollinearity between the variables should be eliminated as much as possible.
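Here is an illustrative sketch of the importance-splitting effect on made-up data: duplicating an informative feature barely changes the accuracy of scikit-learn's RandomForestClassifier, but the impurity-based importance of that feature is shared between the two copies:

```python
# Duplicating a feature leaves accuracy alone but splits its importance.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 1000
signal = rng.normal(size=n)                       # informative feature
noise = rng.normal(size=n)                        # irrelevant feature
labels = (signal + 0.3 * rng.normal(size=n) > 0).astype(int)

X_plain = np.column_stack([signal, noise])
X_dup = np.column_stack([signal, signal + 0.01 * rng.normal(size=n), noise])

for name, X in [("no duplicate", X_plain), ("near-duplicate", X_dup)]:
    rf = RandomForestClassifier(n_estimators=300, random_state=0)
    acc = cross_val_score(rf, X, labels, cv=5).mean()
    rf.fit(X, labels)
    print(name, "| accuracy:", round(acc, 3),
          "| importances:", np.round(rf.feature_importances_, 2))
```

With the near-duplicate added, the signal's importance drops to roughly half its former value on each copy while accuracy stays the same, which is exactly the interpretability problem described above.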
In my opinion, random forests are not affected much by multicollinearity because they use bootstrap sampling (row sampling) and feature sampling (column sampling): each tree is built from a different subset of features and, of course, sees a different set of data points.
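For reference, the two sampling mechanisms mentioned above map onto explicit parameters in scikit-learn's RandomForestClassifier; the specific values below are illustrative choices, not recommendations:

```python
# The row- and column-sampling knobs behind the argument above.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=500,     # many trees, each built on its own rows/features
    bootstrap=True,       # row sampling: each tree gets a bootstrap resample
    max_samples=0.8,      # ...drawn from 80% of the rows (needs bootstrap=True)
    max_features="sqrt",  # column sampling: sqrt(n_features) tried per split
    random_state=0,
)
# With correlated features, different trees end up splitting on different
# members of a correlated group, so the ensemble's predictions stay stable.
```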