Why is the R-squared of my random forest model low on the training data but high on the validation data?

Gamal Seedahmed

Dear Tam,

Try using cross-validation while building your model, and include all of your data in the cross-validation. That way you can judge the quality of the data from the R-squared values obtained on different subsets of it.

Regards,

Gamal
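A minimal sketch of the cross-validation idea, assuming Python with scikit-learn (the thread does not say which tooling the poster uses) and synthetic data from make_regression standing in for the real dataset. The forest is scored with R-squared on each of five held-out folds, so the spread of scores shows how much the result depends on which subset ends up in validation.

```python
# Sketch: k-fold cross-validation of a random forest regressor, reporting
# R-squared per fold. Synthetic data is an assumption standing in for the
# poster's (all-continuous) dataset.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data: 300 rows, 5 continuous predictors, some noise.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

model = RandomForestRegressor(n_estimators=200, random_state=0)

# Shuffle before splitting so every fold is a random subset of all the data.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# scoring="r2" gives the coefficient of determination on each held-out fold.
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

for i, s in enumerate(scores, start=1):
    print(f"fold {i}: R^2 = {s:.3f}")
print(f"mean R^2 = {scores.mean():.3f} (sd {scores.std():.3f})")
```

If the per-fold R-squared values vary widely, a single training/validation split may simply have been an unusually favourable or unfavourable draw.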

TzeHuey TAM

Gamal Seedahmed, thank you for your suggestion.

Mantas Lukauskas

TzeHuey TAM, are you using the random forest for a regression task here? Because your validation results are better than your training results, I would suggest trying deeper trees and seeing what happens. Also, what kinds of variables do you have: continuous, discrete, categorical, or mixed? You could also try XGBoost, LightGBM, or CatBoost.
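A sketch of the "deeper trees" suggestion, again assuming scikit-learn and synthetic data; the depth grid (2, 4, 8, None) is an arbitrary choice for illustration. The number of trees (ntree in R's randomForest, n_estimators in scikit-learn) is held fixed while max_depth varies, and training and validation R-squared are printed side by side. Adding more trees mainly stabilises the averaged prediction; it does not make the individual trees any deeper.

```python
# Sketch: effect of tree depth (not tree count) on training vs validation R^2.
# Data and hyperparameter grid are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for max_depth in (2, 4, 8, None):  # None lets each tree grow until leaves are pure
    model = RandomForestRegressor(
        n_estimators=200,       # number of trees (R's ntree); unrelated to depth
        max_depth=max_depth,    # caps how many levels each tree may have
        min_samples_split=2,    # smaller values permit more splits, i.e. deeper trees
        random_state=0,
    )
    model.fit(X_train, y_train)
    # For regressors, .score() returns R^2.
    train_r2 = model.score(X_train, y_train)
    val_r2 = model.score(X_val, y_val)
    results[max_depth] = (train_r2, val_r2)
    print(f"max_depth={max_depth}: train R^2={train_r2:.3f}, "
          f"validation R^2={val_r2:.3f}")
```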

Mantas Lukauskas, yes, I applied random forest for regression, and all my variables are continuous. By deeper trees, do you mean setting ntree higher, e.g. 1000? I will try XGBoost, LightGBM, and CatBoost.

TzeHuey TAM, for example, the scikit-learn random forest has a parameter "max_depth", and another one, "min_samples_split", that can push the trees to be deeper or