The same model gives an R2 of 0.94 on one test set (9 observations) but only 0.73 on another test set (95 observations); however, the 0.73 R2 is associated with lower RMSE and MAE. How can this situation be explained?
With so few observations (n = 9), there will be a lot of sampling error plus the potential for outliers/extreme observations to have a large influence on R squared. With a larger sample (e.g., n = 95), there is less sampling error and less potential for extreme cases to influence the results. I would trust the estimate of .94 (for n = 9) less than the estimate of .73 (which is based on a more substantial sample size).
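To see how unstable R2 is at n = 9, here is a minimal simulation sketch. The linear relationship, noise level, and "perfect-signal" predictions below are all invented purely for illustration; the point is only that repeatedly drawing test sets of 9 versus 95 observations from the same process produces a much wider spread of R2 values at n = 9.

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical setup: the model is assumed to predict the noiseless signal
# exactly, so all variation in R2 comes from which observations happen to
# land in each test set.
def simulate_r2(n_test, n_reps=2000):
    scores = []
    for _ in range(n_reps):
        x = rng.uniform(0, 10, n_test)
        y_true = 2.0 * x + rng.normal(0, 3.0, n_test)  # observed values
        y_pred = 2.0 * x                               # model predictions
        scores.append(r2_score(y_true, y_pred))
    return np.array(scores)

for n in (9, 95):
    s = simulate_r2(n)
    print(f"n={n:3d}: R2 ranges roughly from {s.min():.2f} to {s.max():.2f}, "
          f"sd = {s.std():.2f}")
```

With this setup, R2 on 9-observation test sets swings across a wide range from draw to draw, while R2 on 95-observation test sets stays in a comparatively narrow band, which is exactly why the 0.94 figure deserves less trust.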
The model was trained on 520 observations (R2 = 0.85) and cross-validated on 173 observations (R2 = 0.86). However, when it is applied to the two test datasets, the one with nine observations shows an R2 of 0.94, while the one with 95 observations shows an R2 of 0.73 together with lower RMSE and MAE than the first test dataset, and also lower than the training and cross-validation datasets. How can this behaviour on the test datasets be explained?
The situation you described can occur when evaluating a predictive model on different test sets. It is possible for the model to have a higher R2 value on a smaller test set but a lower R2 value on a larger test set, even if the lower R2 value is associated with lower Root Mean Square Error (RMSE) and Mean Absolute Error (MAE).
R2 (coefficient of determination) represents the proportion of the variance in the dependent variable that is explained by the model. A higher R2 value indicates that the model captures a larger portion of the variability in the dependent variable.
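Concretely, R2 = 1 - SS_res / SS_tot, where SS_tot is computed from the mean of the observed values in the particular test set being evaluated. A minimal sketch of that calculation in plain NumPy (matching the single-output definition used by scikit-learn's r2_score):

```python
import numpy as np

def r2(y_true, y_pred):
    """R2 = 1 - SS_res / SS_tot, where SS_tot is measured against the
    mean of the observed values in this particular test set."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
    return 1.0 - ss_res / ss_tot
```

The denominator is what makes R2 test-set dependent: the same prediction errors yield a different R2 depending on how much the dependent variable varies in that test set.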
In your case, the model performed well on the smaller test set (9 observations), resulting in a high R2 value of 0.94. This means that the model explains a large share of the variance in the dependent variable within that particular test set.
However, when the model was evaluated on the larger test set (95 observations), the R2 value dropped to 0.73. This indicates that the model's explanatory power decreased on the larger test set: it explains a noticeably smaller proportion of the variance in the dependent variable there.
Although the R2 value decreased, the lower R2 can still be associated with lower RMSE and MAE values. RMSE and MAE measure the average size of the model's prediction errors in the original units of the dependent variable, whereas R2 compares those errors to the total variance of the dependent variable within that particular test set. A lower RMSE and MAE mean that, on average, the model's predictions are closer to the actual values in the larger test set; but if the dependent variable varies less in that test set (its total sum of squares is smaller), the same or even smaller absolute errors make up a larger fraction of the variance, so R2 comes out lower.
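A toy numeric example makes this concrete. All numbers below are invented solely to mimic the pattern in your question, not taken from your data: the first "test set" is small with a wide-ranging target and fairly large errors, the second is larger with a narrow target range and smaller errors, yet the first gets the higher R2.

```python
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

# Small, high-variance "test set": large errors, but the target varies a lot.
y_true_a = np.array([2.0, 5.0, 9.0, 14.0, 20.0, 27.0, 35.0, 44.0, 54.0])
y_pred_a = y_true_a + np.array([3.0, -2.5, 3.0, -3.0, 2.5, -3.0, 3.0, -2.5, 3.0])

# Larger, low-variance "test set": small errors, but the target barely varies.
rng = np.random.default_rng(1)
y_true_b = rng.normal(10.0, 2.0, 95)
y_pred_b = y_true_b + rng.normal(0.0, 1.0, 95)

for name, yt, yp in [("small/wide target", y_true_a, y_pred_a),
                     ("large/narrow target", y_true_b, y_pred_b)]:
    rmse = mean_squared_error(yt, yp) ** 0.5
    print(f"{name:20s} R2={r2_score(yt, yp):.2f} "
          f"RMSE={rmse:.2f} MAE={mean_absolute_error(yt, yp):.2f}")
```

Here the small test set comes out with R2 around 0.97 despite RMSE and MAE near 2.8, while the larger test set lands around R2 of 0.75 with RMSE near 1.0 and MAE under 1.0, reproducing the "lower R2 but lower errors" pattern.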
This situation can also arise from inherent differences between the two test sets. The smaller test set might contain observations that are more predictable or more representative of the model's training data, resulting in higher explanatory power (higher R2). The larger test set, being more diverse, having a narrower spread in the dependent variable, or containing different patterns, could reduce the model's ability to explain the variance.
It's important to consider the limitations of using R2 alone as an evaluation metric. R2 does not capture the full picture of model performance, and other metrics such as RMSE and MAE provide additional insights into the accuracy of the model's predictions. Therefore, it's essential to examine multiple evaluation metrics and consider the specific characteristics of the test sets to interpret and explain the observed differences in performance.