Hi all. I'm working on an optimization process using a central composite design (Design-Expert software). The results show that the model is significant, but at the same time the predicted R-squared is negative. How can I explain that?
I have had similar observations, though in autoregressive moving average contexts rather than response surface methodology. The model coefficients were statistically significant, but the R-squared was negative. Perhaps this was an indication of overall model inadequacy; a better model could have been sought.
I don't think the R² value can be negative if the model is adequate and all the terms are significant. If you can share the specific data, one can comment on this unusual observation.
Adjusted R² is typically lower than R², because the adjustment accounts for the number of parameters in the model and the degrees of freedom (dof). The calculation is different, but it should not result in negative values when enough dof remain in the model.
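For reference, the usual textbook adjustment (assuming n runs and p model terms excluding the intercept; the function name below is my own, not from any package) can be sketched as:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors
    (excluding the intercept); standard textbook formula."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# With plenty of dof the penalty is modest...
print(adjusted_r2(0.90, 20, 5))   # ~0.864
# ...but with few runs and many terms it can go below zero:
print(adjusted_r2(0.30, 12, 8))   # ~-1.567
```

This illustrates the point above: the adjustment only stays well behaved when enough degrees of freedom are left after fitting the model terms.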
Yes, adjusted R² will certainly be lower than R². The reasoning is that removing non-significant terms from the model may increase the error between the predicted and actual response values, if only marginally, and this lowers the adjusted R² value.
Predicted R² is calculated using PRESS: predR² = 1 - PRESS/(total sum of squares). So, if PRESS is greater than the total sum of squares, predR² will be negative. PRESS (predicted residual sum of squares) can be obtained by cross-validation (leave-one-out) or by a matrix calculation: PRESS = sum((residual_i/(1-h_ii))^2), where the h_ii are the leverages, i.e. the diagonal elements of the hat matrix X(X'X)^-1 X'.
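A minimal numerical sketch of that formula, assuming plain ordinary least squares in NumPy (the quadratic model and simulated data below are invented purely for illustration):

```python
import numpy as np

# Simulated data: the true response is linear, but we deliberately
# fit a quadratic model, mimicking an over-specified RSM model.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=20)
X = np.column_stack([np.ones_like(x), x, x**2])   # model matrix
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, size=20)

beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta

# Leverages h_ii: diagonal of the hat matrix X (X'X)^-1 X'
H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)

press = np.sum((resid / (1 - h)) ** 2)       # PRESS via leverages
ss_tot = np.sum((y - y.mean()) ** 2)
pred_r2 = 1 - press / ss_tot
print(f"PRESS = {press:.3f}, predicted R^2 = {pred_r2:.3f}")
```

The leverage shortcut gives exactly the same PRESS as refitting the model n times with one run left out each time, which is why it is the standard way to compute predicted R².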
If predR² is negative, it probably means that your model contains effects which are not statistically significant. These effects can be considered noise, because they are no larger than the noise, and noise is well known to be very bad for predictions.
In conclusion, remove the non-significant effects from your model and predR² will increase. Even if R² decreases, that is not a problem, because a model's purpose is to make predictions. That is why I prefer predR² to R².
Whatever factor you add, R² will increase, even if it makes no sense!
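That last point is easy to demonstrate numerically. The sketch below (plain NumPy ordinary least squares; the data and the helper name `fit_r2_predr2` are invented for illustration) fits the same response with and without pure-noise "factors": R² can only go up when columns are added, while predicted R² typically collapses.

```python
import numpy as np

def fit_r2_predr2(X, y):
    """Return (R^2, predicted R^2) for an OLS fit of y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)  # leverages
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    press = np.sum((resid / (1 - h)) ** 2)
    return 1 - ss_res / ss_tot, 1 - press / ss_tot

rng = np.random.default_rng(1)
n = 15
x = rng.uniform(-1, 1, n)
y = 2.0 + 3.0 * x + rng.normal(0, 1.0, n)

X_small = np.column_stack([np.ones(n), x])     # intercept + real factor
junk = rng.normal(size=(n, 8))                 # eight pure-noise "factors"
X_big = np.column_stack([X_small, junk])

r2_s, pr2_s = fit_r2_predr2(X_small, y)
r2_b, pr2_b = fit_r2_predr2(X_big, y)
print(f"small model:   R^2={r2_s:.3f}  predR^2={pr2_s:.3f}")
print(f"bloated model: R^2={r2_b:.3f}  predR^2={pr2_b:.3f}")
```

Because the bloated model's columns are a superset of the small model's, its residual sum of squares cannot be larger, so its R² cannot be smaller; the junk factors only inflate the leverages and PRESS.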
It all means that your experimental model (quadratic or polynomial) was not accurate in predicting your experimental data. This can be due to an improper experimental model or inappropriate data. In the case of inappropriate data, it could be that the data you obtained from the experimental design violate the assumptions of analysis of variance (ANOVA). Experimental data which violate the assumptions of ANOVA include data obtained from counting objects (such as microbial counts), ratios or proportions, etc. Such data need to be transformed before they can be fitted to the experimental model; failure to do so could result in the negative coefficient of determination you obtained. Another way to detect whether the data from your experimental design require transformation: if the ratio of the maximum to the minimum value of your response (dependent variable) is equal to or greater than 10, then the data need transformation.
For further explanation about data that require transformation, kindly consult a book on response surface methodology: Response Surface Methodology: Process and Product Optimization Using Designed Experiments by Myers and Montgomery.
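The max/min rule of thumb above is straightforward to check in code. A minimal sketch (the helper name `needs_transform` and the count data are invented for illustration; the threshold of 10 is the rule of thumb stated above):

```python
import numpy as np

def needs_transform(response, threshold=10.0):
    """Rule of thumb: if max/min of a (positive) response is >=
    threshold, a variance-stabilizing transform is worth trying."""
    response = np.asarray(response, dtype=float)
    return response.max() / response.min() >= threshold

counts = [3.0, 12.0, 45.0, 180.0]   # e.g. microbial counts
print(needs_transform(counts))      # ratio 60 -> True
log_counts = np.log(counts)         # fit the model to these instead
```

For count data the log (or square-root) transform is the usual choice; Design-Expert's Box-Cox diagnostic can suggest one formally.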