The issue in this question is statistical analysis of the size of R-squared. R-squared, also called the coefficient of determination, measures how well a proposed model fits the observed data in the context of regression analysis. The uses of R-squared are either (i) forecasting or (ii) hypothesis testing. R-squared is the measurement of “goodness of fit.”
R-SQUARED
In a sample space called omega, we have a set of observations called the Y events. The baseline estimate of Y is the mean of Y. We are looking for a better way to predict Y; thus, a prediction function is introduced. Call that predictor function Y^ or Y-hat. Y-hat is given by:
(1) Y^ = b + b1X
… where b = intercept, b1 = slope, and X is the explanatory (independent) variable. The objective is to test whether Y^ is a better estimate than Y-bar, where Y-bar is simply the mean of the set Y: (y1, y2, …, yn) and is given by:
(2) Y* = 1/n(sum Yi)
… where Y* = Y-bar or mean of Y; n = number of observations; and Yi is the set of observations Y: (y1, y2, …,yn).
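The two competing predictors above, the least-squares line of equation (1) and the plain mean of equation (2), can be sketched in a few lines of Python. This is a minimal illustration with made-up data values; the slope and intercept are computed from the standard least-squares formulas b1 = Sxy/Sxx and b = Y-bar − b1·X-bar.

```python
# Sketch: least-squares fit of Y-hat = b + b1*X versus the mean-of-Y
# predictor of eq. (2). The data values below are made up for illustration.
from statistics import mean

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

x_bar, y_bar = mean(x), mean(y)          # Y-bar is the baseline predictor, eq. (2)

# Least-squares estimates for eq. (1): b1 = Sxy / Sxx, b = y_bar - b1 * x_bar
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
     sum((xi - x_bar) ** 2 for xi in x)  # slope
b = y_bar - b1 * x_bar                   # intercept

y_hat = [b + b1 * xi for xi in x]        # predictions from eq. (1)
print(round(b1, 3), round(b, 3))
```

For this toy data the fitted line is approximately Y^ = 0.05 + 1.99X, and the question the rest of the answer addresses is how much better these y_hat values track y than the constant y_bar does.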
R-squared is the test that determines how good the predictor function is through “goodness of fit” analysis. R-squared is given by:
(3) R^2 = (SSyy – SSE) / SSyy
The equation is reduced to:
(4) R^2 = 1 – (SSE/SSyy)
The terms are defined as:
(5) SSE = Sum (y – y^)^2
… where y = individual observations of y, and y^ = the predicted y from the equation Y^ = b + b1X applied to the data set Y: (y1, y2, …, yn). SSE measures the deviation of the observations from the predicted values.
(6) SSyy = Sum (y – y*)^2
… where y = individual observations of y and y* = the mean of y in the data set Y: (y1, y2, …, yn). SSyy measures the total variability of y around its mean.
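Equations (4)–(6) translate directly into code. The sketch below uses made-up observations and fitted values; the point is only the mechanics of SSE, SSyy, and R².

```python
# Sketch: R^2 from SSE and SSyy as in eqs. (4)-(6). The fitted values
# y_hat are assumed to come from a line already estimated; data made up.
from statistics import mean

y     = [2.1, 3.9, 6.2, 7.8, 10.1]       # observations
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]  # predictions from Y^ = b + b1*X

y_bar = mean(y)
sse  = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # eq. (5)
ssyy = sum((yi - y_bar) ** 2 for yi in y)                # eq. (6)
r2 = 1 - sse / ssyy                                      # eq. (4)
print(round(r2, 4))
```

Because SSE is small relative to SSyy here, R² comes out near 1; an R² of 0.15, by contrast, means SSE is 85% of SSyy, i.e. the line removes only 15% of the variability around the mean.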
DOUBLE CHECK R-SQUARED
Is R^2 = 0.15 adequate? The answer to this question lies in the significance test of the correlation coefficient r.
(7) R^2 = r^2
… where r = b1(Sx / Sy); see the predictor equation Y^ = b + b1X. The test statistic for r is given by:
(8) t(r) = r(sqrt (n – 2)) / sqrt (1 – r^2)
… where the critical value t(infinite degrees of freedom, 0.95) = 1.64, the one-sided standard normal value.
With a known sample size and the observation series Y: (y1, y2, …, yn) and X: (x1, x2, …, xn), the standard deviation of X (Sx) and the standard deviation of Y (Sy) may be determined. If t(observed) < 1.64, the R^2 of 0.15 may be rejected as insignificant.
In this case, R^2 = 0.15. Is it significant? The table below illustrates the t-test for significance at the 0.95 level of confidence, with t(0.95) = 1.64:
………………………………………………………….
N      R^2    r      t(obs)   t(0.95)   Conclude
………………………………………………………….
30     0.15   0.39    2.22     1.64     Significant
100    0.15   0.39    4.16     1.64     Significant
500    0.15   0.39    9.37     1.64     Significant
1000   0.15   0.39   13.27     1.64     Significant
………………………………………………………….
Note that it is necessary to go from R^2 to r in order to test significance. R^2 alone cannot answer whether an R^2 of 0.15 is adequate to conclude that the model is good enough. In the tabulation above, for n = 30, 100, 500, 1000, the conclusion is that the model producing R^2 = 0.15 is statistically significant. Recall that the interval or range for r is between -1 and 1.
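The full tabulation can be regenerated from equations (7) and (8). This sketch loops over the four sample sizes used above; the column layout is illustrative.

```python
# Sketch: regenerate the t-test table for R^2 = 0.15 at several sample sizes,
# using r = sqrt(R^2) (eq. 7) and t = r*sqrt(n-2)/sqrt(1-r^2) (eq. 8).
import math

r2 = 0.15
r = math.sqrt(r2)
t_crit = 1.64                      # one-sided, large-sample critical value
for n in (30, 100, 500, 1000):
    t_obs = r * math.sqrt(n - 2) / math.sqrt(1 - r2)
    verdict = "Significant" if t_obs > t_crit else "Not significant"
    print(f"{n:5d}  {r2:.2f}  {r:.2f}  {t_obs:6.2f}  {verdict}")
```

Note how t(obs) grows with sqrt(n) even though r stays fixed at 0.39: larger samples make the same modest correlation ever more "significant" without making the fit any better.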
The interpretation may be counter-intuitive. We are looking for the best fit. When t(obs) exceeds t(0.95), the observed correlation lies outside the range expected under the null hypothesis of no relationship, so the model produces a “significant” result; significance, however, is not the same as best fit. In the present case, all four scenarios (n = 30, 100, 500, 1000) show that the predictor function produces a result that lies outside that range. Review your original null hypothesis and alternative hypothesis statements. (1) What was your decision rule for t(obs) < t(0.95) versus t(obs) > t(0.95)? (2) What was your intended use of R-squared: forecasting or hypothesis testing?
ATTACHED:
Excel file for the tabulation above is attached.
REFERENCES:
Draper, N. R.; Smith, H. (1998). Applied Regression Analysis. Wiley-Interscience. ISBN 0-471-17082-8.
Everitt, B. S. (2002). Cambridge Dictionary of Statistics (2nd ed.). Cambridge University Press. ISBN 0-521-81099-X.
Nagelkerke, N. J. D. (1992). Maximum Likelihood Estimation of Functional Relationships. Lecture Notes in Statistics 69. Springer. ISBN 0-387-97721-X.
Glantz, S. A.; Slinker, B. K. (1990). Primer of Applied Regression and Analysis of Variance. McGraw-Hill. ISBN 0-07-023407-8.