For example, for the classical ratio estimator, the variance of the slope, b, is sigma squared for the random factors of the estimated residuals divided by the sum of the x-values associated with the y-values in the sample. Thus the bigger the sample, the smaller the variance of b. But when we do a confidence interval around b, with a large enough sample, can we use a t-distribution, even when the random factors of the estimated residuals are not close to being normally distributed? I think so. Am I wrong?
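In symbols, a sketch of that setup, assuming the usual model behind the classical ratio estimator. The notation $e_{0i}$ for the "random factors" of the residuals is a label chosen for this illustration, and the $e_{0i}$ are assumed to be uncorrelated with mean zero:

```latex
\begin{align*}
y_i &= b x_i + e_i, \qquad e_i = x_i^{1/2} e_{0i}, \qquad \operatorname{Var}(e_{0i}) = \sigma^2, \\
\hat{b} &= \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} x_i}, \qquad
\operatorname{Var}(\hat{b} \mid x) = \frac{\sum_{i=1}^{n} \sigma^2 x_i}{\left(\sum_{i=1}^{n} x_i\right)^2}
  = \frac{\sigma^2}{\sum_{i=1}^{n} x_i}.
\end{align*}
```

Since the x-values are positive here, the denominator grows with every added sample member, which is why the variance of the estimated slope shrinks as the sample grows.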

This is of relevance to whether the Central Limit Theorem can apply, in part, to a prediction interval around a predicted y. The estimated variance of the prediction error contains a term from the estimated residuals, which I have seen Scott Fortmann-Roe call the "irreducible error." (It's the sigma for the full residuals.) The other terms in the estimated variance of the prediction error are from the model. They are reduced by an increase in sample size.
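For the one-regressor, zero-intercept ratio model, that split can be written out as follows. This is a sketch under assumptions: the model is $y_i = b x_i + x_i^{1/2} e_{0i}$ with $\operatorname{Var}(e_{0i}) = \sigma^2$, $\hat{b} = \sum y_i / \sum x_i$, and the new unit with regressor value $x_0$ (a symbol introduced here) is outside the sample, with an error independent of the sample errors:

```latex
\begin{align*}
\operatorname{Var}\!\left(y_0 - \hat{b} x_0 \mid x\right)
  &= x_0^2 \operatorname{Var}(\hat{b} \mid x) + \operatorname{Var}(e_0) \\
  &= \underbrace{\frac{\sigma^2 x_0^2}{\sum_{i=1}^{n} x_i}}_{\text{model term, shrinks as } n \text{ grows}}
   \;+\; \underbrace{\sigma^2 x_0}_{\text{irreducible term}}.
\end{align*}
```

Only the first term depends on the sample. The second term is the same no matter how large the sample is.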

I had thought that for a predicted total, a prediction interval around that might make use of normality due to the Central Limit Theorem, except for that one term, but now I'm thinking that even for a prediction for a single member of a population, because we are looking at conditional distributions (for a ratio estimator, say the distribution of y given x), then the Central Limit Theorem should have influence on the model term. And in general, all model terms would be subject to the Central Limit Theorem. (Originally, I had vaguely thought that for the estimated residuals we could make use of the Central Limit Theorem too, for some kind of mean, but for a given prediction, the irreducible error is, well, irreducible. That seems to make prediction intervals problematic, as opposed to confidence intervals.)
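The concern about the irreducible term can be probed with a small simulation. The sketch below is only an illustration under assumed settings, none of which come from the question itself: x drawn uniformly on [1, 10], a true slope of 2, a sample of 200, and strongly skewed errors (a centered exponential for the random factor). It builds a nominal 95% prediction interval for a single new y and counts how often the new value falls below or above it. The function name `miss_rates` and the hardcoded t critical value (a table value for 199 degrees of freedom) are choices made for the example.

```python
import math
import random

N = 200                   # sample size (assumed for illustration)
T_CRIT_95_DF199 = 1.972   # two-sided 95% t critical value for N - 1 = 199 df, from standard tables


def skewed_error(rng, sigma):
    """Centered exponential: mean 0, variance sigma^2, strongly right-skewed."""
    return sigma * (rng.expovariate(1.0) - 1.0)


def miss_rates(x0=5.0, b_true=2.0, sigma=1.0, reps=3000, seed=7):
    """Return (lower_miss_rate, upper_miss_rate) of a nominal 95% prediction
    interval for one new y at x0 under the model y = b*x + sqrt(x)*e0."""
    rng = random.Random(seed)
    below = above = 0
    for _ in range(reps):
        x = [rng.uniform(1.0, 10.0) for _ in range(N)]
        y = [b_true * xi + math.sqrt(xi) * skewed_error(rng, sigma) for xi in x]
        sum_x = sum(x)
        b_hat = sum(y) / sum_x  # classical ratio estimator
        # Estimate sigma^2 from the weighted residuals (y - b_hat*x) / sqrt(x).
        s2 = sum((yi - b_hat * xi) ** 2 / xi for xi, yi in zip(x, y)) / (N - 1)
        # Model term x0^2 / sum_x shrinks with N; the term x0 is irreducible.
        half = T_CRIT_95_DF199 * math.sqrt(s2 * (x0 ** 2 / sum_x + x0))
        y_new = b_true * x0 + math.sqrt(x0) * skewed_error(rng, sigma)
        if y_new < b_hat * x0 - half:
            below += 1
        elif y_new > b_hat * x0 + half:
            above += 1
    return below / reps, above / reps


if __name__ == "__main__":
    lower, upper = miss_rates()
    print("lower-tail miss rate:", lower)
    print("upper-tail miss rate:", upper)
```

With errors this skewed, the misses would be expected to land almost entirely on one side, since the new observation's own error keeps its original shape regardless of sample size. A symmetric interval can then have roughly the right overall coverage while being wrong about each tail.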

So, preliminary to knowing if the Central Limit Theorem partially applies to a prediction interval - except for the 'irreducible error,' that is - we need to know "Does the Central Limit Theorem apply to an estimated regression coefficient when the estimated residuals are not normally distributed?"

..........

As an example, then, as indicated above (for ratio estimation): if we have a simple linear regression (i.e., one predictor) with a zero intercept, y = bx + e, then "...when we do a confidence interval around b, with a large enough sample, can we use a t-distribution..." by taking advantage of the Central Limit Theorem, regardless of any distribution associated with e?
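One rough way to probe this question empirically is a coverage simulation. The sketch below is not a proof and rests on assumed settings that are not part of the question: x uniform on [1, 10], a true slope of 2, errors of the ratio-model form sqrt(x) times a centered exponential (mean zero, clearly non-normal), and sample sizes of 10 and 200. The t critical values are hardcoded table values so that only the standard library is needed; the function names are hypothetical.

```python
import math
import random

# Two-sided 95% t critical values from standard tables, keyed by degrees of freedom.
T_CRIT_95 = {9: 2.262, 199: 1.972}


def ratio_ci_covers(n, b_true, sigma, rng):
    """Draw one sample from y = b*x + sqrt(x)*e0 and report whether the
    nominal 95% t-interval around the ratio estimate covers b_true."""
    x = [rng.uniform(1.0, 10.0) for _ in range(n)]
    # e0: centered exponential, mean 0, variance sigma^2, right-skewed.
    e0 = [sigma * (rng.expovariate(1.0) - 1.0) for _ in range(n)]
    y = [b_true * xi + math.sqrt(xi) * ei for xi, ei in zip(x, e0)]
    sum_x = sum(x)
    b_hat = sum(y) / sum_x
    # sigma^2 estimated from weighted residuals, with n - 1 degrees of freedom.
    s2 = sum((yi - b_hat * xi) ** 2 / xi for xi, yi in zip(x, y)) / (n - 1)
    se = math.sqrt(s2 / sum_x)  # estimated standard error of b_hat
    return abs(b_hat - b_true) <= T_CRIT_95[n - 1] * se


def coverage(n, reps=2000, seed=1):
    """Fraction of simulated samples whose interval covers the true slope."""
    rng = random.Random(seed)
    hits = sum(ratio_ci_covers(n, 2.0, 1.0, rng) for _ in range(reps))
    return hits / reps


if __name__ == "__main__":
    for n in (10, 200):
        print("n =", n, "empirical coverage:", coverage(n))
```

If the Central Limit Theorem is doing the work, the empirical coverage at the larger sample size should sit close to the nominal 95% even though the errors are skewed. A result like that would only speak to this one error distribution and this one design for x.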

[I pieced together some images for the one regressor, zero intercept case.]

For any "yes" or "no" answers, please explain why.

Thank you.
