
Recent approaches to inference from nonprobability samples encourage the use of multiple covariates.  For example, see https://www.researchgate.net/publication/316867475_Inference_for_Nonprobability_Samples.  However, in my experience, results can be very good when a simple ratio model can be used, with the only predictor being the same data item in a previous census, and when results can be checked and monitored with repeated sample and census surveys.  When more complex models are needed, I question how often this can be done with suitable reliability.  In that regard, I commented on the paper linked above.  (That paper is available through Project Euclid, using the DOI found at the link above.)

Analogously, consider heteroscedasticity in regression: for a Yi associated with a larger predicted-yi, sigma should be larger.  However, when a more complex model is needed, this is less likely to be empirically apparent.  For a one-predictor ratio model, where the predictor is the same data item in a previous census and repeated sample and census surveys are available for monitoring, I believe the modeling is much more likely to be successful, and heteroscedasticity is more likely to be evident.
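
To make the one-predictor case concrete, here is a minimal sketch in Python, under stated assumptions, of a ratio model y = b*x + e in which x is the same data item from a previous census and the standard deviation of e is taken to be proportional to x**gamma.  The population is simulated, and the function names, the value gamma = 0.5 (which reduces the slope to the classical ratio estimator, sum of y over sum of x), and all numbers are illustrative choices of mine, not taken from any particular survey.  The last function is one rough way to check whether heteroscedasticity is evident, by regressing log absolute residuals on log x.

```python
import numpy as np

def fit_ratio_model(x_s, y_s, gamma=0.5):
    """Weighted least squares slope through the origin for y = b*x + e,
    assuming sd(e) is proportional to x**gamma (gamma is an assumption)."""
    x_s = np.asarray(x_s, dtype=float)
    y_s = np.asarray(y_s, dtype=float)
    w = x_s ** (-2.0 * gamma)          # regression weights = 1 / variance factor
    return np.sum(w * x_s * y_s) / np.sum(w * x_s ** 2)

def predict_total(x_pop, y_s, sample_idx, b):
    """Observed sample y values plus predictions b*x for units out of sample."""
    x_pop = np.asarray(x_pop, dtype=float)
    out_of_sample = np.ones(x_pop.size, dtype=bool)
    out_of_sample[sample_idx] = False
    return float(np.sum(y_s) + b * x_pop[out_of_sample].sum())

def estimate_gamma(x, y, b):
    """Rough check of heteroscedasticity: slope of log|residual| on log(x)."""
    resid = np.asarray(y, dtype=float) - b * np.asarray(x, dtype=float)
    slope, _ = np.polyfit(np.log(x), np.log(np.abs(resid)), 1)
    return slope

# Simulated (hypothetical) population: x is the previous census value.
rng = np.random.default_rng(0)
x_pop = rng.lognormal(mean=3.0, sigma=1.0, size=500)
y_pop = 1.05 * x_pop + 0.3 * np.sqrt(x_pop) * rng.normal(size=500)

# Current sample survey: y is observed only for the sampled units.
sample_idx = rng.choice(500, size=50, replace=False)
y_s = y_pop[sample_idx]

b_hat = fit_ratio_model(x_pop[sample_idx], y_s, gamma=0.5)
total_hat = predict_total(x_pop, y_s, sample_idx, b_hat)

# A later census (here, the simulated y_pop) lets the result be checked.
print("estimated slope:", b_hat)
print("predicted total:", total_hat, "census total:", y_pop.sum())
print("gamma suggested by census residuals:", estimate_gamma(x_pop, y_pop, b_hat))
```

With a single predictor, the residual spread can be examined against x directly, which is part of why heteroscedasticity is easier to see here than in a model with many predictors.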

This is with regard to finite population survey statistics.  However, in general, when multiple regression is necessary, this always involves complications, such as collinearity.  Of course, multiple regression has been developed over many years with much success, but the more variables required to obtain a good predicted-y "formula," the less "perfect" I would expect the modeling to be.  (This is aside from the bias-variance tradeoff, under which an unneeded predictor tends to increase variance.)
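
As one illustration of the collinearity complication, the sketch below computes variance inflation factors (VIFs) for simulated predictors.  The VIF is a standard diagnostic that I am bringing in for illustration; it is not something the discussion above relies on, and the data and names here are hypothetical.  It uses the fact that the VIFs are the diagonal entries of the inverse of the predictors' correlation matrix.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF for each column of X: diagonal of the inverse correlation matrix.
    A VIF near 1 means little collinearity; large values mean the coefficient
    variance for that predictor is inflated by overlap with the others."""
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

# Simulated (hypothetical) predictors.
rng = np.random.default_rng(1)
n = 1000
x1 = rng.normal(size=n)
x2_independent = rng.normal(size=n)            # unrelated to x1
x2_collinear = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1

vif_indep = variance_inflation_factors(np.column_stack([x1, x2_independent]))
vif_coll = variance_inflation_factors(np.column_stack([x1, x2_collinear]))

print("VIFs, independent predictors:", vif_indep)
print("VIFs, collinear predictors:  ", vif_coll)
```

In this setup the second predictor adds almost no information beyond the first, yet it sharply inflates the variance of both coefficient estimates.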

[By the way, Cochran, W.G. (1953), Sampling Techniques, 1st ed., John Wiley & Sons, pages 205-206, notes that a very good size measure for a data item is the same data item in a previous census.]

People who have had a lot of experience successfully using regression with a large number of predictors may find it strange to have this discussion, but I think it is worth mulling over. 

So, "When more predictors are needed, how often can you model well?"
