Could you please suggest a data set suitable for simple linear regression with heteroscedasticity?

Farzane -

There is a small data set given at the end of https://www.researchgate.net/publication/261947825_Projected_Variance_for_the_Model-based_Classical_Ratio_Estimator_Estimating_Sample_Size_Requirements, which you could use as a starting point.

Most papers available from my RG pages (see contributions starting at https://www.researchgate.net/profile/James_Knaub) show methodology I developed for handling heteroscedastic data from establishment surveys, used extensively at the US Energy Information Administration (EIA). A great deal of such data may be obtained from http://www.eia.gov, and/or by writing a request to [email protected].

Many of the papers I uploaded to ResearchGate were developed and used at the EIA for estimating missing (nonresponse and out-of-sample) data in electric power, natural gas, and other energy establishment surveys.

The setting is finite population sample surveys - often monthly sample surveys - with regressor data from less frequently gathered, often annual, census surveys of energy establishments. The same methodology may be used with any such highly skewed finite population.

Data requests may be made to [email protected], but you might first want to look at the EIA website and explore the data collection survey forms and the aggregate data reports available there. Thousands of aggregate-level values are reported each month, and a great deal more survey microdata are collected and used to produce them.

Cheers - Jim


James R Knaub

Farzane -

Note that the paper attached to my previous post, and a number of others at https://www.researchgate.net/profile/James_Knaub/contributions, use the level of heteroscedasticity which Brewer, K.R.W. (2002), Combined Survey Sampling Inference: Weighing Basu's Elephants, Arnold: London and Oxford University Press, would associate with a cluster from Cochran, W.G. (1977), Sampling Techniques, 3rd ed., John Wiley & Sons, with independence between the elements in the cluster. That level gives the classical ratio estimator (CRE), which often appears robust against data quality issues for prediction of y when x is small.
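To illustrate the CRE, here is a minimal sketch in Python. It assumes regression through the origin, y = b*x + e, with residual variance proportional to x (regression weights 1/x), which is the standard model under which the ratio of sums equals the weighted least squares slope. The data are simulated, not EIA data, and the function name and the exponent parameter `gamma` are only illustrative.

```python
import numpy as np

def wls_through_origin(x, y, gamma):
    """Slope b for y = b*x + e, assuming Var(e_i) is proportional to x_i**(2*gamma)."""
    w = x ** (-2.0 * gamma)                      # regression weights
    return np.sum(w * x * y) / np.sum(w * x * x)

# Simulated, skewed "establishment-like" data (an assumption for illustration only):
# the residual standard deviation grows with sqrt(x).
rng = np.random.default_rng(0)
x = rng.lognormal(mean=3.0, sigma=1.0, size=200)
y = 2.0 * x + rng.normal(scale=1.5 * np.sqrt(x))

b_cre = np.sum(y) / np.sum(x)                    # classical ratio estimator
b_wls = wls_through_origin(x, y, gamma=0.5)      # the same slope, written as WLS

print(b_cre, b_wls)
```

With `gamma=0.5` the weights are 1/x, so the two slopes agree up to rounding; other values of `gamma` give other weighted least squares slopes.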

Note also that although heteroscedasticity occurs in time series regression as well, the work you see on my ResearchGate pages is for predictions involving finite populations, not time series.

If you are looking to estimate the level of heteroscedasticity in a given data set, rather than default to the CRE, there are multiple methods. The Iterated Reweighted Least Squares Method is a common one, and is explained well in Carroll and Ruppert (1988), Transformation and Weighting in Regression, Chapman & Hall, Ltd., London, UK. Here are some other ideas:

https://www.researchgate.net/publication/263809034_Alternative_to_the_Iterated_Reweighted_Least_Squares_Method_-_Apparent_Heteroscedasticity_and_Linear_Regression_Model_Sampling

and

https://www.researchgate.net/publication/263032446_Weighting_in_Regression_for_Use_in_Survey_Methodology
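To make the idea of estimating the level of heteroscedasticity concrete, here is a rough sketch in Python. It is not the procedure from Carroll and Ruppert, nor the alternative in the papers linked above; it is a generic residual-based iteration under the assumption that the residual standard deviation is proportional to x**gamma in a regression through the origin. The function name, the fixed iteration count, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def estimate_gamma(x, y, n_iter=20):
    """Alternate between a WLS fit and a log-log fit of |residual| on x."""
    gamma = 0.0                                   # start from ordinary least squares
    for _ in range(n_iter):
        w = x ** (-2.0 * gamma)                   # weights implied by the current gamma
        b = np.sum(w * x * y) / np.sum(w * x * x)
        resid = y - b * x
        # If sd(e) is proportional to x**gamma, then log|e| is linear in log(x)
        # with slope gamma. The small constant guards against log(0).
        slope, _intercept = np.polyfit(np.log(x), np.log(np.abs(resid) + 1e-12), 1)
        gamma = slope
    # Final slope using the last gamma estimate
    w = x ** (-2.0 * gamma)
    b = np.sum(w * x * y) / np.sum(w * x * x)
    return gamma, b

# Simulated data with a true exponent of 0.5 (illustration only)
rng = np.random.default_rng(3)
x = rng.lognormal(mean=3.0, sigma=1.0, size=2000)
y = 2.0 * x + rng.normal(scale=1.5 * x ** 0.5)

gamma_hat, b_hat = estimate_gamma(x, y)
print(gamma_hat, b_hat)
```

Estimates of this kind can be quite noisy for small samples, which is one reason a default such as the CRE may be attractive.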

Here is an explanation for this heteroscedasticity:

https://www.researchgate.net/publication/262972023_HETEROSCEDASTICITY_AND_HOMOSCEDASTICITY

And here is a paper showing the usefulness of weighted least squares regression. As with my other papers, this is not just for predicting/estimating individual cases, but also for predicting/estimating totals for categories, groups, or whole populations in finite population statistics:

https://www.researchgate.net/publication/263036348_Properties_of_Weighted_Least_Squares_Regression_for_Cutoff_Sampling_in_Establishment_Surveys
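As a small illustration of predicting a total rather than individual cases, here is a sketch in Python. It assumes a cutoff sample (only the largest establishments by x are observed), the CRE slope, and simulated data; the 80th-percentile cutoff and all variable names are hypothetical choices for this example.

```python
import numpy as np

# Simulated finite population (illustration only): x plays the role of a
# census regressor, y the current-period value to be estimated.
rng = np.random.default_rng(1)
N = 500
x = rng.lognormal(mean=3.0, sigma=1.0, size=N)
y = 2.0 * x + rng.normal(scale=1.5 * np.sqrt(x))

# Cutoff sample: observe y only for the largest 20% of units by x
in_sample = x >= np.quantile(x, 0.8)

# Classical ratio estimator fitted on the sampled units
b = y[in_sample].sum() / x[in_sample].sum()

# Predicted total = observed y in the sample + predicted y for everything else
y_pred = b * x[~in_sample]
total_hat = y[in_sample].sum() + y_pred.sum()

true_total = y.sum()                              # known here only because data are simulated
rel_error = (total_hat - true_total) / true_total
print(total_hat, true_total, rel_error)
```

Under the CRE, the predicted total reduces to b times the population total of x, which is a handy check on the arithmetic.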

Cheers - Jim

PS - For multiple linear regression, or even more general multiple regression, one can find regression weights involving a coefficient of heteroscedasticity by using a preliminary prediction-of-y as the size measure in place of x.
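A rough sketch of that idea in Python, under stated assumptions: a preliminary ordinary least squares fit supplies the predicted y used as the size measure, and the exponent `gamma` is fixed at 0.5 rather than estimated. The helper `wls`, the simulated data, and the clipping of the size measure away from zero are all hypothetical choices for illustration.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares via row scaling by sqrt(w), solved with lstsq."""
    sw = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated data with two regressors and no intercept (illustration only)
rng = np.random.default_rng(2)
n = 300
X = np.column_stack([rng.lognormal(2.0, 0.8, n), rng.lognormal(1.0, 0.8, n)])
mean_y = X @ np.array([1.5, 3.0])
y = mean_y + rng.normal(scale=0.8 * np.sqrt(mean_y))

gamma = 0.5                                       # assumed coefficient of heteroscedasticity

beta0 = wls(X, y, np.ones(n))                     # step 1: preliminary (unweighted) fit
size = np.clip(X @ beta0, 1e-8, None)             # step 2: preliminary prediction of y as size measure
beta1 = wls(X, y, size ** (-2.0 * gamma))         # step 3: WLS with weights size**(-2*gamma)

print(beta0, beta1)
```

One could repeat steps 2 and 3 with the updated predictions, but a single pass is often enough to show the effect of the weights.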

PPS - As noted, a great deal of data to which this methodology is applied are available from http://www.eia.gov, and by contacting the US EIA using the email address supplied previously.

