Testing the assumptions of simple linear regression (one predictor variable and one dependent variable) may be easy. But how do you do the tests efficiently when there are tons of predictor variables?
You can use hypothesis tests for some of the assumptions. You can test normality with Shapiro-Wilk on the residuals, test linearity with the runs test on the residuals, and you can calculate influence values and raise an alarm when the influence of a point is larger than tolerable. This can be automated, and you can get a list of models with possible problems. This reduced list should then be checked by eye, and an informed decision can be made as to whether the violation of the assumptions is tolerable in these cases or not.
Note that formal hypothesis tests on assumptions do NOT(!) tell you whether there is a relevant violation of the assumptions. And they don't tell you at all whether relevant violations are absent. On their own they are essentially uninformative. However, they can be used in a pre-screening, to give you a smaller subset of candidates that might be worth being checked by an expert (that is, by you).
But it is in fact striking that you have so many predictors. This suggests a very poorly defined research question: there does not seem to be any halfway solid theory behind it, and if that is correct, the chance of finding something reliable is extremely low. I also wonder whether these predictors should really be analysed separately. Why not use them together? What if there are interactions between some of the predictors? What if some of the predictors are correlated, or not independent? And how do you address the multiple testing problem? Really, all alarm bells are ringing...
Most assumptions concern either the DV (e.g., its scale level) or the residual (error) variables (e.g., their normality and homoscedasticity). Therefore, it should not matter so much how many predictor variables you have in your model for testing (most) assumptions. The only assumption that I know of that concerns the predictor variables is that they are measured without error (i.e., with perfect reliability). This assumption would have to be checked for each individual predictor, but it is unrealistic to begin with.
A test gives you a single number. Not a great way to check assumptions!
You are better off using graphics for several reasons. The most important of these is that your eye is very good at identifying patterns including ones you didn't foresee. The second reason is that you can use plots to examine many aspects of model fit across the range of the predicted and predictor variables.
Excellent introduction to the area by that star of the Stata community, Nick Cox:
Speaking Stata: Graphing Model Diagnostics
@Christian don't forget that IVs measured with error can be dealt with using errors-in-variables models, so that is not such a bad thing after all. Best wishes, David Booth
Ronán Michael Conroy, I just want to make sure that you don't think I'd advocate hypothesis tests of assumptions. I just said that such tests may possibly be used as a sieve to boil down the number of plots to investigate (and variables and relationships to thoroughly think about)*. I hope I made clear the (serious) limitations of this approach. But thank you for underlining that tests of assumptions are generally not useful.
---
* and that's the point: doing analyses without thinking thoroughly about them is a recipe for doing things wrong.
Jochen Wilhelm – In this area, I have to confess that I have what we call in Irish ciall ceannaithe – dearly-bought wisdom. I have a paper that had to be retracted because I didn't look at a plot that would have revealed the problem with the data right away.
And no, I certainly didn't think you were advocating hypothesis tests over actually looking at the data. I know what happens when you do!!