Testing the assumptions of simple linear regression (one predictor variable and one dependent variable) may be easy. But how do you do the tests efficiently when there are tons of predictor variables?
You can use hypothesis tests for some of the assumptions. You can test normality with Shapiro-Wilk on the residuals, test linearity with the runs test on the residuals, and you can calculate influence values and raise an alarm when the influence of a point is larger than tolerable. This can be automated, and you can get a list of models with possible problems. This reduced list should then be checked by eye, and an informed decision can be made as to whether the violation of the assumptions is tolerable in these cases or not.
Note that formal hypothesis tests on assumptions do NOT(!) tell you whether there is a relevant violation of the assumptions. And they don't tell you at all whether relevant violations are absent. On their own they are essentially uninformative. However, they can be used in a pre-screening, to give you a smaller subset of candidates that might be worth being checked by an expert (that is, by you).
But it is in fact striking that you have so many predictors. This suggests a very poorly defined research question: there does not seem to be any halfway solid theory behind it, and if that is correct, the chance of finding something reliable is extremely low. I also wonder whether these predictors should really be analysed separately. Why not use them together? What if there are interactions between some of the predictors? What if some of the predictors are correlated, or not independent? And how do you address the multiple testing problem? Really, all alarm bells are ringing...
Most assumptions concern either the DV (e.g., its scale level) or the residual (error) variables (e.g., their normality and homoscedasticity). Therefore, it should not matter so much how many predictor variables you have in your model for testing (most) assumptions. The only assumption that I know of that concerns the predictor variables is that they are measured without error (i.e., with perfect reliability). This assumption would have to be checked for each individual predictor, but it is unrealistic to begin with.
A test gives you a single number. Not a great way to check assumptions!
You are better off using graphics for several reasons. The most important of these is that your eye is very good at identifying patterns including ones you didn't foresee. The second reason is that you can use plots to examine many aspects of model fit across the range of the predicted and predictor variables.
Excellent introduction to the area by that star of the Stata community, Nick Cox:
Speaking Stata: Graphing Model Diagnostics
@Christian don't forget that IVs measured with error can be dealt with using errors-in-variables models, so that is not such a bad thing after all. Best wishes, David Booth
Ronán Michael Conroy, I just want to make sure that you don't think I'd advocate hypothesis tests of assumptions. I just said that such tests may possibly be used as a sieve to boil down the number of plots to investigate (and variables and relationships to thoroughly think about)*. I hope I made clear the (serious) limitations of this approach. But thank you for underlining that tests of assumptions are generally not useful.
---
* and that's the point: doing analyses without thinking thoroughly about them is a recipe for doing things wrong.
Jochen Wilhelm – In this area, I have to confess that I have what we call in Irish ciall ceannaithe – dearly-bought wisdom. I have a paper that had to be retracted because I didn't look at a plot that would have revealed the problem with the data right away.
And no, I certainly didn't think you were advocating hypothesis tests over actually looking at the data. I know what happens when you do!!