I want to see whether my independent variables predict my dependent variable. My study is exploratory. I cannot find a simple answer online as to whether to use stepwise or standard multiple regression (the Enter method) in SPSS.
Megan Wood A typical multiple regression shows you the variance explained by all of the predictors entered into the model at once. Stepwise regression is used to see how the variance explained, R², changes as predictors are added to (or removed from) the model one at a time. In short, stepwise regression helps you assess the relative importance of each predictor and answers the question, "Does my model do a significantly better job of predicting the outcome variable when I add (or remove) particular predictors?"
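Since SPSS is menu-driven, here is a minimal sketch in Python (statsmodels) of the two ideas above: a standard Enter-style model with all predictors entered at once, and the change in R² when a single predictor is added. The data and variable names (x1, x2, x3, y) are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1, x2, x3 = rng.normal(size=(3, n))
y = 0.5 * x1 + 0.3 * x2 + rng.normal(size=n)  # x3 contributes nothing

# Standard ("Enter") regression: all IVs entered in one block.
X_full = sm.add_constant(np.column_stack([x1, x2, x3]))
full = sm.OLS(y, X_full).fit()
print("Full-model R^2:", round(full.rsquared, 3))

# Stepwise-style question: how much does R^2 change when x2 is added
# to a model that already contains x1?
base = sm.OLS(y, sm.add_constant(x1)).fit()
plus = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print("R^2 change from adding x2:", round(plus.rsquared - base.rsquared, 3))
```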
Megan Wood, stepwise regression is just garbage. See the excellent discussion in:
Harrell, F. E. (2001). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer-Verlag.
Miller, A. J. (2002). Subset Selection in Regression. London: Chapman & Hall.
In concert with Ronaldo Gonzales' post, I too would urge you to avoid any of the "step" methods when arriving at a regression model. Generally, if you're conducting an "exploratory" analysis and had some defensible reason for considering a given set of IVs in the first place, it's difficult to see why you wouldn't evaluate a full model as your starting point. Yes, subsequent inspection and cross-validation might cause you to revise it, but step methods won't necessarily give you what you're after. Here's why:
1. Step methods are very opportunistic, and the resultant models may not be stable across samples (let alone hold for the population). Hence, generalizability is a concern, and validation samples are a must; the simulation sketch after this list illustrates the instability.
2. There is no assurance that step methods will arrive at the "best" ensemble of IVs for a given DV, regardless of your criterion for "best."
3. Minor adjustments to the variable entry/deletion criteria can affect the performance of step methods in unpredictable ways.
4. The internal significance tests are evaluated incorrectly in a number of software packages (e.g., the tests are frequently too liberal).
5. Step methods will frequently omit variables which could help the model's performance, due to phenomena such as the suppressor effect.
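To illustrate point 1, here is a small simulation sketch in Python: a simple hand-rolled forward selection (entry criterion p < .05, which is one of many possible rules) is run on bootstrap resamples of the same data set, and it frequently picks different predictor sets each time. All names and parameter values are invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, p = 100, 8
X = rng.normal(size=(n, p))
y = 0.3 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(size=n)  # 2 real signals

def forward_select(X, y, alpha=0.05):
    """Greedy forward selection: repeatedly add the candidate predictor
    with the smallest p-value, as long as that p-value is below alpha."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        pvals = {}
        for j in remaining:
            fit = sm.OLS(y, sm.add_constant(X[:, selected + [j]])).fit()
            pvals[j] = fit.pvalues[-1]  # p-value of the candidate column
        best = min(pvals, key=pvals.get)
        if pvals[best] >= alpha:
            break
        selected.append(best)
        remaining.remove(best)
    return tuple(sorted(selected))

chosen = set()
for _ in range(20):
    idx = rng.integers(0, n, size=n)  # bootstrap resample of the rows
    chosen.add(forward_select(X[idx], y[idx]))
print(f"{len(chosen)} distinct models chosen across 20 resamples")
```

On data like these, the selected model typically varies across resamples even though the underlying population model never changes, which is exactly the generalizability concern raised above.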
Dear Bruce, BTW, there's no such thing as "penalized stepwise regression." Unless you mean stepwise regression itself, which has a least-squares penalty if you want to call it that, and that usage is unknown in my experience. The Austin and Tu reference above shows that stepwise is not reproducible. Please 🙏🙏🙏 finally read the Austin and Tu reference. David Booth
Dear David, I used that wording so that you would not think I was including LASSO in the list of methods I was decrying. (I may have misunderstood, but I thought you believed I was doing so in another thread a little while ago.)
You seem to believe that I endorse the use of one or more of the classical stepwise methods. I don't know where you got that idea. To be clear, I do not support the use of any of the following variable selection methods:
Stepwise
Forward selection
Backward elimination
All possible subsets (see Frank Harrell's comments in the Stata FAQ on stepwise regression--I posted the link earlier in the thread)
Bruce, I am tired of arguing with you. IMO, and in that of other statisticians, the Austin and Tu reference I suggested above shows that stepwise and its variants are not reproducible and hence of zero value to science. In addition, the penalty factor is defined in many places, so I will leave that to you. However, I believe the term was first used in reference to ridge regression, not stepwise. I refer you to the early work showing that there is a relationship between ridge regression and Bayesian statistics, and to the elastic net/glmnet literature as well. You can find a brief introduction to these in Efron and Hastie's Computer Age Statistical Inference, published in 2016, I believe. Best wishes for a good day. David Booth
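For readers following along, a minimal sketch of the penalized-regression idea being discussed: ridge, lasso, and elastic net all minimize the least-squares criterion plus a penalty on the size of the coefficients (L2 for ridge, L1 for lasso, a blend for elastic net). This uses scikit-learn rather than glmnet; the data and the alpha/l1_ratio values are arbitrary illustrations, not recommendations.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso, ElasticNet

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=100)

for model in (Ridge(alpha=1.0),        # penalty: alpha * ||beta||_2^2
              Lasso(alpha=0.1),        # penalty: alpha * ||beta||_1
              ElasticNet(alpha=0.1, l1_ratio=0.5)):  # blend of both
    model.fit(X, y)
    print(type(model).__name__, np.round(model.coef_, 2))
```

Note how the L1 penalty shrinks some coefficients exactly to zero, which is why LASSO does variable selection in a principled, reproducible way, unlike the step methods criticized above.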
David, I too am tired of this. But I do have one final suggestion. Please direct me to any specific statements or conclusions in the Austin and Tu article that I have contradicted. Thank you.
Hi all, just another quick question if anyone is able to help. In a similar paper, the authors entered only the independent variables that correlated significantly with the DV in preliminary correlation tests. I have run a multiple regression for one DV with all of my IVs, which is non-significant. When I run the multiple regression with just the significantly correlated IVs, the model is significant. I'm unsure which is the right way. I have run all of the assumption checks before running the regression, e.g., visual inspection of a linear relationship between each IV and the DV, but I'm unsure whether each relationship actually needs to be significant as per the p-value.
One further comment. You said, "I have run all of the assumption checks *before* running the regression..." (emphasis added). Given that the major assumptions are about the errors, you cannot check them until after you have estimated the model.
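A minimal sketch of that point in Python (statsmodels/scipy): the key assumptions (normality and constant variance of the errors) are checked on the residuals, which only exist once the model has been fitted. The data are synthetic, and these two tests are just common examples, not an exhaustive assumption check.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(150, 3)))
y = X @ np.array([1.0, 0.5, 0.3, 0.2]) + rng.normal(size=150)

fit = sm.OLS(y, X).fit()
resid = fit.resid  # the residuals only exist after estimation

# Normality of the residuals (Shapiro-Wilk).
print("Shapiro-Wilk p-value:", round(stats.shapiro(resid).pvalue, 3))

# Constant error variance (Breusch-Pagan).
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(resid, X)
print("Breusch-Pagan p-value:", round(lm_p, 3))
```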
Stepwise tries to eliminate variables that are highly correlated among themselves, so it's better than standard regression, where variables with a low contribution are kept.
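For what it's worth, if the concern is correlated predictors, that overlap can be quantified directly in a standard (Enter) model with variance inflation factors, rather than relying on stepwise to weed variables out. A minimal sketch with statsmodels; the data are synthetic, with x2 built to be nearly collinear with x1.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # nearly collinear with x1
x3 = rng.normal(size=200)
X = sm.add_constant(np.column_stack([x1, x2, x3]))

# VIF for each predictor (skipping the constant); values well above 10
# are a common rule-of-thumb flag for problematic collinearity.
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, round(variance_inflation_factor(X, i), 1))
```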