In forward selection, you start with the null model and add predictors one at a time. In backward selection, you start with a full model including all your variables and then drop the ones you do not need (those that are not significant), one at a time.
The basic difference is as Robert said. Whether and how to make the selection is a controversial issue in statistics. The core of the method, and of the criticism, is given concisely in the Wikipedia article on stepwise regression.
And I would emphasize the criticism of stepwise regression, which you can also find in the Wikipedia article: the choice of which variables should and will predict the outcome is driven by statistical rather than theoretical considerations.
In the forward selection procedure, one adds features to the model one at a time. At each step, each feature that is not already in the model is tested for inclusion, and the most significant of these features is added to the model, so long as its p-value is below some pre-set level (e.g., 0.05).
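For concreteness, here is a minimal sketch of p-value-based forward selection in Python using statsmodels; the function name, the `threshold` parameter, and the DataFrame interface are my own illustration, not part of the answer above:

```python
import pandas as pd
import statsmodels.api as sm

def forward_select(X, y, threshold=0.05):
    """Greedy forward selection: repeatedly add the most significant
    remaining feature while its p-value is below `threshold`."""
    selected = []
    remaining = list(X.columns)
    while remaining:
        # p-value of each candidate when added to the current model
        pvals = {}
        for cand in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [cand]])).fit()
            pvals[cand] = fit.pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] >= threshold:
            break  # no candidate clears the entry threshold
        selected.append(best)
        remaining.remove(best)
    return selected
```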
Forward selection has drawbacks, including the fact that each addition of a new feature may render one or more of the already included features non-significant (p-value > 0.05). An alternative approach that avoids this is backward selection. In this method, you start by fitting a model with all the features. You then drop the least significant feature (the one with the largest p-value) and refit, repeating until only significant features remain.
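And a matching sketch of backward elimination, under the same assumptions as the hypothetical `forward_select` example above:

```python
def backward_eliminate(X, y, threshold=0.05):
    """Backward elimination: start from the full model and drop the
    least significant feature until all remaining p-values < threshold."""
    selected = list(X.columns)
    while selected:
        fit = sm.OLS(y, sm.add_constant(X[selected])).fit()
        pvals = fit.pvalues.drop("const")  # ignore the intercept
        worst = pvals.idxmax()
        if pvals[worst] < threshold:
            break  # every remaining feature is significant
        selected.remove(worst)
    return selected
```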
There are two methods of stepwise regression: the forward method and the backward method.
In the forward method, the software looks at all the predictor variables you selected and picks the one that best predicts the dependent measure. That variable is added to the model, and the step is repeated with the variable that then predicts best among those remaining. This little procedure continues until adding predictors no longer improves the prediction model.
In the backward method, all the predictor variables you chose are entered into the model. Then the variables that do not significantly predict the dependent measure are removed from the model one by one.
The backward method is generally the preferred method, because the forward method is susceptible to so-called suppressor effects. These occur when a predictor is significant only when another predictor is held constant, so forward selection can fail to pick it up.
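A small simulation can illustrate the suppressor effect described above; the variable names and the data-generating process here are invented for illustration. Two highly correlated predictors with opposite effects each look useless on their own, so a forward step at p < 0.05 will usually add neither, while the full model (the backward starting point) usually finds both significant:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + np.sqrt(1 - 0.95**2) * rng.normal(size=n)  # corr(x1, x2) ~ 0.95
y = x1 - x2 + rng.normal(size=n)  # opposite effects cancel marginally

X = pd.DataFrame({"x1": x1, "x2": x2})
for cols in (["x1"], ["x2"], ["x1", "x2"]):
    fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
    print(cols, fit.pvalues.drop("const").round(3).to_dict())
# The marginal p-values for x1 and x2 alone are typically large
# (forward stalls at the null model), while both are typically small
# in the joint model (backward keeps them).
```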
@poppy, I am not too happy with the claim that backward is preferable to forward. If the regressors are independent, the final model will be the same whether you go forward or backward. The difference occurs when the regressors are dependent (correlated), which is the typical situation in applied work, and I cannot see a criterion that would make me prefer forward over backward, or the other way around. But then again, I almost never use stepwise selection, for lack of faith in the approach.
It is possible for the results of the two methods to differ; what matters are the aims of the analysis and the nature of the relations between the variables. If we have no theoretical background to guide the choice, it seems better to compare the results of the two methods and decide which is more useful.