Is it possible to use backward regression for all variables in an OLS model, but to leave the year dummies out of the backward selection, so that these dummies appear in the final model even if they are not significant?
I would not suggest backward or forward selection at all. The best set of predictors is not likely to be one of the combinations which you would find that way.
It is important not only to avoid using too few or too many predictors, but also to use the right ones. Predictors act together, generally with more collinearity and other problematic relationships than would be ideal.
There are other methods, which you might find by researching "model selection," but using your subject matter expertise could help you decide on some alternative models which make sense. However you arrive at two or more alternative models, you could compare their fits to the same sample on the same scatterplot, using a "graphical residual analysis," which I often suggest. To avoid overfitting to a particular sample, such that your selected model might be a much worse fit to other parts of the population or subpopulation to which you want to apply it, you could research "cross-validation." If you have enough data for two or more different samples, you might accomplish all of this by comparing the model results for each sample on a separate scatterplot for each sample.
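As a rough illustration of the cross-validation idea above, here is a minimal sketch in Python using only NumPy. The data are simulated, and the function name kfold_mse, the variable names, and the two candidate models are all hypothetical, chosen only for this example. It compares two candidate predictor sets by their average out-of-fold squared prediction error rather than by in-sample significance.

```python
import numpy as np

def kfold_mse(X, y, k=5, seed=0):
    """Average out-of-fold squared prediction error of an OLS fit (intercept added)."""
    n = len(y)
    idx = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(idx, k)
    Xc = np.column_stack([np.ones(n), X])
    errs = []
    for test in folds:
        train = np.setdiff1d(idx, test)  # every row not in the held-out fold
        beta, *_ = np.linalg.lstsq(Xc[train], y[train], rcond=None)
        errs.append(np.mean((y[test] - Xc[test] @ beta) ** 2))
    return float(np.mean(errs))

# Simulated data (an assumption for illustration): y depends on x1 and x2; x3 is noise.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

mse_a = kfold_mse(X[:, [0, 1]], y)  # candidate model A: x1 and x2
mse_b = kfold_mse(X[:, [0]], y)     # candidate model B: x1 only
print("CV MSE, model A:", mse_a)
print("CV MSE, model B:", mse_b)
```

The candidate with the lower cross-validated error would be preferred here, but that comparison is only as good as the candidate models you bring to it, which is where subject matter knowledge comes in.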
Note that a graphical residual analysis used to study fit includes considering heteroscedasticity, which should be modeled also. Heteroscedasticity is expected when predictions are of different sizes, which is the general case. The best set of predictors may not be good enough to mimic the behavior of the y-variable, however, which is discussed in the following:
Note that OLS regression is just a special case of weighted least squares (WLS) regression, where weights are equal, but this may be far from realistic. (When you have autocorrelation too, then you need GLS regression.)
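To make the OLS/WLS relationship concrete, here is a small sketch in Python with NumPy. The data are simulated, and the helper name wls and the choice of weights 1/x^2 are assumptions made only for this illustration. It shows that WLS with equal weights reproduces the OLS coefficients, and that unequal weights give a different fit when the error spread grows with x.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimizes sum_i w_i * (y_i - x_i'b)^2."""
    sw = np.sqrt(w)
    # Scaling rows by sqrt(w) turns the weighted problem into an ordinary one.
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta

# Simulated heteroscedastic data (an assumption): error sd proportional to x.
rng = np.random.default_rng(0)
n = 100
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 3.0 + 2.0 * x + rng.normal(scale=x)

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_equal = wls(X, y, np.ones(n))  # equal weights: same as OLS
beta_wls = wls(X, y, 1.0 / x**2)    # weights inversely proportional to error variance

print("OLS:              ", beta_ols)
print("WLS, equal weights:", beta_equal)
print("WLS, 1/x^2 weights:", beta_wls)
```

In practice the variance structure is not known and has to be modeled or estimated, so the 1/x^2 weights above are only correct because the simulation was built that way.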
If you insist on backward selection, though such sequential selection procedures are not recommended, I would guess that leaving the dummies out at the beginning would group some data that you would not have wanted grouped, and that this could be a (further) problem with selecting predictors this way. Maybe you'd want to start with them included, and never exclude them?
Dear James, thank you so much for your useful answer! We have to use backward selection and, indeed, I wish to include these dummies at the start and I do not want them to be excluded from the regression.
I don't know why you'd want to use that selection method, but if so and your software somehow wants to treat dummy variables as if they weren't dummies, then maybe it would accept an if-then kind of command such as 'if dummy selected for removal, then skip.' - Best wishes.
I too would be very wary if you are developing a model for understanding, but your question is a software issue, so Minitab, for example, gives you this facility:
https://support.minitab.com/en-us/minitab/18/help-and-how-to/modeling-statistics/regression/how-to/fit-regression-model/perform-the-analysis/perform-stepwise-regression/#potential-terms where
Displays the set of terms that the procedure will assess. Indicators (E or I) next to the term in the list signify how the procedure handles the term. The Method you choose determines the initial settings in this list. You can modify how the procedure handles the terms with the two buttons below. If you don't use these buttons, the procedure can add or remove the term from the model based on its p-value.
E = Include term in every model: Select a term and click this button to force the term into every model regardless of its p-value. Click the button again to remove this condition.
I = Include term in the initial model: Select a term and click this button to include the term in the initial model. The procedure can remove these terms if its p-value is too high. Click the button again to remove this condition.
One possibility is combining backwards (or forwards) selection with constraining the sum of the absolute values of the beta values. This is called the lasso (book downloadable from https://web.stanford.edu/~hastie/StatLearnSparsity/) and there are variations of it. But this still has some of the same issues as traditional backwards/forwards selection, so if you have a substantive theory to guide you, that is better.
Also, it is worth you saying WHY you need to do backwards selection. Is this a class exercise?
For the original question: sure, you can include them, and depending on what software you are using, you can write a function for that.
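As one possible shape for such a function, here is a minimal sketch in Python using NumPy and the standard library. Everything in it is an assumption for illustration: the function names ols_pvalues and backward_select, the 0.05 threshold, the simulated data, and the column names (const, x1, x2, x3, d2001, d2002). The p-values use a normal approximation to the t distribution, which is only reasonable for fairly large samples. Columns listed in keep are never candidates for removal, so the year dummies stay in the final model regardless of their p-values.

```python
import math
import numpy as np

def ols_pvalues(X, y):
    """Two-sided p-values for OLS coefficients (normal approximation to t)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return np.array([math.erfc(abs(t) / math.sqrt(2)) for t in beta / se])

def backward_select(X, y, names, keep, alpha=0.05):
    """Backward elimination that never drops the columns named in `keep`."""
    active = list(names)
    while True:
        cols = [names.index(c) for c in active]
        pvals = ols_pvalues(X[:, cols], y)
        # Only terms outside `keep` may be considered for removal.
        candidates = [(p, c) for p, c in zip(pvals, active) if c not in keep]
        if not candidates:
            return active
        worst_p, worst = max(candidates)
        if worst_p <= alpha:
            return active
        active.remove(worst)

# Simulated data (an assumption): only x1 matters; years 2000-2002, 2000 as baseline.
rng = np.random.default_rng(0)
n = 300
year = rng.integers(2000, 2003, size=n)
x1, x2, x3 = rng.normal(size=(3, n))
names = ["const", "x1", "x2", "x3", "d2001", "d2002"]
X = np.column_stack([np.ones(n), x1, x2, x3,
                     (year == 2001).astype(float), (year == 2002).astype(float)])
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

keep = {"const", "d2001", "d2002"}
selected = backward_select(X, y, names, keep)
print(selected)
```

The p-values reported for the final model after any such selection should not be taken at face value, for the reasons given in the earlier answers.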