I learned regression from Tony Bryk, and he explained step-wise regression as "a-theoretical and lazy." He then explained how economists approach data analysis, but I don't want to get into that fight :-)
Seriously, why WOULD you want undergraduates (or anyone) to get into the habit of analyzing data without having a reason why they thought X might predict Y?
If I were putting the lesson together, I would probably give my students an empirical demonstration: first have them pick out logical predictors for an outcome from a codebook, then demonstrate how the results look if you put the predictors into the model according to their theory, then demonstrate how the results look if you let the program do it automatically. Then talk about what the differences are between the two sets of results - why they came out differently, what the analysis used to decide on a second and third step vs. what the students thought should go in second and third. You will get a lot farther by having them figure out what is different than by trying to explain to them that it is different.
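To make that demonstration concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is an assumption for illustration: the "codebook" is simulated, the variable names (`ses`, `prior_gpa`, `study_hours`, `shoe_size`) and effect sizes are made up, and the automatic procedure is a bare-bones forward selection that enters whichever predictor reduces the residual sum of squares most at each step. Real stepwise routines also apply a stopping rule (a p value or AIC threshold), which is left out here so the two entry orders can be compared side by side.

```python
import random

def ols_rss(columns, y):
    """Residual sum of squares from regressing y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)]
         + [sum(X[i][r] * y[i] for i in range(n))] for r in range(p)]
    # Gaussian elimination with partial pivoting.
    for k in range(p):
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for c in range(k, p + 1):
                A[r][c] -= f * A[k][c]
    # Back substitution for the coefficients.
    b = [0.0] * p
    for k in reversed(range(p)):
        b[k] = (A[k][p] - sum(A[k][c] * b[c] for c in range(k + 1, p))) / A[k][k]
    return sum((y[i] - sum(X[i][j] * b[j] for j in range(p))) ** 2 for i in range(n))

def r2_path(order, data, y):
    """R-squared after each predictor in `order` has been entered."""
    tss = ols_rss([], y)  # intercept-only model gives the total sum of squares
    return [1 - ols_rss([data[v] for v in order[:k + 1]], y) / tss
            for k in range(len(order))]

def forward_order(data, y):
    """Entry order chosen by the data: at each step add the predictor that lowers RSS most."""
    chosen, remaining = [], list(data)
    while remaining:
        best = min(remaining,
                   key=lambda v: ols_rss([data[m] for m in chosen + [v]], y))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Simulated "codebook": all names and effects below are hypothetical.
rng = random.Random(1)
n = 200
data = {name: [rng.gauss(0, 1) for _ in range(n)]
        for name in ("ses", "prior_gpa", "study_hours", "shoe_size")}
y = [0.3 * data["ses"][i] + 0.6 * data["prior_gpa"][i]
     + 0.4 * data["study_hours"][i] + rng.gauss(0, 1) for i in range(n)]

theory_order = ["ses", "prior_gpa", "study_hours", "shoe_size"]  # the students' order
auto_order = forward_order(data, y)                              # the program's order

for label, order in (("theory", theory_order), ("forward", auto_order)):
    steps = ", ".join(f"{v}: {r2:.3f}"
                      for v, r2 in zip(order, r2_path(order, data, y)))
    print(f"{label:8s} {steps}")
```

Both orders end at the same R-squared once every predictor is in; what differs is the story told at each intermediate step, which is the part worth discussing with the students.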
Witold, do you mean hierarchical in the Bryk and Raudenbush sense, or some other way? As for the variants of model selection, for example comparing forward stepwise with the lasso, it depends on the level of the students. The new lasso book (https://www.crcpress.com/Statistical-Learning-with-Sparsity-The-Lasso-and-Generalizations/Hastie-Tibshirani-Wainwright/9781498712163) is good, but would require that the students have some stats background. However, I think their plots showing diamonds might help explain these methods without any equations.
Okay, that is different from the way Raudenbush and others use the phrase. This sense of "hierarchical" means variables are entered according to some model, and presumably there are ANCOVA-like research hypotheses tested at each stage. The forward and backward approaches aren't liked by many methodologists and it is difficult to interpret any of the p values, so I am not sure when they would be recommended.
Another good and freely available source is Efron et al.'s LARS paper (http://statweb.stanford.edu/~imj/WEBLIST/2004/LarsAnnStat04.pdf), which describes some of the problems with the forward approach.
I think a big difficulty is what to do with the results from a forward/backward/stepwise regression, and that is tricky to teach: what if a student asks what an individual coefficient and its p value mean? Given the way the variable was selected into the model, this is difficult to answer. Keeping with atheoretical approaches (and I agree with Julia that on most psychology datasets theory should be important for analysis), I think one of the more surprising (to me) results was that p values can be calculated for the lasso. This is described in section 6.3 of Hastie, Tibshirani, & Wainwright's (2015) book, and in various papers cited within (e.g., http://www.stat.cmu.edu/~ryantibs/papers/lassosignif-aos.pdf, but there are other approaches). Of course, p values are tough to explain anyway.
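One way to show students why the p value of a selected predictor is hard to interpret is a small null simulation. The sketch below is an illustration under stated assumptions, not anyone's published method: the outcome and all ten predictors are pure noise, the sample size is fixed at 50, and the cutoff is the usual two-sided 5% critical value of t with 48 degrees of freedom (about 2.0106), converted to the equivalent critical |r|. It compares how often a predictor named in advance is "significant" with how often the first predictor chosen by forward selection (the one with the largest |r|) is.

```python
import math
import random

N = 50           # sample size; the critical value below is only valid for this N
K = 10           # number of pure-noise candidate predictors
T_CRIT = 2.0106  # two-sided 5% critical value of t with N - 2 = 48 df
# A simple-regression t test rejects exactly when |r| exceeds this value.
R_CRIT = T_CRIT / math.sqrt((N - 2) + T_CRIT ** 2)

def corr(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def false_positive_rates(reps=1000, seed=0):
    """Return (rate for a pre-specified predictor, rate for the selected one)."""
    rng = random.Random(seed)
    fixed_hits = selected_hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(N)]
        rs = [abs(corr([rng.gauss(0, 1) for _ in range(N)], y))
              for _ in range(K)]
        fixed_hits += rs[0] > R_CRIT        # predictor chosen before seeing the data
        selected_hits += max(rs) > R_CRIT   # first step of forward selection
    return fixed_hits / reps, selected_hits / reps

fixed_rate, selected_rate = false_positive_rates()
print(f"pre-specified predictor 'significant': {fixed_rate:.3f}")
print(f"forward-selected predictor 'significant': {selected_rate:.3f}")
```

With ten independent noise predictors, the chance that the best one clears the 5% cutoff is 1 - 0.95^10, or about 40%, so the selected rate should land far above the nominal 5% while the pre-specified rate stays near it. That gap is the reason the naive p value no longer means what the student thinks it means.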