I was hoping some folks with a background in statistical modelling could lend their expertise. I'm working on a difficult problem and was wondering if you could list the priorities or steps that you take on these types of tasks. For example, my list might look like this:
1) Feature extraction/engineering
2) Observe performance of baseline model (using regression or decision trees)
3) Iterate steps 1 & 2 until all features are exhausted or I'm content with progress
4) Try different supervised learning algorithms including ensemble methods for final model
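To make steps 1 to 3 concrete, here is a rough sketch of the loop I have in mind. It is only an illustration under assumptions: the data is synthetic, the helper names (`fit_ols`, `holdout_mse`) are made up for this post, and the baseline is a plain least-squares regression written with the standard library so that it runs anywhere. In practice I would use a proper library model and a separate validation set, since picking features on the same holdout that reports the final score is optimistic.

```python
import random

def fit_ols(X, y):
    """Least-squares fit with an intercept, via the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]

def mse(y, yhat):
    return sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)

def holdout_mse(cols, train, test):
    """Fit on train using only the feature indices in cols, score on test."""
    Xtr = [[x[c] for c in cols] for x, _ in train]
    Xte = [[x[c] for c in cols] for x, _ in test]
    w = fit_ols(Xtr, [t for _, t in train])
    return mse([t for _, t in test], predict(w, Xte))

# Synthetic data (an assumption for illustration): features 0 and 1 matter,
# feature 2 is pure noise.
rng = random.Random(0)
data = []
for _ in range(200):
    x = [rng.gauss(0, 1) for _ in range(3)]
    y = 3.0 * x[0] - 2.0 * x[1] + rng.gauss(0, 0.5)
    data.append((x, y))
train, test = data[:150], data[150:]

selected = []
baseline = holdout_mse(selected, train, test)  # intercept-only model
best = baseline
candidates = [0, 1, 2]
improved = True
while improved and candidates:
    improved = False
    # Try each remaining feature and keep the one that helps most
    scores = {c: holdout_mse(selected + [c], train, test) for c in candidates}
    c = min(scores, key=scores.get)
    if scores[c] < best:
        best = scores[c]
        selected.append(c)
        candidates.remove(c)
        improved = True

print("selected features:", selected)
print("baseline MSE:", baseline, "final MSE:", best)
```

The loop stops as soon as no remaining feature lowers the holdout error, which is my "content with progress" condition.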
What are some of the items that I'm missing? For example, are outlier detection and filtering typically helpful? Any tips you can provide would be greatly appreciated.
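For clarity, by outlier filtering I mean something like the following. This is a minimal sketch that assumes the common 1.5 x IQR rule on a single numeric column; the values and the `iqr_filter` name are made up for illustration, and I'm not claiming this is the right rule for my problem.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]

# Made-up example column with one obviously extreme value
values = [10, 12, 11, 13, 12, 11, 10, 95]
filtered = iqr_filter(values)
print(filtered)
```

Whether dropping such points helps or hurts presumably depends on whether they are errors or genuine rare cases, which is part of what I'm asking.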