What is the difference between stacked autoencoders (SAEs) and convolutional neural networks (CNNs) for feature extraction in pixel-wise classification? Is it the quality of the features, or the processing time?
If you don't have labeled data, an SAE is the best option, because each of its layers is trained to reconstruct its own input and needs no labels. If you do have labeled data, which technique suits pixel-wise classification depends on the nature of your data.
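To show what the label-free route looks like in practice, here is a minimal NumPy sketch of greedy layer-wise SAE training. It assumes each pixel is described by a vector of band values, as in a multispectral or hyperspectral image. The image, the layer sizes, the learning rate and the helper names (`train_autoencoder`, `train_stacked_autoencoder`, `encode`) are all hypothetical. Each layer learns to reconstruct its input, and the stacked encoders turn every pixel into a compact feature vector that any pixel classifier could then use.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_autoencoder(X, n_hidden, epochs=300, lr=0.1, seed=0):
    """Train one autoencoder layer (sigmoid encoder, linear decoder) by
    full-batch gradient descent on squared reconstruction error. No labels."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # encode
        X_hat = H @ W2 + b2        # decode (reconstruction)
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of 0.5 * sum(err**2) / n via backpropagation
        d_out = err / n
        dW2 = H.T @ d_out
        db2 = d_out.sum(axis=0)
        d_hid = (d_out @ W2.T) * H * (1.0 - H)
        dW1 = X.T @ d_hid
        db1 = d_hid.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1), losses


def train_stacked_autoencoder(X, layer_sizes, **kwargs):
    """Greedy layer-wise training: each layer learns to reconstruct the
    codes produced by the layer below it."""
    encoders, histories = [], []
    inputs = X
    for size in layer_sizes:
        (W, b), losses = train_autoencoder(inputs, size, **kwargs)
        encoders.append((W, b))
        histories.append(losses)
        inputs = sigmoid(inputs @ W + b)  # codes become the next layer's input
    return encoders, histories


def encode(X, encoders):
    """Pass pixel vectors through the stacked encoders to get features."""
    for W, b in encoders:
        X = sigmoid(X @ W + b)
    return X


# Hypothetical 32x32 image with 10 bands: each pixel is a noisy mix of two
# random "material" spectra. No class labels are used anywhere below.
rng = np.random.default_rng(42)
height, width, bands = 32, 32, 10
spectra = rng.random((2, bands))
abundance = rng.random((height * width, 1))
pixels = abundance * spectra[0] + (1.0 - abundance) * spectra[1]
pixels += 0.01 * rng.normal(size=pixels.shape)

# One row per pixel, standardized per band
X = (pixels - pixels.mean(axis=0)) / pixels.std(axis=0)

encoders, histories = train_stacked_autoencoder(X, layer_sizes=[8, 4])
features = encode(X, encoders).reshape(height, width, -1)
print(features.shape)  # (32, 32, 4)
```

Each layer here is a plain autoencoder. Denoising or sparse variants are common alternatives. The sketch also looks only at each pixel's own band values. A CNN for pixel-wise classification is typically trained end to end on labeled data, often on a patch around each pixel, so its features are tuned to the target classes but depend on having those labels.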