Exploring the similarities and differences between three powerful machine learning tools (PCA, NMF, and autoencoders) has always been a mental challenge for me, so I welcome anyone with expertise in this field to share their insight.
In machine learning projects we often run into the curse of dimensionality, where the number of records in the data is not substantially larger than the number of features. This causes problems, since it means training many parameters on a scarce dataset, which can easily lead to overfitting and poor generalization. High dimensionality also means very long training times. Dimensionality reduction techniques are commonly used to address these issues, and they work because the data, despite residing in a high-dimensional space, often has a low-dimensional structure.
Principal component analysis (PCA) is a technique for reducing the dimensionality of such datasets, increasing interpretability but at the same time minimizing information loss. It does so by creating new uncorrelated variables that successively maximize variance.
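As a minimal sketch of this idea, the snippet below (using scikit-learn; the synthetic data and dimensions are illustrative assumptions, not from any particular dataset) builds 10-dimensional data whose variance actually lives in 2 directions, and shows PCA recovering that low-dimensional structure:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# 200 samples in 10 dimensions, but the variance is concentrated
# in 2 latent directions plus a little noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)  # uncorrelated components, max variance

print(X_reduced.shape)                       # (200, 2)
print(pca.explained_variance_ratio_.sum())   # close to 1.0
```

Because the components successively maximize variance, `explained_variance_ratio_` tells you how much information each retained component preserves, which is the usual way to pick how many components to keep.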
Nonnegative matrix factorization (NMF) has become a widely used tool for the analysis of high-dimensional data as it automatically extracts sparse and meaningful features from a set of nonnegative data vectors.
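A quick sketch of NMF with scikit-learn (the matrix sizes and rank here are arbitrary choices for illustration): the data matrix `V` is factored as `V ≈ W @ H`, where both factors are constrained to be nonnegative, which is what yields parts-based, interpretable features.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# NMF requires nonnegative input data.
V = rng.random((100, 20))

model = NMF(n_components=5, init="nndsvda", random_state=0, max_iter=500)
W = model.fit_transform(V)   # (100, 5): per-sample activations
H = model.components_        # (5, 20): nonnegative basis features

print(np.all(W >= 0) and np.all(H >= 0))  # True: both factors nonnegative
```

Unlike PCA, the factors cannot cancel each other out (no negative entries), so each sample is an additive combination of the learned parts, which is why the features tend to be sparse and meaningful.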
An autoencoder is an unsupervised learning technique for neural networks that learns efficient data representations (encoding) by training the network to ignore signal “noise.” Autoencoders can be used for image denoising, image compression, and, in some cases, even generation of image data.
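As a minimal sketch of the autoencoder idea, the snippet below abuses scikit-learn's `MLPRegressor` as a tiny autoencoder by training the network to reproduce its own input through a narrow bottleneck layer; in practice one would use a deep learning framework, and all sizes here are illustrative assumptions:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((300, 16))  # 16-dimensional input data

# Training the network to map X -> X through a 4-unit hidden layer
# forces it to learn a compressed encoding of the input.
ae = MLPRegressor(hidden_layer_sizes=(4,), activation="tanh",
                  max_iter=2000, random_state=0)
ae.fit(X, X)

X_hat = ae.predict(X)  # reconstruction from the bottleneck
print(X_hat.shape)     # (300, 16)
```

The bottleneck is what makes the representation "efficient": the network cannot simply copy the input, so it must keep the structure and discard the noise, which is the mechanism behind denoising and compression applications.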