Please explain linear and non-linear dimensionality reduction methods for large data sets. Also explain the impact of machine learning techniques on the same.
PCA is a dimensionality reduction technique that builds entirely new dimensions (say Y) from the original X dimensions of the data. The new dimensions preserve as much of the data's variance as possible, which hopefully means the quality of later data manipulation (e.g., classification) is not hurt. PCA can even improve quality when the removed original dimensions were useless or irrelevant.
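Here is a minimal sketch using scikit-learn's `PCA`, assuming your data fits in memory; the data matrix `X` below is a random stand-in for your real data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 50))   # stand-in for your real (n_samples, n_features) data

pca = PCA(n_components=10)       # keep the 10 directions of highest variance
Y = pca.fit_transform(X)         # Y has shape (500, 10)

print(pca.explained_variance_ratio_.sum())  # fraction of variance retained
```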
A genetic algorithm (GA) can do the same job by heuristically combining different subsets of dimensions and evaluating their influence on the main task; a rough sketch follows below. Nevertheless, I recommend PCA.
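For illustration only, here is a toy GA for feature selection: each individual is a boolean mask over features, and fitness is cross-validated accuracy. All parameters (population size, generations, mutation rate) are arbitrary choices, not recommendations:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=30, n_informative=5, random_state=0)

def fitness(mask):
    # Score a feature subset by cross-validated accuracy of a simple classifier.
    if not mask.any():
        return 0.0
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

pop = rng.random((20, X.shape[1])) < 0.5           # random initial masks
for generation in range(15):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]        # keep the 10 best masks
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # uniform crossover
        child ^= rng.random(X.shape[1]) < 0.05                # bit-flip mutation
        children.append(child)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(m) for m in pop])]
print("selected features:", np.flatnonzero(best))
```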
For big data (e.g., ImageNet) I strongly recommend incremental methods, for instance incremental PCA and incremental LDA. These methods estimate the projection matrix one sample at a time, so only a single sample needs to be in memory at any moment.
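As a minimal sketch of the out-of-core idea, scikit-learn's `IncrementalPCA` works on mini-batches rather than strictly single samples; the random chunks below stand in for batches streamed from disk:

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

ipca = IncrementalPCA(n_components=50)

# Pretend each chunk is streamed from disk, so the full dataset never sits in memory.
for _ in range(100):
    chunk = np.random.normal(size=(256, 2048))  # stand-in for one batch of samples
    ipca.partial_fit(chunk)                     # update the projection incrementally

# Project new data once the components are estimated.
Y = ipca.transform(np.random.normal(size=(10, 2048)))
```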
You can find more details about these methods in:
- "Candid Covariance-free Incremental Principal Component Analysis"
- "Incremental Partial Least Squares Analysis of Big Streaming Data"