Background:
I have a large dataset that does not fit in memory, so standard PCA is not an option. Instead, I am using scikit-learn's IncrementalPCA (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.IncrementalPCA.html#sklearn.decomposition.IncrementalPCA) with its partial_fit and transform methods.
Question:
Can partial_fit and transform be called in the same training loop, as in Listing 1? Or should I first call partial_fit over the whole training dataset (or a subset of it) in one loop, and then call transform in a second loop, as in Listing 2?
Listing 1
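    from sklearn.decomposition import IncrementalPCA

    ipca = IncrementalPCA(n_components=10)
    for X in generator:  # generator yields one batch of samples at a time
        if train:
            ipca.partial_fit(X)  # update the fit with this batch
        X = ipca.transform(X)  # project using the components fitted so far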
Listing 2
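    from sklearn.decomposition import IncrementalPCA

    ipca = IncrementalPCA(n_components=10)
    for X in generator:  # first pass: fit on every batch
        ipca.partial_fit(X)
    # second pass: assumes generator can be iterated again (e.g. re-created)
    for X in generator:
        X = ipca.transform(X)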
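To make the difference between the two listings concrete, here is a minimal sketch that runs both on synthetic data and compares the projections. The `make_batches` helper, the batch sizes, and the feature scaling are all assumptions made up for illustration; they stand in for my real `generator`, and a list is used so the data can be iterated twice.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def make_batches(n_batches=5, batch_size=50, n_features=20, seed=0):
    """Hypothetical stand-in for `generator`: a list of random batches."""
    rng = np.random.default_rng(seed)
    # Give the features different scales so the data has distinct variances.
    scales = np.linspace(1.0, 5.0, n_features)
    return [rng.normal(size=(batch_size, n_features)) * scales
            for _ in range(n_batches)]


batches = make_batches()

# Listing 1 style: fit and transform inside the same loop.
ipca_1 = IncrementalPCA(n_components=10)
out_1 = []
for X in batches:
    ipca_1.partial_fit(X)
    out_1.append(ipca_1.transform(X))  # uses only the batches seen so far

# Listing 2 style: fit on every batch first, then transform in a second pass.
ipca_2 = IncrementalPCA(n_components=10)
for X in batches:
    ipca_2.partial_fit(X)
out_2 = [ipca_2.transform(X) for X in batches]

# Compare the projections of the first and last batches.
first_batch_matches = np.allclose(out_1[0], out_2[0])
last_batch_matches = np.allclose(out_1[-1], out_2[-1])
print("first batch identical:", first_batch_matches)
print("last batch identical:", last_batch_matches)
```

In this sketch the first batch is projected by a model that has seen only one batch in the Listing 1 style, but by the fully fitted model in the Listing 2 style, so the two projections of that batch are not expected to agree. Only the last batch is transformed by the same fitted state in both cases.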