Background:

I have a large dataset that does not fit in memory, so regular PCA is not an option. Instead, I am using scikit-learn's IncrementalPCA (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.IncrementalPCA.html#sklearn.decomposition.IncrementalPCA) with its partial_fit and transform methods.

Question:

Can partial_fit and transform be called in the same training loop, as in Listing 1? Or should I first call partial_fit over the whole training dataset (or a subset of it) in one loop, and then call transform in a second loop, as in Listing 2?

Listing 1

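    from sklearn.decomposition import IncrementalPCA

    # `generator` yields batches X; `train` is a flag set elsewhere
    ipca = IncrementalPCA(n_components=10)
    for X in generator:
        if train:
            ipca.partial_fit(X)  # update the model with this batch
        X = ipca.transform(X)    # project using the model fitted so far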

Listing 2

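    from sklearn.decomposition import IncrementalPCA

    ipca = IncrementalPCA(n_components=10)
    for X in generator:
        ipca.partial_fit(X)      # first pass: fit on every batch

    # second pass: `generator` must be re-created or re-iterable here,
    # since a plain Python generator is exhausted after the first loop
    for X in generator:
        X = ipca.transform(X)    # project using the fully fitted model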
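
To make the difference between the two listings concrete, here is a minimal sketch on synthetic data. The helper `make_batches`, the batch sizes, and the random data are assumptions made up for illustration; they stand in for my real generator. It runs both patterns over the same batches and checks whether the projected outputs agree.

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA


def make_batches(n_batches=5, batch_size=50, n_features=20, seed=0):
    """Hypothetical stand-in for the real data generator (synthetic data)."""
    rng = np.random.default_rng(seed)
    return [rng.normal(size=(batch_size, n_features)) for _ in range(n_batches)]


batches = make_batches()

# Pattern of Listing 1: fit and transform inside the same loop.
ipca_1 = IncrementalPCA(n_components=10)
out_1 = []
for X in batches:
    ipca_1.partial_fit(X)
    out_1.append(ipca_1.transform(X))  # uses only the batches seen so far

# Pattern of Listing 2: fit on all batches first, then transform.
ipca_2 = IncrementalPCA(n_components=10)
for X in batches:
    ipca_2.partial_fit(X)
out_2 = [ipca_2.transform(X) for X in batches]  # uses the final model

# Compare the projections of the first and the last batch.
first_batch_matches = np.allclose(out_1[0], out_2[0])
last_batch_matches = np.allclose(out_1[-1], out_2[-1])
print("first batch identical:", first_batch_matches)
print("last batch identical:", last_batch_matches)
```

In this sketch, the single-loop pattern projects early batches with a model that has only seen part of the data, so those outputs need not match the two-loop pattern; only the last batch is transformed by the same final model in both cases.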

Similar questions and discussions