I'm working with a large dataset (P = 13, N ≈ 4500) that has a lot of nonlinearity. I'm interested in reducing its dimensionality prior to clustering.
Kernel PCA seems like a good approach. I'm wondering whether it makes sense to judge the rough quality of each fit by comparing the sum of the first few eigenvalues (equivalent to the total explained variance) across multiple values of sigma. Working with a Gaussian kernel (for now, at least), I'm fitting a kPCA several times, varying sigma over several orders of magnitude, from 0.0002 to 20, and retaining the first 10 features of each fit. My hope is to identify the best of my current test values for sigma, then contract the search range and home in on it iteratively.
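For concreteness, this is roughly what I'm doing now (using kernlab's kpca; `my_data` is a placeholder for my data frame, and the grid endpoints and the choice of 10 features are just my current guesses):

```r
## Rough sketch of my current approach (kernlab's kpca; "my_data" is a
## placeholder for my data frame; the sigma grid and the choice of 10
## features are just my current guesses, nothing principled)
library(kernlab)

X <- as.matrix(my_data)

## note: kernlab's rbfdot kernel is exp(-sigma * ||x - x'||^2), so sigma here
## is an inverse kernel width
sigmas <- 10^seq(log10(2e-4), log10(20), length.out = 6)

eig_sums <- sapply(sigmas, function(s) {
  fit <- kpca(X, kernel = "rbfdot", kpar = list(sigma = s), features = 10)
  sum(eig(fit))   # sum of the first 10 eigenvalues for this sigma
})

cbind(sigma = sigmas, eig_sum = eig_sums)
```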
I haven't found much guidance on how to tune sigma that isn't, to be honest, over my head. Reconstruction error seems like a promising alternative metric, but I'm not sure how to implement it in R.
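The only version I can actually picture computing is a feature-space one, where the error is just the eigenvalue mass that falls outside the retained components, something like the sketch below (`my_data` and the function name are again placeholders). I don't know whether that's the quantity people usually mean, or whether a proper pre-image reconstruction back in the original 13 variables is needed; that's the part that's over my head.

```r
## My best guess at a reconstruction-error proxy: eigendecompose the centered
## kernel matrix by hand and treat the eigenvalue mass outside the first q
## components as the (feature-space) reconstruction error
library(kernlab)

feature_space_recon_error <- function(X, sigma, q = 10) {
  K  <- kernelMatrix(rbfdot(sigma = sigma), as.matrix(X))
  n  <- nrow(K)
  H  <- diag(n) - matrix(1 / n, n, n)    # centering matrix
  Kc <- H %*% K %*% H                    # centered kernel matrix
  ev <- eigen(Kc, symmetric = TRUE, only.values = TRUE)$values
  ev <- pmax(ev, 0)                      # clip tiny negative values from rounding
  sum(ev[-(1:q)]) / sum(ev)              # fraction of feature-space variance lost
}

## slow-ish for N ~ 4500 (full 4500 x 4500 eigendecomposition), but it runs
feature_space_recon_error(my_data, sigma = 0.1)
```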
I'm grateful for any thoughts or suggestions.