Hi everyone,

I'm fitting a sparse Partial Least Squares (sPLS) model to understand whether the analysed contaminants (55 individual contaminants) explain my response variables (42 response variables) in 21 individuals. Initially, I ran the sPLS with 3 components, 10 folds, and 5 repeats. When I checked the R² of the model, I obtained the following results:

```
  Component    Mean_R2
1         1 0.14073032
2         2 0.07010117
3         3 0.04363874
```

However, after reducing the sPLS model by selecting the most important variables (20 responses and 15 contaminants) and using the same parameters (3 components, 10 folds, and 5 repeats), I found that the R² for the third component increased substantially (from about 0.04 to about 0.16):

```
  Component    Mean_R2
1         1 0.06327634
2         2 0.05657111
3         3 0.15668819
```

I did not expect the R² for the third component to increase like this. Is there something wrong with my model? What could explain this discrepancy?

I would appreciate any insights or suggestions on what might be happening here. Could this be related to overfitting, the selection process of important variables, or perhaps an issue with how the components are being calculated?
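For context on the cross-validation setup, here is a minimal Python sketch of how small the test folds are with 21 individuals and 10 folds. It assumes the samples are split as evenly as possible, which may not match what the software actually does. The function names `fold_sizes` and `r2` and the example values are hypothetical, and the R² here is the usual 1 − SS_res/SS_tot, which may differ from how the values above were computed.

```python
def fold_sizes(n_samples, n_folds):
    """Sizes of the test folds when samples are split as evenly as possible (assumption)."""
    base, extra = divmod(n_samples, n_folds)
    return [base + 1 if i < extra else base for i in range(n_folds)]


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# 21 individuals, 10 folds: one fold of 3 samples and nine folds of 2 samples.
sizes = fold_sizes(21, 10)
print(sizes)

# Made-up values for a 2-sample test fold: small prediction errors
# already produce a strongly negative R² because SS_tot is tiny.
print(r2([1.0, 1.2], [1.3, 0.9]))
```

With only 2 or 3 individuals per test fold, any per-fold R² is very noisy, so differences between components or between the full and reduced models could partly reflect that instability.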

Thanks in advance for your help!
