My dataset consists of EEG electrode power features in all power bands (alpha, beta, delta, etc., both relative and absolute) and source power features (obtained after sLORETA analysis), in addition to connectivity strengths between the different sources (brain regions). There are as many as 20k features in all.
If I have to predict disease (dementia) based on all of the above features, which approach will yield the best accuracy on test sets? I initially thought that I should fit separate classifiers for each type of feature set (after dimension reduction) and then use their output probabilities as inputs to a meta-classifier that predicts the final disease state.
However, I think that may not work so well because all the features are correlated (the source estimates and connectivity measures are themselves derived from the electrode signals). Is this correct?
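To make the stacking idea concrete, here is a minimal sketch assuming scikit-learn. The data is synthetic, and the block names, column ranges, PCA size, and choice of logistic regression are placeholders for illustration, not settings from my real dataset. Each feature block gets its own scale, reduce, classify pipeline, and `StackingClassifier` feeds out-of-fold probabilities from those pipelines to the meta-classifier.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 60 subjects, 200 features, balanced labels.
rng = np.random.default_rng(0)
n_subjects = 60
X = rng.normal(size=(n_subjects, 200))
y = np.tile([0, 1], n_subjects // 2)

# Hypothetical column ranges for the three feature types.
blocks = {
    "electrode": slice(0, 40),
    "source": slice(40, 90),
    "connectivity": slice(90, 200),
}

def block_pipeline(cols):
    """Select one feature block, then scale, reduce and classify it."""
    select = ColumnTransformer([("keep", "passthrough", cols)])
    return make_pipeline(
        select,
        StandardScaler(),
        PCA(n_components=5),
        LogisticRegression(max_iter=1000),
    )

# The meta-classifier is trained on out-of-fold probabilities (cv=5),
# so it never sees predictions made on a base model's own training data.
stack = StackingClassifier(
    estimators=[(name, block_pipeline(cols)) for name, cols in blocks.items()],
    final_estimator=LogisticRegression(),
    stack_method="predict_proba",
    cv=5,
)

# Outer cross-validation estimates the accuracy of the whole stack.
scores = cross_val_score(stack, X, y, cv=5, scoring="accuracy")
print(scores.mean())
```

Since the labels here are random, the printed accuracy only shows that the plumbing works. If the blocks are strongly correlated, the base probabilities will be too, and the meta-classifier may add little over a single model.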
I used KernelPCA to extract a few components from the entire dataset and then ran a classifier on the transformed data with cross-validation. I get an accuracy of only around 75% on test sets. I have to improve accuracy by at least another 15%. I also tried extremely randomized trees, but the accuracy was still not high.
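A minimal sketch of this kind of setup, assuming scikit-learn, looks like the following. The data is synthetic and every parameter value (number of components, kernel, number of trees) is a placeholder rather than what I actually used. The point it illustrates is that the scaler and KernelPCA sit inside the pipeline, so they are refit on each training fold and the test folds do not leak into the dimension reduction.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 80 subjects, 500 features, balanced labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(80, 500))
y = np.tile([0, 1], 40)
X[y == 1, :10] += 1.0  # inject a weak class signal into 10 features

# Scaling and KernelPCA are fitted per training fold via the pipeline.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("kpca", KernelPCA(n_components=10, kernel="rbf")),
    ("clf", ExtraTreesClassifier(n_estimators=200, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="accuracy")
print(scores.mean(), scores.std())
```

Reporting the spread across folds alongside the mean helps judge whether a few points of difference between approaches is real or just fold-to-fold noise.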
What other approaches can I use?
I am looking for a good discussion on possible approaches and/or a sample solution. Thank you.