A simple solution is to use PCA or LDA if the problem is dimensionality reduction. If the problem is feature subset selection (FSS), i.e., selecting a subset of the original feature set, then
Principal Feature Analysis (unsupervised): venom.cs.utsa.edu/dmz/techrep/2007/CS-TR-2007-011.pdf
and SVM Recursive Feature Elimination, SVM-RFE (supervised): www.eecis.udel.edu/~yuy/report0531.pdf
are two good approaches. In particular, SVM-RFE has a sounder theoretical basis than decision-tree, genetic-algorithm, or heuristic-search-based FSS methods.
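For reference, a minimal SVM-RFE sketch with scikit-learn; the synthetic dataset and the number of features to keep are illustrative assumptions, not taken from the reports above:

```python
# SVM-RFE sketch: recursively drop the features whose linear-SVM
# weights have the smallest magnitude.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic data; only 5 of the 50 features are informative (assumption).
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           random_state=0)

selector = RFE(SVC(kernel="linear"), n_features_to_select=5, step=1)
selector.fit(X, y)
print("Selected feature indices:", selector.get_support(indices=True))
```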
You can use PCA or LDA to compress the features. But after compression, you have to determine the appropriate number of principal components (PC scores) to retain.
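For example, one common heuristic is to retain enough components to explain a fixed fraction of the total variance. A minimal sketch with scikit-learn, where the digits dataset and the 95% threshold are illustrative assumptions:

```python
# Pick the number of PCs from the cumulative explained variance.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
pca = PCA().fit(X)

cumvar = np.cumsum(pca.explained_variance_ratio_)
n_components = int(np.searchsorted(cumvar, 0.95) + 1)
print(f"{n_components} components explain >= 95% of the variance")

# Equivalently, PCA(n_components=0.95) selects this count automatically.
```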
Another way is to use a feature selection procedure. You may use a "filter" or a "wrapper" technique. My suggestion is to apply a wrapper technique like Genetic Algorithms (GA).
If your problem is feature subset selection and your data is not noisy at all, then a wrapper with a k-nearest-neighbour classifier (KNNC) might be a good choice. But wrappers are slow; inside a wrapper you can use any learning algorithm that takes less time to build and test a model. If there are no time constraints, you can try other combinations as well.
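As a minimal wrapper sketch, here is greedy forward selection with a KNN classifier scored by cross-validation (the wine dataset and the subset size of 5 are illustrative assumptions; a GA or PSO wrapper would replace the greedy search with a population-based one):

```python
# Wrapper feature selection: each candidate subset is evaluated by
# cross-validating the KNN classifier itself.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

knn = KNeighborsClassifier(n_neighbors=5)
sfs = SequentialFeatureSelector(knn, n_features_to_select=5, cv=5)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))
```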
This is a good example of the curse of dimensionality. If your aim is to map the original data onto a new space where the features are uncorrelated, then PCA (or ICA, CCA, etc.) can help. We used PCA in [1] and [2]. By using PCA, more than 99% reduction in the dimension of the feature vectors was achieved in [1], and more than 98% in [2]. Try the Unscrambler software, a complete multivariate analysis and experimental design package equipped with powerful methods including PCA, MCR, PLS-R, 3-Way PLS Regression, K-Means Clustering, and SIMCA Classification. You can visualize the PCs easily.
[1] Omid, M., A. Mahmoudi and M. H. Omid (2010). Development of pistachio sorting system using PCA assisted artificial neural networks of impact acoustics. Expert Systems with Applications 37(10): 7205–7212.
[2] Omid, M., A. Mahmoudi and M. H. Omid (2009). An intelligent system for sorting pistachio nut varieties. Expert Systems with Applications 36(9): 11528–11535.
Although random forest is a good choice for projecting high-dimensional data onto a lower-dimensional subspace (since the projection is random, you get lower variance and better accuracy), it may backfire if your feature set is sparse, which is the case for most high-dimensional data: the random feature selection may pick useless or uninformative features and thereby degrade the classifier.
The best choice could be a weighted-feature method, where features that carry more information are assigned more weight; then, when selecting features, you work with a set of informative features rather than useless ones and apply random forest to it, which I guess should give you better results. Best of luck.
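One simple way to instantiate this idea is to rank the features by an information measure and keep only the top-ranked ones before training the forest. A sketch assuming mutual information as the weight and an arbitrary cutoff of k = 20:

```python
# Filter by mutual information first, then train a random forest on
# the reduced, more informative feature set.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score

# Synthetic sparse-style problem: 200 features, only 10 informative.
X, y = make_classification(n_samples=500, n_features=200, n_informative=10,
                           random_state=0)

X_reduced = SelectKBest(mutual_info_classif, k=20).fit_transform(X, y)
rf = RandomForestClassifier(n_estimators=200, random_state=0)
print("CV accuracy:", cross_val_score(rf, X_reduced, y, cv=5).mean())
```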
You may also use feature transformation and feature selection methods together as a hybrid solution. For example, you may first perform transformation (e.g., PCA, LDA, etc.) and then apply a feature selection method (either wrapper or filter) to further decrease the feature dimension.
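A minimal sketch of such a hybrid, assuming PCA followed by a univariate filter (the component counts are arbitrary):

```python
# Hybrid reduction: transform with PCA, then keep only the most
# class-discriminative components with a univariate filter.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)

hybrid = make_pipeline(PCA(n_components=30), SelectKBest(f_classif, k=10))
X_small = hybrid.fit_transform(X, y)
print("Final shape:", X_small.shape)  # (n_samples, 10)
```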
Dear Gunal, I think I had better first use feature selection and then perform a feature extraction method, because the data set is noisy and feature extraction methods like PCA use all of the features.
The data is very high dimensional and is stored as a sparse matrix.
Thanks for your reply. I don't know about SIFT, but I'm intrigued by it. I'd appreciate it if you could give me a good reference for the theoretical concepts of SIFT, along with your source code. I'll try it on my data set and let you know the results.
I think you should give us more information about your dataset, the goal of the classification, and the limitations of your project.
If you are working on images, you have many choices and methods to perform this task.
If you are working on numerical data and looking to extract or select features from such databases, you should apply data analysis methods.
My MS thesis relates to feature selection for improving the performance of machine learning algorithms like KNN. I suggest you try feature selection with evolutionary algorithms such as GA, ICA (imperialist competitive algorithm), and PSO, if you are interested in optimization algorithms; otherwise, study methods like PCA (Principal Component Analysis).
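For illustration, here is a minimal, self-contained GA wrapper sketch for feature subset selection with a KNN fitness function; the population size, mutation rate, and generation count are arbitrary assumptions that a real study would tune:

```python
# Tiny GA wrapper: individuals are binary feature masks, fitness is
# the cross-validated accuracy of KNN on the selected subset.
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_wine(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    if not mask.any():
        return 0.0
    knn = KNeighborsClassifier(n_neighbors=5)
    return cross_val_score(knn, X[:, mask], y, cv=3).mean()

pop = rng.random((20, n_features)) < 0.5           # random initial masks
for generation in range(30):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][:10]]   # truncation selection
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)          # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.05       # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop = np.array(children)

best = max(pop, key=fitness)
print("Selected features:", np.flatnonzero(best))
print("CV accuracy:", fitness(best))
```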
I developed clustering-based techniques, e.g., "clustering based model order selection of input-output models". You can find the related paper at: http://www.abonyilab.com/clustering