Hi,

To evaluate different feature subsets in wrapper feature selection methods, we need to use the training and validation sets. After selecting the best feature subset, we use the test set, which is different from the validation set, to measure the final performance.

We split the dataset randomly into three subsets: training (70%), validation (20%), and testing (10%).
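The 70/20/10 split described above can be sketched as follows; this is a minimal illustration using scikit-learn's `train_test_split`, and the data here is a purely hypothetical stand-in for the real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))    # hypothetical feature matrix
y = rng.integers(0, 2, size=1000)  # hypothetical binary labels

# First split off 70% for training, then split the remaining 30%
# into validation (20% of the total) and testing (10% of the total).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=2 / 3, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 700 200 100
```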

After finding the best feature subset using the training and validation sets, we evaluate this subset on the test set and observe poor performance. We have tried this with several randomly chosen splits.

In general, we have observed that the feature subset that gives good performance (accuracy) on the test set gives inferior performance on the validation set. So the question is: how can we carefully choose the validation and test sets so that the feature subset selected using the validation set also performs best on the test set?
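The evaluation step that produces the two accuracies being compared can be sketched as below. The candidate subset, the logistic-regression model, and the synthetic data are all illustrative assumptions, not the actual setup; the point is only to show a single wrapper-style evaluation of one subset on both the validation and the test set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # labels depend on features 0 and 3

# Same 70/20/10 split as described in the question.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=2 / 3, random_state=1)

subset = [0, 3]  # hypothetical candidate subset proposed by the wrapper

# Train on the training set with only the candidate features,
# then score the same fitted model on validation and test sets.
clf = LogisticRegression().fit(X_train[:, subset], y_train)
val_acc = accuracy_score(y_val, clf.predict(X_val[:, subset]))
test_acc = accuracy_score(y_test, clf.predict(X_test[:, subset]))
print(round(val_acc, 3), round(test_acc, 3))
```

Because validation and test sets are small (200 and 100 samples here), some gap between the two accuracies is expected from sampling variance alone.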

Thank you.
