I think it depends on the size of the dataset. If the dataset is small, you can initialize the population with features selected by a filter approach. In this case, you start the optimization process with relatively good solutions.
Another approach you can use with large datasets: apply a filter approach (e.g., mutual information) to rank all the features in the dataset, then select the top X features (X may be 20, 30, ...). The number of features to select depends on the dataset, so you need to run some experiments to see which number of features contributes best to the classification accuracy. After selecting the highly ranked features, update the dataset to contain only those features, then apply the wrapper approach to the new dataset.
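As a rough sketch of that filter step, here is a minimal pure-Python example. The toy dataset and the helpers `mutual_information` and `top_x_features` are illustrative assumptions, not from any particular library; in practice you would use a library implementation (e.g., scikit-learn's `mutual_info_classif`) on your real data.

```python
import math
from collections import Counter

def mutual_information(feature, labels):
    """Estimate I(feature; labels) for discrete values, in nats."""
    n = len(labels)
    pf = Counter(feature)              # marginal counts of the feature
    pl = Counter(labels)               # marginal counts of the labels
    pj = Counter(zip(feature, labels)) # joint counts
    mi = 0.0
    for (f, l), c in pj.items():
        p_fl = c / n
        mi += p_fl * math.log(p_fl / ((pf[f] / n) * (pl[l] / n)))
    return mi

def top_x_features(X, y, x):
    """Rank columns of X by mutual information with y; return the top-x column indices."""
    n_features = len(X[0])
    scores = [(mutual_information([row[j] for row in X], y), j)
              for j in range(n_features)]
    scores.sort(reverse=True)
    return [j for _, j in scores[:x]]

# Toy dataset: feature 0 perfectly predicts the label, feature 1 is noise.
X = [[0, 1], [0, 0], [1, 1], [1, 0], [0, 1], [1, 0]]
y = [0, 0, 1, 1, 0, 1]

selected = top_x_features(X, y, 1)                       # -> [0]
reduced = [[row[j] for j in selected] for row in X]      # dataset restricted to the top features
```

The `reduced` dataset is what you would then feed to the wrapper phase.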
The wrapper method has a gap that many researchers are working to solve: improving the search phase of the wrapper method. So you can choose a suitable optimization algorithm and then run experiments (the baseline wrapper vs. your new improved wrapper).
Please read my conference paper:
A Novel Chaotic Chicken Swarm Optimization Algorithm for Feature Selection
As I mentioned in my first answer, the number of features to select depends on the dataset, so you need to run some experiments to see which number of features contributes best to the classification accuracy. Select the top 20 features and test the algorithm, then select the top 30 features and test the algorithm; you might try around 5 different values. Then select the number of features that gives you the highest accuracy.
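That tuning loop can be sketched as follows. This is a toy, self-contained example: the synthetic dataset, the assumed feature ranking, and the leave-one-out 1-NN evaluator all stand in for your real data, filter ranking, and classifier.

```python
import random

random.seed(0)

# Hypothetical setup: 10 features, of which only the first two are
# informative (they encode the label); the rest are random noise.
y = [i % 2 for i in range(40)]
X = [[lbl, 1 - lbl] + [random.random() for _ in range(8)] for lbl in y]

# Assume the filter step already ranked the features best-first.
ranking = list(range(10))

def loo_1nn_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier."""
    correct = 0
    for i in range(len(X)):
        best = min((sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
                   for j in range(len(X)) if j != i)
        correct += y[best[1]] == y[i]
    return correct / len(X)

# Try several candidate sizes X and keep the one with the best accuracy.
results = {}
for x in (2, 4, 6, 8):
    cols = ranking[:x]
    reduced = [[row[c] for c in cols] for row in X]
    results[x] = loo_1nn_accuracy(reduced, y)

best_x = max(results, key=results.get)  # smallest x wins ties
```

On real data you would replace the leave-one-out 1-NN with cross-validation of whatever classifier your wrapper uses.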
You need to select a filter FS approach and apply it to the original dataset. This step aims to exclude the redundant and irrelevant features. In the second phase, to further explore the reduced feature subset and identify a subset of informative features, you may employ a metaheuristic algorithm with a learning algorithm such as KNN or SVM.
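A minimal sketch of that second (wrapper) phase is below. To keep it self-contained, a simple bit-flip hill climb stands in for the metaheuristic (GA, PSO, Bat Algorithm, ...), a leave-one-out 1-NN classifier stands in for KNN/SVM, and the toy dataset pretends the filter phase already reduced it to 6 features:

```python
import random

random.seed(1)

# Assumed phase-1 result: dataset reduced by a filter to 6 features,
# of which feature 0 is informative and the rest are noise.
y = [i % 2 for i in range(30)]
X = [[lbl] + [random.random() for _ in range(5)] for lbl in y]

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only the features where mask[j] == 1."""
    cols = [j for j, m in enumerate(mask) if m]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        best = min((sum((X[i][c] - X[j][c]) ** 2 for c in cols), j)
                   for j in range(len(X)) if j != i)
        correct += y[best[1]] == y[i]
    return correct / len(X)

# Phase 2: bit-flip hill climb over feature subsets, fitness = wrapper accuracy.
n = len(X[0])
mask = [random.randint(0, 1) for _ in range(n)]
init_fit = loo_1nn_accuracy(X, y, mask)
best_fit = init_fit
for _ in range(100):
    candidate = mask[:]
    candidate[random.randrange(n)] ^= 1          # flip one feature in/out
    fit = loo_1nn_accuracy(X, y, candidate)
    # Accept strictly better subsets, or equally good but smaller ones.
    if fit > best_fit or (fit == best_fit and sum(candidate) < sum(mask)):
        mask, best_fit = candidate, fit
```

A real metaheuristic maintains a population of such masks and uses its own update rules instead of single bit flips, but the fitness evaluation (train/evaluate the learner on the masked features) is the same.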
You may refer to this paper; they used the same approach you are asking about: the MRMR filter approach followed by the Bat Algorithm with an SVM classifier as the wrapper approach.
Article MRMR BA: A hybrid gene selection algorithm for cancer classification