Why do we need optimization techniques in feature subset selection? Is it necessary to use an optimizer to perform feature selection for a large number of features?
Given a set of features, the problem of automatically choosing a subset boils down to selecting one specific subset (or a few) out of the very many possible combinations. The idea is therefore to associate a property with each candidate subset, such as a cost function to be minimised or a profit function to be maximised, and to define the subsets at which this property is optimised as the "optimal" feature sets within the framework of your problem.
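To make that concrete, here is a minimal sketch of the idea in Python: each candidate subset is scored by an objective function (here, cross-validated accuracy), and the best-scoring subset wins. The helper names (`score_subset`, `best_subset_exhaustive`) and the use of logistic regression are illustrative assumptions, not a specific method from this thread.

```python
from itertools import combinations

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def score_subset(X, y, subset):
    """Objective to maximise: cross-validated accuracy on the chosen columns."""
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, list(subset)], y, cv=3).mean()


def best_subset_exhaustive(X, y, max_size=3):
    """Exhaustive search over all subsets up to max_size.
    Only feasible for a handful of features, which is exactly why
    an optimizer is needed for larger feature sets."""
    n_features = X.shape[1]
    best, best_score = None, -np.inf
    for k in range(1, max_size + 1):
        for subset in combinations(range(n_features), k):
            s = score_subset(X, y, subset)
            if s > best_score:
                best, best_score = subset, s
    return best, best_score
```

The exhaustive loop above is exactly what becomes intractable as the number of features grows; an optimizer replaces it with a guided search over the same objective.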
First, the question of whether to use metaheuristics or exact approaches has been comprehensively addressed in other ResearchGate threads. Of course metaheuristics CAN find optima, but, as correctly mentioned, they cannot in general find and prove optimality.
Secondly, once you have a real problem with a lot of data, you may want to reduce its "complexity" (not in the computer-science sense), so the issue does exist in practice.
Thirdly, how far brute force gets you depends on what you do; you can do well in both directions, exact methods and (meta-)heuristics.
Finally, when solving real problems you may not be able to publish the exact real-world case, but you can still contribute on the academic side. For instance, in the article "A reference model for customer-centric data mining with supp..."
we argue that it is not only about feature selection but also about other aspects (model and instance selection).
If you want an approximate solution, you can first try filter methods; they generally give good solutions. If you want an exact solution, you can use the branch-and-bound method, but it is very hard to apply to this problem at scale. Metaheuristics can also be used; they are generally slower than filter methods but faster than exact methods.
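As a hedged illustration of the filter approach mentioned above: rank features by a univariate relevance score (here mutual information) and keep the top k. This is a generic sketch using scikit-learn, not the particular method any answer refers to; the dataset is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic data: 50 features, only 5 of which are actually informative.
X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                           random_state=0)

# Filter method: score each feature independently, keep the 10 best.
selector = SelectKBest(mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)
print(selector.get_support(indices=True))  # indices of the selected features
```

Because each feature is scored independently, this runs in a single pass and scales easily, which is why filter methods are fast; the price is that interactions between features are ignored, so the result is approximate.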
Because of "the curse of dimensionality". Feature selection is the process of finding the best (useful, relevant) feature subset. To verify all possible feature combinations for a set of n features, you have to evaluate 2^n - 1 candidate subsets (excluding the empty subset), which is computationally demanding. For a computer that can verify a million solutions per second, it would take more than 35 years to check all possible combinations for just 50 features. That is why we need optimization for FS.
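The 35-year figure can be checked directly; this small snippet just reproduces the arithmetic above under the assumed rate of one million subset evaluations per second.

```python
# Arithmetic check for n = 50 features.
n = 50
subsets = 2**n - 1                       # all non-empty feature subsets
seconds = subsets / 1_000_000            # at one million evaluations per second
years = seconds / (60 * 60 * 24 * 365)
print(f"{subsets:.3e} subsets -> about {years:.0f} years")   # roughly 35.7 years
```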