Generally, a feature selection method is used to select relevant features for classification, but in some research works an additional optimal feature selection step is performed.
We select the features first and then evaluate them using dedicated algorithms. This whole process (selection and evaluation) should be assisted by cross-validation, for instance, to ensure that the optimal features are selected.
The optimal feature set could be a subset of the selected feature set. The optimal subset should minimize a cost function defined by the user (information-related or performance-related, depending on the application). On the other hand, feature selection can be seen as the first step of feature extraction: your decision about which features to compute, e.g., time-domain or frequency-domain features.
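To make the selection-plus-evaluation idea concrete, here is a minimal sketch in Python with scikit-learn; the dataset, the choice of k=10 features, and the classifier are illustrative assumptions, not part of the answer above. Putting the selection step inside the pipeline means it is re-fit on every training fold, so the cross-validated score reflects the whole procedure rather than leaking information from the test folds.

```python
# Minimal sketch: evaluate feature selection together with the classifier
# under cross-validation. Dataset, k=10, and classifier are assumptions.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    ("select", SelectKBest(score_func=f_classif, k=10)),  # keep 10 features
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print("Mean CV accuracy with 10 selected features: %.3f" % scores.mean())
```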
I think that feature selection is normally understood as a process of selecting a subset from a set of features (so the features need to be known in advance) in order to improve the outcome of a certain task (e.g., to improve classification accuracy or decrease the time needed for computations).
When the outcome of the task reaches its best value with the selected feature subset, we can say that the subset is optimal. Generally, to find the optimal subset, all possible subsets should be examined. However, for some tasks, e.g., in the text mining domain where there might be hundreds of thousands of features, this is not feasible. A greedy strategy for feature selection (adding a feature to, or removing a feature from, the current subset at each step) is therefore often used. The feature selection procedure thus does not necessarily lead to the selection of the optimal subset.
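A minimal sketch of that greedy strategy, using sequential forward selection in scikit-learn; the dataset, the estimator, and the target of 5 features are assumptions chosen only for illustration.

```python
# Greedy (sequential forward) selection: add one feature at a time instead
# of examining all possible subsets.
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)

sfs = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=3),
    n_features_to_select=5,
    direction="forward",   # greedy: add the single best feature at each step
    cv=5,
)
sfs.fit(X, y)
print("Selected feature indices:", sfs.get_support(indices=True))
```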
Feature selection is commonly understood in the literature as selection of an optimal subset of features, therefore I don't see the difference between feature selection and the optimal feature selection. Of course, one has to decide what features to extract in the first place, but that decision is usually not called feature selection.
1. Filter methods select features regardless of the model. They do not consider the relationships between features, so from the point of view of model quality the selected subset of features is not optimal.
2. Wrapper methods evaluate subsets of selected features using the model error. They take possible interactions between features into account and can select optimal subsets of features.
3. Embedded methods, where a learning algorithm uses its own feature selection process. However, the selected subset of features is not always optimal (e.g., in decision trees it is not). A sketch contrasting the three families follows this list.
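As referenced in the last item, here is a minimal sketch contrasting the three families with scikit-learn; the dataset, the estimators, and the choice of 5 features are illustrative assumptions.

```python
# Filter vs. wrapper vs. embedded selection, each keeping 5 features.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# 1. Filter: score each feature independently of any model.
filt = SelectKBest(score_func=f_classif, k=5).fit(X, y)

# 2. Wrapper: recursively drop features using a model's coefficients.
wrap = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5).fit(X, y)

# 3. Embedded: the learner ranks features as a by-product of training.
forest = RandomForestClassifier(random_state=0).fit(X, y)
embedded = np.argsort(forest.feature_importances_)[-5:]

print("Filter  :", sorted(filt.get_support(indices=True)))
print("Wrapper :", sorted(wrap.get_support(indices=True)))
print("Embedded:", sorted(embedded))
```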
As Avar Pentel says, I don't see the difference between feature selection and optimal feature selection. I think a more important issue is choosing among the available feature selection techniques. In my experience this is a hard decision that depends on the problem and the data context. Just try this: run two techniques for selecting features and you will surely get different results (see the sketch below).
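A quick sketch of that experiment; the dataset, the two techniques, and k=5 are assumptions picked only to show that different methods typically keep different features.

```python
# Two selection techniques on the same data usually disagree.
from sklearn.datasets import load_wine
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

a = SelectKBest(score_func=mutual_info_classif, k=5).fit(X, y)
b = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5).fit(X, y)

set_a = set(a.get_support(indices=True))
set_b = set(b.get_support(indices=True))
print("Mutual information kept:", sorted(set_a))
print("RFE with a tree kept   :", sorted(set_b))
print("Agreement on           :", sorted(set_a & set_b))
```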
From my experience, we do the feature selection first and use these features in the process. If the results are not as good as expected, or if you still have too many features after selection, then you need to apply optimization or, as you said, an optimal feature selection.
Generally, feature selection is part of the pipeline needed to explore and prepare your dataset for a classification algorithm. Sometimes, depending on the dataset, the data types, etc., dealing with a large number of variables can be challenging because of the complexity inherent in multivariate data. Principal components analysis can be used to transform a large number of correlated variables into a smaller set of composite variables, while factor analysis consists of a set of techniques for uncovering the latent structure underlying a given set of variables. These two methods help with some optimal feature selection.
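A minimal sketch of the PCA route mentioned above; the dataset and the 95%-variance threshold are illustrative assumptions. Note that, as the answer says, PCA builds a smaller set of composite variables rather than keeping a subset of the original ones.

```python
# Project many correlated variables onto a few uncorrelated components.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, _ = load_breast_cancer(return_X_y=True)
X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=0.95)  # keep enough components for 95% of the variance
X_reduced = pca.fit_transform(X_scaled)

print("Original variables :", X.shape[1])
print("Components retained:", X_reduced.shape[1])
```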