A minimum of two support vectors is required for each decision hyperplane in the model. This follows from the observation that the margin at each decision boundary must be defined on each side of the dividing hyperplane by the closest data points, which are the support vectors.
You can use a support vector machine (SVM) when your data has exactly two classes. An SVM classifies data by finding the best hyperplane that separates all data points of one class from those of the other class. The best hyperplane for an SVM means the one with the largest margin between the two classes.
The support vectors are the data points that are closest to the separating hyperplane; these points lie on the boundary of the slab (the margin region that contains no interior data points).
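As a minimal illustration of this (my own sketch, assuming scikit-learn is available; the toy dataset is invented), you can fit a linear SVM and read the support vectors off the fitted model:

```python
# A minimal sketch (scikit-learn assumed, toy data invented): fit a linear
# SVM on two separable classes and read off the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 0.5], [0.5, 1.0],
              [3.0, 3.0], [4.0, 3.5], [3.5, 4.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin (largest-margin) fit described above.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.support_vectors_)  # the points on the boundary of the slab
print(clf.n_support_)        # support vectors per class: at least one on each side
```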
So, back to your question: how do you find the minimum number of support vectors for a machine-learning classification problem?
The data for training is a set of points (vectors) $T_j$ along with their categories $C_j$. For some dimension $D$, the $T_j \in \mathbb{R}^D$, and the $C_j = \pm 1$. The equation of a hyperplane is
$$f(T) = T'\beta + b = 0,$$
where $\beta \in \mathbb{R}^D$ and $b$ is a real number.
The following problem defines the best separating hyperplane (i.e., the decision boundary). Find $\beta$ and $b$ that minimize $\|\beta\|$ such that for all data points $(T_j, C_j)$,
$$C_j f(T_j) \ge 1.$$
So, the support vectors are the $T_j$ on the boundary, those for which $C_j f(T_j) = 1$.
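Here is a hedged sketch of that optimization in Python, using scipy.optimize.minimize as a stand-in for a dedicated quadratic-programming solver (the toy data, tolerance, and solver choice are my assumptions, not part of the answer above):

```python
# Hedged sketch: solve the hard-margin primal with a general-purpose solver
# (SLSQP); a dedicated QP solver would be the usual choice in practice.
import numpy as np
from scipy.optimize import minimize

T = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])  # toy points
C = np.array([-1, -1, 1, 1])                                     # labels +/-1

def objective(w):             # w = (beta_1, ..., beta_D, b)
    beta = w[:-1]
    return 0.5 * beta @ beta  # minimizing ||beta||^2/2, equivalent to ||beta||

constraints = [
    {"type": "ineq", "fun": lambda w, j=j: C[j] * (T[j] @ w[:-1] + w[-1]) - 1.0}
    for j in range(len(T))    # C_j * f(T_j) >= 1 for every training point
]

res = minimize(objective, x0=np.zeros(T.shape[1] + 1), constraints=constraints)
beta, b = res.x[:-1], res.x[-1]

# Support vectors are the points where the constraint is active: C_j f(T_j) = 1.
scores = C * (T @ beta + b)
print(T[np.isclose(scores, 1.0, atol=1e-3)])
```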
For mathematical convenience, the problem is usually given as the equivalent problem of minimizing $\beta'\beta/2$; this is a quadratic programming problem. The optimal solution $(\hat{\beta}, \hat{b})$ enables classification of a vector $z$ as follows:
$$\mathrm{class}(z) = \operatorname{sign}(z'\hat{\beta} + \hat{b}) = \operatorname{sign}(\hat{f}(z)).$$
$\hat{f}(z)$ is the classification score; its sign gives the class, and its magnitude is proportional to the distance from $z$ to the decision boundary.
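A short sketch of this classification rule (scikit-learn assumed, toy data invented): decision_function returns the score $\hat{f}(z)$, its sign is the predicted class, and dividing by $\|\hat{\beta}\|$ converts the score into a geometric distance.

```python
# Sketch (scikit-learn assumed): decision_function returns the score
# f_hat(z) = z' beta_hat + b_hat; its sign is the predicted class.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

z = np.array([[2.5, 2.5]])
score = clf.decision_function(z)          # classification score f_hat(z)
print(np.sign(score))                     # class(z) = sign(f_hat(z))
print(score / np.linalg.norm(clf.coef_))  # signed geometric distance to boundary
```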
This link is very helpful for understanding this kind of optimization problem: https://stats.stackexchange.com/questions/313660/what-are-the-support-vectors-in-a-support-vector-machine
The SVM classification method applies the concept of margin hyperplanes, which can be pictured as surfaces that maximize the boundaries between the different classes of data in order to create subspaces with homogeneous observations.
To find the optimal hyperplane, the margin, i.e., twice the distance between the hyperplane and the nearest training data points (called support vectors), is maximized.
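For a fitted linear SVM this margin can be computed directly: it equals $2/\|\beta\|$. A small check, again assuming scikit-learn and a made-up toy dataset:

```python
# Sketch (scikit-learn assumed, toy data invented): for a linear SVM the
# margin width equals 2 / ||beta||, twice the distance to the support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

beta = clf.coef_.ravel()
print(2.0 / np.linalg.norm(beta))  # margin width; 2*sqrt(2) for this toy data
```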
Suppose we have a $d$-dimensional set of data points with labels $-1$ and $+1$ that we want to classify using an SVM. For classification, we need two margin hyperplanes at equal distance from the optimal hyperplane that separate the data points. Since we are in $d$ dimensions, each margin hyperplane can be constructed from a minimum of $d$ support vectors ($d$ affinely independent points determine a hyperplane in $\mathbb{R}^d$).
So, for classification, we will need a minimum of $2d$ support vectors; a quick empirical check follows below.
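You can at least count the support vectors an actual fit produces. This sketch (scikit-learn assumed, synthetic Gaussian data invented for illustration) prints the per-class counts for $d$-dimensional data, so you can compare them against the $2d$ figure yourself:

```python
# A quick empirical check (scikit-learn assumed, Gaussian toy data invented):
# count the support vectors a hard-margin-like fit actually uses in d dimensions.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
d = 5
X = np.vstack([rng.normal(-3.0, 1.0, size=(50, d)),   # class -1
               rng.normal(+3.0, 1.0, size=(50, d))])  # class +1
y = np.array([-1] * 50 + [1] * 50)

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(f"d = {d}: {clf.n_support_} support vectors per class")
```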
You can grid-search over the unknown variables, such as the minimum number of support vectors in the classifier, and keep whichever setting results in the highest classification accuracy.
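A hedged sketch of that idea with scikit-learn's GridSearchCV (the parameter grid and dataset are illustrative assumptions): search the hyperparameters, then inspect the accuracy and support-vector count of the winning model.

```python
# Sketch of the grid-search idea (scikit-learn assumed; grid and data invented):
# search the hyperparameters, then inspect the accuracy and support-vector
# count of the best estimator.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

grid = GridSearchCV(
    SVC(),
    param_grid={"C": [0.1, 1, 10, 100], "kernel": ["linear", "rbf"]},
    cv=5,
)
grid.fit(X, y)

print(grid.best_params_, grid.best_score_)  # setting with the highest CV accuracy
print(grid.best_estimator_.n_support_)      # support vectors per class there
```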
As mentioned in the first reply, the minimum number of support vectors is obviously 2, and the resulting optimal hyperplane is the perpendicular bisector of the segment joining the two examples in feature space.
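This two-point case is easy to verify numerically (a sketch under the assumption that scikit-learn with a large C approximates the hard-margin fit):

```python
# Sketch (scikit-learn assumed): with exactly two training points, the fitted
# hyperplane is the perpendicular bisector of the segment joining them.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [4.0, 2.0]])
y = np.array([-1, 1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

beta, b = clf.coef_.ravel(), clf.intercept_[0]
print(beta / np.linalg.norm(beta))  # normal direction: parallel to X[1] - X[0]
print(X.mean(axis=0) @ beta + b)    # ~0: the midpoint lies on the hyperplane
```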
Now if your question is "How do I calculate theoretically the minimum number of support vectors given the data that is available to me", the answer is: you can't. You must run your SVM software (which means making a lot of choices - kernel, regularization constant, hyperparameters of the kernel, ...) and see what happens.
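In that spirit, a small sketch of the empirical approach (dataset, kernels, and C values are arbitrary choices of mine): rerun the SVM under different settings and simply observe how many support vectors result.

```python
# Sketch of the empirical approach (all settings invented for illustration):
# rerun the SVM under different choices and observe the support-vector count.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=6, random_state=1)

for kernel in ["linear", "poly", "rbf"]:
    for C in [0.1, 10.0]:
        clf = SVC(kernel=kernel, C=C).fit(X, y)
        print(f"kernel={kernel:6s} C={C:5.1f} -> "
              f"{clf.n_support_.sum()} support vectors")
```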