Suppose we have 100 classes, each with 100 samples.
We train two-class SVMs, so the one-vs-the-rest paradigm requires 100 separate training runs, one per class. If we use all samples from the remaining classes as negatives (9,900 per run), the performance is always the worst.
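To make the setup concrete, here is a minimal sketch, in plain Python, of how the 100 one-vs-the-rest label sets are built when every sample of the other classes is used as a negative. It covers only the label construction, not the SVM training itself. `one_vs_rest_targets` is a hypothetical helper used for illustration, and the +1/-1 encoding is an assumption. Each binary problem ends up with 100 positives against 9,900 negatives, a 1:99 ratio.

```python
N_CLASSES = 100    # number of classes, as in the question
N_PER_CLASS = 100  # samples per class, as in the question

# Class label of every sample: 100 zeros, then 100 ones, ..., then 100 nineties-nine
labels = [c for c in range(N_CLASSES) for _ in range(N_PER_CLASS)]

def one_vs_rest_targets(labels, positive_class):
    """Hypothetical helper: binary targets for one two-class SVM.

    +1 for samples of positive_class, -1 for every sample of the rest.
    """
    return [1 if y == positive_class else -1 for y in labels]

# One binary training problem per class, i.e. 100 training runs in total
problems = [one_vs_rest_targets(labels, c) for c in range(N_CLASSES)]

# Inspect the class balance of the first binary problem
y0 = problems[0]
n_pos = y0.count(1)
n_neg = y0.count(-1)
print(len(problems), n_pos, n_neg)  # 100 100 9900
```

Every one of the 100 problems has the same 100 vs 9,900 split, since the classes are equally sized.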
How should the negative samples be chosen?