The simplest way to evaluate your classifier is to train the SVM on 67% of your data and test it on the remaining 33%. Alternatively, if you have two data sets, train the SVM on the first and test it on the second.
In that case the first data set is used to train the SVM, and the second data set, which is not perfect (e.g. it contains noise), is used to test the trained SVM.
To measure performance, you can use accuracy, precision, recall, the F1-score (or F-measure) and Cohen's kappa.
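As an illustration, here is a minimal sketch of that split and those metrics, assuming scikit-learn; the synthetic data set is only a placeholder for your own X and y.

# A minimal sketch of a 67/33 train/test split for an SVM, assuming scikit-learn.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score)

# Synthetic data stands in for your own X, y.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 67% of the data trains the SVM, the remaining 33% tests it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=0)

clf = SVC(kernel="rbf").fit(X_train, y_train)
y_pred = clf.predict(X_test)

print("accuracy :", accuracy_score(y_test, y_pred))
print("precision:", precision_score(y_test, y_pred))
print("recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("kappa    :", cohen_kappa_score(y_test, y_pred))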
There are generally two accepted validation methods: percentage splitting and cross-validation. If you have enough data, you can split it into a training part and a test part of about 70% and 30% of the data respectively. You should then run the learning algorithm 10 or more times and report the average of the metrics (precision, recall, F-measure, AUC). For smaller data sets, k-fold cross-validation is better (e.g. k = 5 or 10). Cross-validation is also advised if you want to assess the robustness of the learner.
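A sketch of both options, again assuming scikit-learn (the synthetic X and y are placeholders for your own data):

# Option 1: 70/30 split, repeated 10 times, report the average F-measure.
# Option 2: 10-fold cross-validation on the whole data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.svm import SVC
from sklearn.metrics import f1_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

scores = []
for seed in range(10):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.30, random_state=seed)
    y_pred = SVC().fit(X_tr, y_tr).predict(X_te)
    scores.append(f1_score(y_te, y_pred))
print("mean F-measure over 10 random splits:", np.mean(scores))

cv_scores = cross_val_score(SVC(), X, y, cv=10, scoring="f1")
print("mean F-measure over 10 folds        :", cv_scores.mean())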
As Kouser said: to evaluate performance, you calculate the percentage of correctly classified observations of class 1 (the sensitivity) and of class 2 (the specificity), then plot sensitivity versus 1 - specificity over all thresholds, which gives the ROC curve.
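Assuming scikit-learn and matplotlib, that plot can be sketched roughly as follows (the synthetic data is only a stand-in for your own):

import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.33, random_state=0)

# A continuous score is needed so that roc_curve can sweep over all thresholds.
scores = SVC().fit(X_tr, y_tr).decision_function(X_te)
fpr, tpr, _ = roc_curve(y_te, scores)   # fpr = 1 - specificity, tpr = sensitivity

plt.plot(fpr, tpr)
plt.xlabel("1 - specificity (false positive rate)")
plt.ylabel("sensitivity (true positive rate)")
plt.show()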
In order to evaluate the performance of your classifier (using hold-out or k-fold cross-validation), reliability can be assessed by computing the percentage of correctly classified events as well as by a complete confusion matrix, which summarizes how many instances of the different event classes were confused by the system. The rows of a confusion matrix show the number of instances in each actual event class (defined by the ground truth), while the columns show the number of instances in each predicted event class (given by the classifier's output).
Generally, classification performance can be measured by the F-score = 2 × Se × P / (Se + P), where P = TP / (TP + FP) is the precision (the probability that a classification of that event type is correct), and Se = TP / (TP + FN) and Sp = TN / (TN + FP) are the sensitivity and specificity respectively. From these you can build the ROC curve and then compute the AUC (area under the curve).
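As a small illustration of these formulas, assuming scikit-learn and purely made-up labels and scores:

from sklearn.metrics import confusion_matrix, roc_auc_score

y_true  = [0, 0, 1, 1, 1, 0, 1, 0]          # ground truth (rows of the matrix)
y_pred  = [0, 1, 1, 1, 0, 0, 1, 0]          # classifier output (columns)
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.2, 0.7, 0.3]  # continuous scores for the AUC

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

P  = tp / (tp + fp)          # precision
Se = tp / (tp + fn)          # sensitivity (recall)
Sp = tn / (tn + fp)          # specificity
F  = 2 * Se * P / (Se + P)   # F-score

print("P =", P, "Se =", Se, "Sp =", Sp, "F-score =", F)
print("AUC =", roc_auc_score(y_true, y_score))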
You can find everything you need to know here:
[1] S. Rogers and M. Girolami, A First Course in Machine Learning. Machine Learning & Pattern Recognition Series, Cambridge, UK: Chapman & Hall/CRC, 2012.
[2] M. Sokolova and G. Lapalme, “A systematic analysis of performance measures for classification tasks,” Information Processing & Management, vol. 45, no. 4, pp. 427–437, Jul. 2009.
Let me know if you cannot get access to the documents mentioned above.
I use ROC and AUC as performance metrics for my classifiers.
Most classifiers can output a probability (or a continuous score). For simplicity, I'll assume a binary classifier (classes 0/1).
By default, statistical packages interpret a probability of 0.5 or greater as class 1, and anything smaller as class 0. Based on these labels, you can validate (using hold-out or k-fold cross-validation) the classifier performance derived from a confusion matrix (precision, recall, etc.).
However, if you set the threshold to 0.25, your confusion matrix will change, and the other metrics will change with it. Similarly, a threshold of 0.75 will give yet another set of results.
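For example, here is a small sketch of that effect, assuming scikit-learn; y_prob stands for hypothetical predicted probabilities of class 1:

import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_prob = np.array([0.2, 0.4, 0.6, 0.3, 0.7, 0.8, 0.9, 0.1])

# The same probabilities produce a different confusion matrix at each cut-off.
for threshold in (0.25, 0.5, 0.75):
    y_pred = (y_prob >= threshold).astype(int)
    print("threshold", threshold)
    print(confusion_matrix(y_true, y_pred))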
So, I find ROC and AUC to be better descriptors of classifier performance. Together, they tell me how well the classifier separates the two classes over the whole range of thresholds, rather than at one arbitrary cut-off.
If you want to compare the performance of your SVM with, say, a boosting algorithm or a random forest, you can simply compare the validation ROC curves and AUC values of the models.
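Such a comparison could be sketched as follows, assuming scikit-learn and a random forest as the second model (the synthetic data is a placeholder):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# The roc_auc scorer uses each model's continuous scores (decision_function
# or predicted probabilities) to compute the cross-validated AUC.
svm_auc = cross_val_score(SVC(), X, y, cv=5, scoring="roc_auc")
rf_auc  = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                          cv=5, scoring="roc_auc")

print("SVM           mean AUC:", svm_auc.mean())
print("Random forest mean AUC:", rf_auc.mean())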
I am a little confused: the ROC curve is mainly used in bioinformatics, while for text classification and character recognition we mainly use precision, recall, or their harmonic mean, the F-score. Can we use the ROC curve for all of these, or are there any reservations?
A method that many researchers follow for evaluation is to divide the data into 60%, 20%, and 20% for training, validation, and testing respectively.
Then compute the confusion matrix for the validation data as well as for the test data, and also compute precision and recall for the test data; this completes the evaluation.
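A minimal sketch of that 60/20/20 procedure, assuming scikit-learn (the synthetic data again stands in for your own):

from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# First split off 40% of the data, then split that portion half-and-half
# into validation and test sets (each 20% of the original data).
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.40, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.50, random_state=0)

clf = SVC().fit(X_train, y_train)
y_test_pred = clf.predict(X_test)

print("validation confusion matrix:\n", confusion_matrix(y_val, clf.predict(X_val)))
print("test confusion matrix:\n", confusion_matrix(y_test, y_test_pred))
print("test precision:", precision_score(y_test, y_test_pred))
print("test recall   :", recall_score(y_test, y_test_pred))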