Assume we have two classifiers that classify multiple data sets with different performance. Could you explain in detail the algorithm for comparing these classifiers based on their performance?
I suggest that you read the paper "Statistical Comparisons of Classifiers over Multiple Data Sets", Journal of Machine Learning Research, vol. 7, 2006. It is available at http://www.jmlr.org/papers/volume7/demsar06a/demsar06a.pdf
Let's suppose that you have a sample of size N called x. This x follows some distribution and represents a measurement of a phenomenon of interest to you. Now suppose that you take another sample called y. This new sample could come from a variation of the phenomenon (an improvement in its physical conditions, or any other change) that leads you to suspect that x is different from y. Statistical tests are mechanisms that help you answer that question.
In the machine learning context, your phenomenon is the classification ability of a classification algorithm on a given problem or data set. What is your x? Your classification rate measured by cross-validation, AUC, or whatever measure you prefer.
Say you have a data set A and you run two classification algorithms, C1 and C2. You decide that 10×10-fold cross-validation (10 repetitions of 10-fold cross-validation) is the best performance measure. What could be your x? The 10 repetition scores of C1. And your y? The 10 repetition scores of C2.
So you have 10 readings for x and 10 readings for y. Are the readings of x better than the readings of y? Let a statistical test say the final word!
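To make this concrete, here is a minimal sketch in Python of such a paired comparison. The accuracy values are invented placeholders, and the choice of a paired t-test (scipy.stats.ttest_rel) and a 0.05 significance level are assumptions for illustration; other paired tests, such as the Wilcoxon signed-ranks test, could be used instead.

```python
# Minimal sketch (hypothetical numbers): paired comparison of two
# classifiers' 10-repetition cross-validation accuracies on ONE data set.
from scipy.stats import ttest_rel

# 10 repetitions of 10-fold CV accuracy; each repetition uses the same
# folds for both classifiers, so the readings are paired.
x = [0.81, 0.79, 0.83, 0.80, 0.82, 0.78, 0.84, 0.81, 0.80, 0.82]  # classifier C1
y = [0.77, 0.80, 0.79, 0.75, 0.78, 0.74, 0.80, 0.77, 0.76, 0.78]  # classifier C2

stat, p_value = ttest_rel(x, y)
print(f"paired t = {stat:.3f}, p-value = {p_value:.4f}")

# If p_value falls below the chosen significance level (e.g., 0.05),
# the difference between C1 and C2 is unlikely to be due to chance alone.
```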
Like Tiago, I also suggest: "Statistical Comparisons of Classifiers over Multiple Data Sets", Journal of Machine Learning Research, vol. 7, 2006. It is a good paper that has been very helpful to me.
In my recent publication in the Pattern Recognition journal, titled "New Hermite orthogonal polynomial kernel and combined kernels in Support Vector Machine classifier", we used the Wilcoxon signed-ranks test to compare classifiers.
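For the multiple-data-set setting the question asks about (and that the Wilcoxon signed-ranks test, as used in the Demšar paper, addresses), a rough sketch could look like the following. The per-data-set scores are made-up placeholders, and scipy.stats.wilcoxon is simply one readily available implementation of the test, not the procedure from the publication above.

```python
# Minimal sketch (hypothetical scores): Wilcoxon signed-ranks test comparing
# two classifiers over MULTIPLE data sets, one aggregate score per data set.
from scipy.stats import wilcoxon

# Mean cross-validation accuracy of each classifier on 10 data sets (made up)
scores_c1 = [0.904, 0.721, 0.853, 0.662, 0.781, 0.930, 0.812, 0.693, 0.882, 0.754]
scores_c2 = [0.889, 0.699, 0.864, 0.622, 0.748, 0.912, 0.805, 0.665, 0.846, 0.741]

# The test ranks the absolute per-data-set differences and compares the rank
# sums of positive and negative differences; it does not assume the
# differences are normally distributed.
stat, p_value = wilcoxon(scores_c1, scores_c2)
print(f"Wilcoxon statistic = {stat}, p-value = {p_value:.4f}")
```

A small p-value suggests the two classifiers differ systematically across the data sets, which is the question the thread is really about.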