I have heavily optimized the SVM meta-parameters on the Iris classification dataset. For this I ran 1000 rounds of random cross-validation (repeated random train/test splits) and took the best parameters to run the algorithm again. After that I observed that the accuracy on the test set was higher (around 1.3% error) than on the training set (around 3.3% error). The difference seems substantial, and I do not see such results on any other dataset I have tried. How can this be explained? Granted, the parameters were optimized using the accuracy on random test splits, but how can a model do better on the test data than on the training data?
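
To make the setup concrete, here is a minimal sketch of the kind of procedure I mean, assuming scikit-learn's SVC, load_iris, and train_test_split; the parameter grid, split ratio, and random seeds are placeholders, not my actual settings:

```python
# Sketch of the procedure described above (placeholder grid and split ratio).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate meta-parameters (placeholder values).
param_grid = [(C, gamma) for C in (0.1, 1, 10, 100) for gamma in (0.01, 0.1, 1)]
scores = {p: [] for p in param_grid}

rng = np.random.RandomState(0)
for _ in range(1000):  # 1000 random train/test splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, random_state=rng.randint(2**31 - 1))
    for C, gamma in param_grid:
        clf = SVC(C=C, gamma=gamma).fit(X_tr, y_tr)
        scores[(C, gamma)].append(clf.score(X_te, y_te))

# Keep the parameters with the best average test accuracy over the splits ...
best_C, best_gamma = max(scores, key=lambda p: np.mean(scores[p]))

# ... then refit on a fresh split and compare training vs. test error.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)
clf = SVC(C=best_C, gamma=best_gamma).fit(X_tr, y_tr)
print("train error: %.3f" % (1 - clf.score(X_tr, y_tr)))
print("test  error: %.3f" % (1 - clf.score(X_te, y_te)))
```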