I have a dataset consisting of two classes (YES/NO), split evenly between YES and NO. I have a set of classifiers, each of which has been optimised to find the parameters that produce the highest value for a weighted mixture of the F-score for YES alone and overall accuracy (the proportion of all its predictions that are correct). Overall the classifiers do what I expected: the one I thought would do worst does worst, the one I thought would do second worst does second worst, and so on, up to the one I expected to do best, which does best.
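In case it helps, here is a minimal sketch of what I mean by the weighted mixture, assuming a simple convex combination with a weight `alpha` (the exact weighting and the use of scikit-learn here are just illustrative, not my actual setup):

```python
# Sketch of the objective: alpha * F(YES) + (1 - alpha) * accuracy.
# The weight `alpha` and the scikit-learn calls are assumptions for illustration.
from sklearn.metrics import f1_score, accuracy_score

def weighted_objective(y_true, y_pred, alpha=0.5):
    """Score a classifier by a mix of the F-score for the YES class and overall accuracy."""
    f_yes = f1_score(y_true, y_pred, pos_label="YES")
    acc = accuracy_score(y_true, y_pred)
    return alpha * f_yes + (1 - alpha) * acc

# Toy example with made-up labels:
y_true = ["YES", "YES", "NO", "NO"]
y_pred = ["YES", "NO", "NO", "NO"]
print(weighted_objective(y_true, y_pred))
```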
But they also do something I wasn't expecting and can't interpret: the precision remains fairly stable and it's the recall that goes up. It looks pretty consistent, but I can't tell a coherent story about why this should happen. Improving precision while keeping recall reasonable for YES would also improve the weighted mixture, because it would raise both the F-score for YES and the overall accuracy. Can anyone suggest an explanation for what's happening?

The attached table might make it clearer. The thing I am actually optimising is in the greyed-out column on the right: it increases as you go down the table, which is what I want (how very fortunate for me!), and at each step both F(YES) and accuracy increase. In particular, the F-score increases all the way down. What seems weird to me is that the increase in F-score is almost entirely due to improving recall, and I can't see why that would happen.
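To make concrete what I mean by the two ways F(YES) could go up, here are some made-up confusion-matrix counts (not the numbers from my table), just to illustrate that the objective could in principle have been improved through precision instead of recall:

```python
# Hypothetical confusion-matrix counts for the YES class, for illustration only.
def f_score(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall), precision, recall

# Baseline classifier: precision ~0.75, recall 0.50
print(f_score(tp=50, fp=17, fn=50))   # F ~= 0.60
# What I actually see: precision stays ~0.75, recall climbs to 0.70
print(f_score(tp=70, fp=23, fn=30))   # F ~= 0.73
# What would also have improved the objective: recall stays at 0.50, precision climbs to ~0.89
print(f_score(tp=50, fp=6, fn=50))    # F ~= 0.64
```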