I used the LightGBM algorithm to predict the outcome of tennis matches. I then built two confusion matrices, one for the training set and one for the test set, and calculated statistics such as precision, recall, and the F-score, as shown in the tables below. What I am curious about is why the precision for the longshot class drops dramatically from 67.64% on the training set to 57.67% on the test set. My hypothesis is that this is caused by the class ratio in the training set: the favorite wins 15930 times and the longshot 7849 times, an imbalance ratio of 2.02. Does the algorithm fail to learn the longshot class properly because it is the minority class?
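In case it helps, this is roughly how I compute the per-class metrics with scikit-learn (a minimal sketch with stand-in arrays instead of my real labels; the encoding 1 = favorite wins, 0 = longshot wins is just illustrative):

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

y_true = np.array([1, 1, 0, 1, 0, 1])  # stand-in labels
y_pred = np.array([1, 0, 0, 1, 1, 1])  # stand-in predictions

# Confusion matrix: rows = true class, columns = predicted class
print(confusion_matrix(y_true, y_pred))

# Per-class precision, recall and F-score (index 0 = longshot, 1 = favorite)
prec, rec, f1, support = precision_recall_fscore_support(y_true, y_pred)
print(f"longshot: precision={prec[0]:.4f} recall={rec[0]:.4f} f1={f1[0]:.4f}")
print(f"favorite: precision={prec[1]:.4f} recall={rec[1]:.4f} f1={f1[1]:.4f}")

# Imbalance ratio: majority class count / minority class count
counts = np.bincount(y_true)
print("imbalance ratio:", counts.max() / counts.min())
```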
A similar phenomenon occurs when the predictions come from the XGBoost algorithm; see the file attached to this question for its confusion matrices, precision, recall, and F-score.
I used the `scale_pos_weight` parameter to account for the class imbalance, but it had no effect. I also tried undersampling and oversampling methods, so far with no effect either.
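For concreteness, this is roughly what I tried, sketched here on synthetic data with a similar 2:1 ratio (it assumes the longshot is the positive class, label 1; if your encoding is reversed, the weight should be inverted):

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import RandomOverSampler

# Synthetic imbalanced data, roughly the 2:1 ratio from the training set above
X, y = make_classification(n_samples=23779, weights=[0.67, 0.33], random_state=0)

# 1) Reweighting: scale_pos_weight = n_negative / n_positive
n_neg, n_pos = np.bincount(y)
clf = lgb.LGBMClassifier(scale_pos_weight=n_neg / n_pos)
clf.fit(X, y)

# 2) Resampling with imbalanced-learn, then fitting on the balanced data;
#    the test set is left untouched in both cases
X_under, y_under = RandomUnderSampler(random_state=0).fit_resample(X, y)
X_over, y_over = RandomOverSampler(random_state=0).fit_resample(X, y)
lgb.LGBMClassifier().fit(X_under, y_under)
```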
Would the precision for the longshot class improve if I tackled the imbalance in some other way? Or is the imbalance not the problem at all?