Confusion matrix interpretation - does my class imbalance result in the difference in precision?

09 December 2020 3 3K Report

I the LightGBM algorithm to predict the outcome of tennis matches. Next I made two confusion matrices, one for the training set and one for the test set. I calculated stats as precision, recall, and the F-score, as shown in the tables below. The thing I am curious about is why the precision of the longshot drops dramatically from 67.64% in the training se to 57.67% in the test set. I figured this happened because of the class ratio in the training set. There is an imbalance ratio of 2.02. The favorite seems to winning 15930 times and the longshot player 7849 times in the training set. Due to this minority the algorithm lacks learning from the longshot class?

Similar phenomena happens whenever the predictions come from the XGBoost algorithm, see the file that has been added to this question. You can see the confusion matrices, precision, recall and F-score.

I used the `scale_pos_weight` parameter to account for the class imbalance, but with no effect. Also I tried undersampling and oversampling methods, but also with no effect thus far.

Would the precision of the longshot player improve whenever I tackle this imbalance in other ways? Or isn't this the problem?

Francisco Luis Giambelluca

Hello Herrie Stengel,

The imbalance always causes changes in the metrics. Unbalancing samples you could favor or damage some of them. Depending what you want of your model, that imbalance could affect you or not, but if you depend of the precision metric I recommend balance that dataset.

Of course, I'm taking assuming that your dataset is right and homogeneous. Sometimes, the best samples are chossed to be part of training and the worst of them are used to test. This is not a complety wrong way to do a good training because we want that our model focuses on a clear example of what we want it to find, but could cause that decrement in its performance when it is tested in real samples.

Another option that you may consider is the overtraining, in this case the diferences between both are low but in some cases could happend for the begin of overtraining.

I hope I helped you.

Gérard Dreyfus

This is a bit confusing. First, what are the classes? From the second column of the table, one might infer that the classes are "Longshot" and "Favourite". However, from the second row of the table, one might infer that the classes are "Player 2" and "Player 1". So what? Let us assume that "Longshot " means "Player 2" and "favourite" means "Player 1".

A classifier has a single accuracy, a single precision, and a single recall. So why are there two different numbers (one per class) in the "precision" column, and similarly two different numbers in the "recall" column?

Let as assume that the elements of the confusion matrix of the classifier, obtained on the data of the training set, are the first two numbers in the "longshot" row and the first two numbers in the "favourite row". If this assumption is true, the accuracy of the classifier is 0.69, its precision is 0.97 and its recall is 0.12.

When the classes are imbalanced, the first thing to do is to compare the performance of the obtained classifier to the performance of the dummy classifier that assigns all examples to the most populated class ("favourite" in your case), irrespective of the classifier inputs; in your case, the accuracy of the dummy classifier is 15930/(15930+7849) = 0.67, its precision is 1, and its recall is 0.

The results obtained by your LightGBM classifier on the training set are: accuracy = 0.69, precision = 0.97 and recall = 0.12. Hence, on your training set, the classifier performs marginally better than the dummy classifier. This probably means that the inputs of the classifier are essentially irrelevant, or that the values of the hyperparameters of the training algorithm were not selected appropriately, or both. The results on the test set are essentially the same, which indicates that overfitting is not a problem.

XGBoost does slightly better on the training set and slightly worse on the test set. Thus, there seems to be some overfitting, but the performance is still roughly the performance of the dummy classifier: there is something seriously wrong with the design of the classifier - presumably the selection of the inputs.

Manoj Koyadan

Please try interpretML (https://github.com/interpretml/interpret)

Using EBM model you can interpret and explain the model a lot better.

If you need any help please send an email

Feedback defines the constitution of an organism?

How to define an anisotropic material with asymmetric elastic compliance/stiffness matrix in ANSYS APDL?

Is it true that $\det(V(A))$ may be only $\pm 1$, depending on $n$, for the last symmetric tridiagonal matrix $A$?

Self-Organizing Superorganisms—as envisaged by Nenad Sestan (2018)?

Measuring the Intelligence of a Species?

How can i do multivariate Time Series forecast using MLP, ANFIS and LSTM?

The Curse of Evolution and Complexity?

Could dyes amplify the spectrum of light to a specific wavelength?

Usage of internal standards in LC-MS/MS analysis?

How to report results of Generalised Linear Mixed Models in a journal article?