I am relatively new to machine learning. I am taking part in an in-class Kaggle competition where the task is to predict churn. I trained my model (a random forest with 50-200 trees) on the training data and got up to 80% accuracy on a 70-30 train/validation split. However, when I predict on the unseen test data, I get 0% accuracy on Kaggle.
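For reference, the training and validation setup described above looks roughly like the following minimal scikit-learn sketch. It assumes scikit-learn is the library in use, and it uses synthetic data from `make_classification` as a hypothetical stand-in for the competition's churn data. `n_estimators=100` is one arbitrary choice from the 50-200 range.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the labelled churn data (synthetic, binary labels 0/1).
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)

# 70-30 split: hold out 30% of the labelled rows for validation.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Random forest; 100 trees is an arbitrary pick from the 50-200 range.
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# Accuracy on the held-out 30%, i.e. the "up to 80%" style number.
val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.3f}")
```

The validation score only says something about unseen data if the unseen data and the submission file use the same label encoding as `y_train`.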
Suspecting that something had gone wrong, I inverted the labels, and then I got 100% on Kaggle. Can someone explain why my model predicts every case in the unseen data wrongly? The predicted probability for most predictions is above 0.7, so it's not a coin toss.
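The observation above can be checked with a small sketch, under hypothetical data. For binary labels, flipping every prediction turns an accuracy of a into 1 - a. A score of exactly 0% therefore means the model separates the two classes perfectly, but its output is mapped to the opposite label from the one the scorer expects. One place such a flip can hide, offered as an assumption rather than a diagnosis, is the label encoding. In scikit-learn, `predict_proba` columns follow `model.classes_`, which is sorted, so string labels such as "No"/"Yes" may not map to 0/1 the way you expect.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Hypothetical true labels for the unseen data (1 = churn, 0 = no churn).
y_true = np.array([1, 0, 0, 1, 1, 0, 1, 0])

# Predictions that disagree with every true label, as in the 0% Kaggle score.
y_pred = 1 - y_true
acc = accuracy_score(y_true, y_pred)

# For binary labels, inverting every prediction maps accuracy a to 1 - a.
acc_flipped = accuracy_score(y_true, 1 - y_pred)
print(acc, acc_flipped)

# Hypothetical tiny dataset with string labels, to inspect the class order.
X = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
y = np.array(["No", "No", "No", "Yes", "Yes", "Yes"])
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Column j of predict_proba belongs to model.classes_[j]; classes_ is sorted,
# so column 0 is "No" and column 1 is "Yes" here.
print(model.classes_)
proba = model.predict_proba(X)
```

Comparing `model.classes_` with the label values the submission format expects is a cheap check before submitting.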