Seeking insights from the research community: Does the imbalance of textual genres within corpora, when used as an explanatory variable rather than the response variable, affect the performance of logistic regression and classification models? I'm interested in understanding how the unequal distribution of word counts across genres might introduce biases and influence the accuracy of these machine learning algorithms. Any explanations or relevant details on mitigating strategies are appreciated. Thank you!