I'm currently building a dataset for fake news detection in Bengali, and I'm looking for guidance on:
1. Identifying credible sources for real and fake news
2. Handling code-mixing and dialects
3. Annotation standards and class balance
4. Cultural/linguistic challenges in low-resource NLP tasks
If anyone has worked on similar datasets or has experience with multilingual fake news detection, I’d really appreciate your insights 🙂