02 June 2020

In a text classification task, if data quantity is low but data quality is high, we can use data augmentation methods to improve the model.

But my situation is the opposite: data quantity is not low, but data quality is low (noise in the labels, i.e. low accuracy of the training data).

I get the low-quality data from unsupervised or rule-based methods. In detail, I am dealing with a multi-label classification task. First I crawl web pages such as wiki pages and use regex-based rules to assign the labels. The model input is the wiki title, and the model output is the set of rule-matched labels from the wiki content.
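To make the setup concrete, here is a minimal sketch of how such weak labels might be produced. The label names, the regex patterns, and the helper names `match_labels` and `build_example` are all hypothetical and only illustrate the idea; the real rules would be specific to the task.

```python
import re

# Hypothetical rules: label name -> regex applied to the page content.
# These patterns are illustrative assumptions, not the actual rules.
LABEL_RULES = {
    "sports": re.compile(r"\b(football|tennis|olympic)\w*", re.IGNORECASE),
    "music": re.compile(r"\b(album|singer|orchestra)\w*", re.IGNORECASE),
    "science": re.compile(r"\b(physic|chemis|biolog)\w*", re.IGNORECASE),
}


def match_labels(content):
    """Return every label whose rule matches the content (multi-label)."""
    return sorted(
        label for label, pattern in LABEL_RULES.items() if pattern.search(content)
    )


def build_example(title, content):
    """Pair the page title (model input) with the rule-matched labels (model output)."""
    return {"input": title, "labels": match_labels(content)}


# Toy pages standing in for crawled wiki pages.
pages = [
    ("Roger Federer", "Swiss tennis player who also appeared on a charity album."),
    ("Marie Curie", "Pioneer of physics and chemistry."),
]
dataset = [build_example(title, content) for title, content in pages]
```

In this toy data the first page gets a "music" label only because the word "album" appears once in the content, which is the kind of label noise described above.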
