BERT has a hard limit on input length: the standard pretrained models accept at most 512 tokens per sequence.
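To make the limit concrete, here is a minimal sketch. It assumes whitespace-split words as a stand-in for BERT's WordPiece tokens, and a real tokenizer such as bert-base-uncased usually produces more tokens. With plain truncation of a single sequence, only the first 510 content tokens survive, because [CLS] and [SEP] take the other two of the 512 positions. Anything later in a long paragraph is never seen by the model. The helper name truncate_for_bert is hypothetical.

```python
# Sketch: how a 512-position limit truncates a long paragraph.
# Assumption: whitespace-split words stand in for BERT WordPiece tokens;
# a real tokenizer usually produces more tokens for the same text.

MAX_POSITIONS = 512  # max sequence length of the standard pretrained BERT models
SPECIAL_TOKENS = 2   # [CLS] at the start and [SEP] at the end of one sequence


def truncate_for_bert(text: str, max_positions: int = MAX_POSITIONS) -> tuple[list[str], list[str]]:
    """Hypothetical helper: return (kept, dropped) tokens for one BERT input."""
    tokens = text.split()
    budget = max_positions - SPECIAL_TOKENS
    kept = ["[CLS]"] + tokens[:budget] + ["[SEP]"]
    dropped = tokens[budget:]
    return kept, dropped


# A hypothetical 600-word paragraph whose last word is the problematic one.
paragraph = " ".join(["word"] * 599 + ["FLAGGED"])
kept, dropped = truncate_for_bert(paragraph)
print(len(kept))          # 512
print(len(dropped))       # 90
print("FLAGGED" in kept)  # False: the model never sees it
```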
I want to classify each large paragraph of text as appropriate or inappropriate for kids.
Splitting the paragraphs into smaller chunks does not help, because the ground-truth labels for training apply to the entire text, not to individual chunks, whose labels may differ.
If I split paragraphs into sentences and train on those, the training data may be polluted: in many texts only a small part of a long paragraph is inappropriate, yet every remaining chunk would also be trained as inappropriate.
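A minimal sketch of that labelling problem follows. It assumes a crude split on ". " and a hypothetical per-sentence ground truth that the real dataset does not have. Every chunk inherits the paragraph's label, so sentences that are harmless on their own get trained as inappropriate. The names split_sentences and inherit_label are illustrative only.

```python
# Sketch: naive chunking copies the paragraph-level label onto every chunk.
# Assumptions: sentences are split on ". " and the per-sentence "truth" below
# is hypothetical, used only to show how much label noise the copy introduces.


def split_sentences(paragraph: str) -> list[str]:
    """Very rough sentence splitter (illustration only)."""
    return [s.strip() for s in paragraph.split(". ") if s.strip()]


def inherit_label(paragraph: str, label: str) -> list[tuple[str, str]]:
    """Give each sentence the label annotated for the whole paragraph."""
    return [(sentence, label) for sentence in split_sentences(paragraph)]


paragraph = (
    "The family went to the park. They had a picnic by the lake. "
    "Later a character describes a violent scene in graphic detail. "
    "Then everyone went home"
)
chunks = inherit_label(paragraph, "inappropriate")

# Hypothetical sentence-level truth: only the third sentence is a problem.
true_sentence_labels = ["appropriate", "appropriate", "inappropriate", "appropriate"]

mislabeled = sum(
    1 for (_, given), truth in zip(chunks, true_sentence_labels) if given != truth
)
print(f"{mislabeled} of {len(chunks)} chunks carry the wrong label")
```

Only one of the four chunks actually deserves the paragraph's label, which is exactly the noise that sentence-level training would absorb.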
Therefore, I need to process the full paragraph in one go.
Any suggestions please?
#NLP #ML