I want to normalize two linguistic corpora, but I don't know whether to normalize them per 10,000 words or per 1,000,000 words. How do I decide between the two? And why do we use 10,000 or 1,000,000 words, but not, for example, 100,000?
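For reference, this is the calculation I have in mind. It is a minimal sketch that assumes the usual definition of a normalized frequency: the raw count divided by the corpus size, multiplied by a base such as 10,000 or 1,000,000. The function name `normalize` and the counts are hypothetical, made up only for illustration.

```python
def normalize(raw_count: int, corpus_size: int, base: int = 1_000_000) -> float:
    """Return the frequency of an item per `base` words (raw / size * base)."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    # Multiply before dividing so integer inputs stay exact as long as possible.
    return raw_count * base / corpus_size


# Hypothetical corpora: total word count and raw count of some item.
corpus_a = {"size": 250_000, "count": 75}
corpus_b = {"size": 1_200_000, "count": 300}

for base in (10_000, 100_000, 1_000_000):
    a = normalize(corpus_a["count"], corpus_a["size"], base)
    b = normalize(corpus_b["count"], corpus_b["size"], base)
    print(f"per {base:>9,} words: A = {a:.2f}, B = {b:.2f}, A/B = {a / b:.3f}")
```

Whichever base is used, both corpora are rescaled by the same factor, so the ratio between them (1.2 in this made-up example) stays the same.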
Thank you in advance.