I understand that TF-IDF weights terms by rewarding those used frequently within a single document and penalising those that appear across a wide spectrum of documents.
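For concreteness, here is a minimal sketch of that weighting on a toy corpus. It assumes the classic textbook variant (raw term count as tf, log(N / df) as idf); real libraries often add smoothing or normalisation, and the corpus is made up for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    n_docs = len(docs)
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Document frequency: each Counter's keys are the distinct terms of one document.
    df = Counter(term for c in counts for term in c)
    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in c.items()}
        for c in counts
    ]

docs = [
    "the king and the queen",
    "the ghost of hamlet haunts hamlet",
    "the market opened higher",
]
weights = tfidf(docs)
print(weights[1]["hamlet"])  # 2 * log(3/1), about 2.197
print(weights[0]["the"])     # 2 * log(3/3) = 0.0
```

Because "the" occurs in every document, its idf is log(1) = 0, so its weight is zero no matter how often it appears.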
Why is TF-IDF different from term distribution? By term distribution I mean a term's count in a document divided by the sum of that term's counts across all documents.
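A minimal sketch of the term distribution as defined above, assuming simple whitespace tokenisation and the same made-up corpus; the function name `term_distribution` is just an illustrative label.

```python
from collections import Counter

def term_distribution(docs):
    """Return one {term: share} dict per document.

    share = count of the term in this document / total count of the term
    across all documents, so each term's shares sum to 1 over the corpus.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    totals = Counter()
    for c in counts:
        totals.update(c)  # adds this document's counts to the corpus totals
    return [{term: tf / totals[term] for term, tf in c.items()} for c in counts]

docs = [
    "the king and the queen",
    "the ghost of hamlet haunts hamlet",
    "the market opened higher",
]
dist = term_distribution(docs)
print(dist[1]["hamlet"])  # 2 / 2 = 1.0
print(dist[0]["the"])     # 2 / 4 = 0.5
print(dist[1]["the"])     # 1 / 4 = 0.25
```

Note that the normalisation runs over documents for a fixed term, not over terms within a document.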
In this case, a word such as "the" would have a near-uniform distribution, with roughly equal percentages in every document, whereas a specific word such as "Hamlet" would have zero occurrences in every document except Shakespeare's, which would have a high percentage.
So the term distribution would seem to achieve the same goal as TF-IDF, rewarding frequently used words and penalising words that appear across a wide spectrum of documents.
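One way to probe whether the two really coincide is to score a single term both ways on a corpus where they might disagree. This sketch assumes the same textbook TF-IDF variant (raw count times log(N / df)) and a hypothetical three-document corpus in which "ghost" occurs in every document but is concentrated in the first.

```python
import math
from collections import Counter

def scores(docs, term):
    """Return (tfidf, share) for `term` in each document.

    tfidf = raw count * log(N / df)             (one common TF-IDF variant)
    share = count in document / total count in corpus (the term distribution)
    Assumes `term` occurs in at least one document.
    """
    counts = [Counter(doc.lower().split()) for doc in docs]
    n_docs = len(docs)
    df = sum(1 for c in counts if c[term] > 0)
    total = sum(c[term] for c in counts)
    idf = math.log(n_docs / df)
    return [(c[term] * idf, c[term] / total) for c in counts]

# Hypothetical corpus: "ghost" appears in every document,
# but five of its seven occurrences are in the first one.
docs = [
    "ghost ghost ghost ghost ghost walks",
    "a ghost story",
    "no ghost here",
]
for tfidf_w, share in scores(docs, "ghost"):
    print(f"tfidf={tfidf_w:.3f} share={share:.3f}")
```

Here TF-IDF gives "ghost" zero weight everywhere (its df equals N), while its share in the first document is 5/7. Conversely, any term that occurs in only one document gets a share of 1.0 whether it appears once or a hundred times, whereas its TF-IDF weight grows with the count.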