I work on several projects involving supervised learning on text and am always interested in finding useful features. For example, some that I have found helpful:

- the words themselves: unigrams, bigrams, trigrams

- % of unique words

- readability indices

- vocabulary richness indices
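
To make the list above concrete, here is a rough sketch of how these features might be computed with only the Python standard library. The specific choices are assumptions for illustration, not the only options: the tokenizer is a crude regex, the readability index is the Automated Readability Index (ARI), the vocabulary richness index is Guiraud's R (unique words divided by the square root of total words), and the names `tokenize`, `ngrams`, and `text_features` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude illustrative tokenizer: lowercase alphanumeric runs,
    # optionally followed by an apostrophe suffix (e.g. "don't").
    return re.findall(r"[a-z0-9]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    # Contiguous n-grams, joined with spaces so they can serve as dict keys.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def text_features(text):
    tokens = tokenize(text)
    n_words = len(tokens)
    if n_words == 0:
        return {}

    # Assumption: a sentence ends at a run of '.', '!' or '?'.
    n_sentences = max(1, len(re.findall(r"[.!?]+", text)))
    # ARI counts letters and digits only.
    n_chars = sum(ch.isalnum() for tok in tokens for ch in tok)
    n_unique = len(set(tokens))

    features = {
        # Percentage of unique words (type/token ratio times 100).
        "pct_unique_words": 100.0 * n_unique / n_words,
        # Automated Readability Index.
        "ari": 4.71 * n_chars / n_words + 0.5 * n_words / n_sentences - 21.43,
        # Guiraud's R, one simple vocabulary richness index.
        "guiraud_r": n_unique / math.sqrt(n_words),
    }

    # Unigram, bigram, and trigram counts.
    for n in (1, 2, 3):
        for gram, count in Counter(ngrams(tokens, n)).items():
            features[f"{n}gram={gram}"] = count
    return features


if __name__ == "__main__":
    for name, value in sorted(text_features("The cat sat. The cat ran.").items()):
        print(name, value)
```

A dict of this shape can be passed to something like scikit-learn's `DictVectorizer` to get a sparse feature matrix, although for n-grams alone a dedicated vectorizer is probably the more practical choice.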

Would anyone be willing to share any text features that they have found useful for supervised learning on text?
