For most document classification tasks, the features are a set of words that should be related to the topic you're working on and associated with the target class. However, the tokenisation technique matters a great deal in shaping these features: during the preprocessing stage you need to decide on aspects such as stemming, the stopword list, etc.
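To make the effect of those preprocessing choices concrete, here is a minimal sketch in plain Python. The stopword list, the `crude_stem` suffix stripper, and the `extract_features` helper are all illustrative assumptions, not a standard API; in practice you would likely swap in a proper stemmer and a fuller stopword list from an NLP library.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "is", "are", "of", "and", "to", "in", "on"}

def crude_stem(token):
    """Strip a few common English suffixes. A rough stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def extract_features(text, stopwords=STOPWORDS, stem=True):
    """Lowercase, tokenise on runs of letters, drop stopwords, optionally stem, count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    tokens = [t for t in tokens if t not in stopwords]
    if stem:
        tokens = [crude_stem(t) for t in tokens]
    return Counter(tokens)

doc = "The team played well and is playing in the finals"
with_stemming = extract_features(doc)
without_stemming = extract_features(doc, stem=False)
print(with_stemming)
print(without_stemming)
```

With stemming, "played" and "playing" collapse into a single feature "play" with count 2; without it, they remain two separate features with count 1 each. The same document therefore yields a different feature vector depending on how it is tokenised, which is why these choices affect the classifier.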