If you have a PDF, you can use OCR for text recognition, and then you can easily perform the feature extraction. That said, it is better if you can work with txt, csv, doc, or similar files.
I could get features from a text using NLP algorithms, so I was wondering whether there is a transformation function to get an image from a text. If I have an image, I can use image processing algorithms to represent the text.
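To make the question concrete, here is a minimal sketch of one possible text-to-image mapping: each UTF-8 byte of the text becomes one grayscale pixel (0-255), laid out row by row. The function name `text_to_pixel_grid` and the `width` parameter are hypothetical, chosen only for illustration. This is just one arbitrary encoding, and whether image features computed on such a grid are useful is exactly the open question.

```python
import math


def text_to_pixel_grid(text: str, width: int = 8) -> list[list[int]]:
    """Map text to a 2-D grid of grayscale values, one UTF-8 byte per pixel.

    Assumption: a plain byte-to-intensity mapping; nothing here is a
    standard or established transformation.
    """
    data = text.encode("utf-8")
    height = max(1, math.ceil(len(data) / width))
    # Zero-pad so the last row has the same width as the others.
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


grid = text_to_pixel_grid("feature extraction", width=6)
for row in grid:
    print(row)
```

The resulting grid can be handed to any image library as a grayscale array, after which the usual image descriptors can be computed on it.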
It was just an idea, because there are very well-known algorithms for getting features from images and I was trying to explore new ways to get features from text. The next idea was to use it for steganography and hide text in images.
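For the steganography idea, a rough sketch of least-significant-bit (LSB) embedding is shown here, assuming the image is available as a flat list of 8-bit pixel values. LSB embedding is a common textbook technique that I am swapping in for illustration; the names `hide_text` and `reveal_text` and the 16-bit length header are my own assumptions, not something proposed earlier in this thread.

```python
def hide_text(pixels: list[int], message: str) -> list[int]:
    """Write the message bits into the lowest bit of each pixel value."""
    data = message.encode("utf-8")
    # Hypothetical layout: 16-bit big-endian length header, then the message.
    payload = len(data).to_bytes(2, "big") + data
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("image too small for this message")
    stego = list(pixels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit  # clear the lowest bit, then set it
    return stego


def reveal_text(pixels: list[int]) -> str:
    """Read the length header, then rebuild the message from the low bits."""
    def read_bytes(start_bit: int, count: int) -> bytes:
        out = bytearray()
        for b in range(count):
            value = 0
            for k in range(8):
                value = (value << 1) | (pixels[start_bit + 8 * b + k] & 1)
            out.append(value)
        return bytes(out)

    length = int.from_bytes(read_bytes(0, 2), "big")
    return read_bytes(16, length).decode("utf-8")


cover = [(37 * i) % 256 for i in range(200)]  # synthetic stand-in for real pixels
stego = hide_text(cover, "hidden text")
print(reveal_text(stego))
```

Each pixel changes by at most one intensity level, which is why this kind of embedding is hard to see by eye.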
I think you can extract a variety of features from text, such as orthographic, lexical, syntactic, and semantic features, and you can apply deep-learning-based word embedding approaches so that hundreds to thousands of features can be extracted. Even the surrounding text can improve performance in TM. If that is not the case, you can use a vector representation of the text, from which a matrix can easily be generated and used to build kernel-based or graph-based models with hundreds of features. Feature selection methods such as multi-objective optimization (MOO) or particle swarm optimization (PSO) are important in TM because hundreds of features are easily generated, and we have to identify the best subset of them for higher classifier performance.
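As a small illustration of two of the feature types mentioned above, the sketch below computes a few orthographic features per token and a bag-of-words count matrix as a simple vector representation. All function names and the example sentences are made up for illustration; embeddings and the MOO/PSO selection step are not shown.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Very simple tokenizer: runs of ASCII letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def orthographic_features(token: str) -> dict:
    """A few surface-form features of a single token."""
    return {
        "is_capitalized": token[:1].isupper(),
        "all_caps": token.isupper(),
        "has_digit": any(ch.isdigit() for ch in token),
        "length": len(token),
    }


def bag_of_words(docs: list[str]) -> tuple[list[str], list[list[int]]]:
    """Return the sorted vocabulary and a document-term count matrix."""
    tokenized = [[t.lower() for t in tokenize(d)] for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    matrix = []
    for doc in tokenized:
        counts = Counter(doc)
        matrix.append([counts[term] for term in vocab])
    return vocab, matrix


docs = ["BRCA1 binds DNA", "DNA repair needs BRCA1 and RAD51"]
vocab, matrix = bag_of_words(docs)
print(vocab)
print(matrix)
print(orthographic_features("BRCA1"))
```

Even with two short sentences the matrix already has one column per distinct word, which shows how quickly the feature count grows and why a selection step matters.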