Say I have a set of scanned documents that I want to classify, and assume there is a set of manually selected examples for each class.
The classes would be, for example, receipts, medical reports, and invoices.
My go-to algorithm would be k-NN, but the question is: which features should I use?
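
To make the setup concrete, here is a minimal sketch of the k-NN part, assuming each scan has already been turned into a numeric feature vector. The two features (`text_density`, `aspect_ratio`), their values, and the function name `knn_predict` are made-up placeholders for illustration, not a recommendation; picking real features is exactly the open question.

```python
import math
from collections import Counter


def knn_predict(query, examples, k=3):
    """Return the majority label among the k labelled examples nearest to query.

    examples is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    if not examples:
        raise ValueError("need at least one labelled example")
    # Sort the labelled examples by distance to the query and keep the k closest.
    neighbours = sorted(examples, key=lambda ex: math.dist(query, ex[0]))[:k]
    # Majority vote over the labels of those neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical feature vectors: (text_density, aspect_ratio).
# These numbers are invented purely to exercise the classifier.
examples = [
    ((0.20, 3.00), "receipt"),
    ((0.25, 2.80), "receipt"),
    ((0.50, 1.40), "invoice"),
    ((0.55, 1.45), "invoice"),
    ((0.80, 1.40), "medical_report"),
    ((0.85, 1.35), "medical_report"),
]

print(knn_predict((0.22, 2.90), examples, k=3))
print(knn_predict((0.82, 1.38), examples, k=3))
```

Since plain Euclidean distance is used here, features on very different scales would probably need normalising first, whatever features end up being chosen.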