I am fairly comfortable with image processing and machine learning at an intermediate level. What resources should I follow to dive into the field of document analysis and retrieval?
A good survey of feature extraction methods for character recognition, published in Pattern Recognition (Due Trier et al., 1996): http://davinci.fmph.uniba.sk/~uhliarik4/recognition/resources/due_trier_1996_feature_extraction_survey.pdf
Since convolutional neural networks are popular in this field, you can check the publications of LeCun et al., who pioneered this approach for OCR: http://yann.lecun.com/exdb/lenet/
There is also this Matlab toolbox, which implements several network types: https://github.com/rasmusbergpalm/DeepLearnToolbox
It has been used for OCR, and you should also check the dataset link in Palm's master's thesis: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6284
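To give a feel for what the convolutional networks mentioned above compute, here is a minimal NumPy sketch of a single convolution, squashing, and subsampling stage. It is only an illustration under assumptions of my own: the function names (conv2d_valid, avg_pool2x2, feature_stage) are hypothetical, the filters are random rather than trained, and the subsampling is plain averaging, so this is not the code from LeCun's papers or from the toolbox. The sizes (a 32x32 input and six 5x5 filters) are chosen to resemble the first stage of a LeNet-style network.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Sum of the element-wise product over the current window
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def avg_pool2x2(fmap):
    """Non-overlapping 2x2 average pooling; an odd trailing row/column is cropped."""
    h = fmap.shape[0] // 2 * 2
    w = fmap.shape[1] // 2 * 2
    cropped = fmap[:h, :w]
    # Group pixels into 2x2 blocks and average each block
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def feature_stage(image, kernels):
    """One convolution -> tanh -> subsampling stage; returns one map per kernel."""
    return np.stack([avg_pool2x2(np.tanh(conv2d_valid(image, k))) for k in kernels])

rng = np.random.default_rng(0)
image = rng.random((32, 32))                     # stand-in for a 32x32 character image
kernels = rng.standard_normal((6, 5, 5)) * 0.1   # six untrained 5x5 filters (assumption)
maps = feature_stage(image, kernels)
print(maps.shape)  # (6, 14, 14)
```

In a real network the filters are learned from labeled character images, several such stages are stacked, and a classifier sits on top of the final feature maps.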
Algorithms for Image Processing and Computer Vision by J.R. Parker has a simple OCR implementation. The book comes with source code so you can use it and extend it if you so desire.