I believe deep learning shows its strength mainly on very large datasets. So you should probably fall back on discrete-distribution approaches such as LSI or LDA. But if you are still interested in doing this with deep learning, a recurrent neural network (RNN) can do most of it.
Why have you already decided on "deep learning", a.k.a. neural networks? If you are working on a text / word problem, I would suggest spending time on data pre-processing before you decide on the learning algorithm. Why are you focusing on a "supervised" method in the first place? What do you want to predict? Do you have class labels? How many positives and how many negatives? I would suggest looking into PCA and PLS. And if you want to stay in the black-box realm, I would strongly suggest support vector machines rather than neural networks for text problems.
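To make the pre-processing and class-balance questions concrete, here is a minimal sketch using only the Python standard library. The toy corpus, the stopword list, and the helper names `preprocess` and `bag_of_words` are all made up for illustration; real data will likely need more careful tokenization.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (text, label) pairs; 1 = positive, 0 = negative.
docs = [
    ("The battery life is GREAT, really great!", 1),
    ("Terrible screen; the battery died.", 0),
    ("Great screen and great price.", 1),
]

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "is", "and", "a", "an"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def bag_of_words(text):
    """Count how often each remaining token occurs in one document."""
    return Counter(preprocess(text))


# How many positives and how many negatives do we have?
label_counts = Counter(label for _, label in docs)

# Shared vocabulary and one term-count vector per document.
vocab = sorted({t for text, _ in docs for t in preprocess(text)})
vectors = [[bag_of_words(text)[t] for t in vocab] for text, _ in docs]

print("class balance:", dict(label_counts))
print("vocabulary:", vocab)
for row in vectors:
    print(row)
```

Count vectors like these are the kind of input that PCA, LSI/LDA, or an SVM would work on, so it is worth checking the vocabulary and the class balance before picking a model.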