I found a paper entitled "Multimodal representation: Kneser-Ney Smoothing/Skip-Gram based neural language model". I am curious how the Kneser-Ney smoothing technique can be integrated into a feed-forward neural language model with one linear hidden layer and a softmax output. What is the purpose of Kneser-Ney smoothing in such a neural network, and how can it be used for learning the conditional probability of the next word given its context?
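For context, here is a minimal sketch (mine, not the paper's implementation) of the kind of feed-forward model described above: the context word embeddings are concatenated, passed through a single linear hidden layer, and a softmax over the vocabulary gives P(next word | context). How Kneser-Ney enters this pipeline is exactly what I am asking; one speculative possibility, shown purely as an assumption, would be to interpolate the network's softmax distribution with a Kneser-Ney-smoothed n-gram estimate.

```python
# Hypothetical sketch, NOT the paper's method: a feed-forward n-gram neural LM
# with one linear hidden layer and a softmax output over the vocabulary.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedForwardLM(nn.Module):
    def __init__(self, vocab_size, context_size=3, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)                  # word embeddings
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)     # single linear hidden layer
        self.out = nn.Linear(hidden_dim, vocab_size)                      # projection to vocabulary

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the preceding words
        e = self.embed(context_ids).flatten(start_dim=1)   # concatenate context embeddings
        h = self.hidden(e)                                 # linear hidden layer (no non-linearity)
        return F.log_softmax(self.out(h), dim=-1)          # log P(next word | context)

# One speculative way Kneser-Ney could be combined with the network (an assumption,
# not the paper's stated method): interpolate the network's distribution with a
# KN-smoothed n-gram estimate p_kn using a mixing weight lam.
def interpolate(log_p_nn, p_kn, lam=0.5):
    # log_p_nn: (batch, vocab) from the network; p_kn: (batch, vocab) KN probabilities
    return lam * log_p_nn.exp() + (1.0 - lam) * p_kn

model = FeedForwardLM(vocab_size=10000)
context = torch.randint(0, 10000, (2, 3))                          # two dummy 3-word contexts
p_kn = torch.full((2, 10000), 1.0 / 10000)                         # placeholder KN distribution
print(interpolate(model(context), p_kn).shape)                     # torch.Size([2, 10000])
```

Is this kind of interpolation what the paper does, or is the Kneser-Ney estimate used differently, e.g. as an input feature or a training target?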
