In deep learning, encoders map an input of dimension D to a new feature space of dimension d, where usually d < D, through multiple non-linear transformations. In the field of language modeling (LM), especially next-word prediction, recurrent neural networks such as LSTMs and GRUs have proved highly effective.
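For illustration, here is a minimal sketch of such an encoder in PyTorch, stacking non-linear transformations to map inputs of dimension D down to d (the sizes D=512, d=64 and the two-layer architecture are arbitrary choices, not anything prescribed by the question):

```python
import torch
import torch.nn as nn

# Toy encoder: maps inputs of dimension D to a smaller feature space of
# dimension d (d < D) through stacked non-linear transformations.
D, hidden, d = 512, 256, 64  # illustrative sizes

encoder = nn.Sequential(
    nn.Linear(D, hidden),
    nn.ReLU(),
    nn.Linear(hidden, d),
    nn.Tanh(),
)

x = torch.randn(8, D)        # batch of 8 input vectors
features = encoder(x)        # shape: (8, d)
print(features.shape)        # torch.Size([8, 64])
```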
If your language model is not conditioned on anything, there is no reason it should have an encoder. You can view sequence-to-sequence architectures as conditional language models, where the output distribution of the language model is conditioned not only on the previous words in the generated sequence, but also on the encoder states, whatever they are supposed to encode (e.g., the source sentence in MT, feature maps in image captioning).
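A minimal sketch of this idea, assuming a toy GRU-based sequence-to-sequence model without attention (vocabulary sizes and dimensions are purely illustrative): the decoder's next-word distribution depends on the previously generated words *and* on the encoder's final state.

```python
import torch
import torch.nn as nn

# Conditional language model (seq2seq without attention):
# the decoder is conditioned on the encoder's final hidden state.
src_vocab, tgt_vocab, emb, hid = 1000, 1000, 64, 128  # illustrative sizes

src_embed = nn.Embedding(src_vocab, emb)
tgt_embed = nn.Embedding(tgt_vocab, emb)
encoder = nn.GRU(emb, hid, batch_first=True)
decoder = nn.GRU(emb, hid, batch_first=True)
proj = nn.Linear(hid, tgt_vocab)

src = torch.randint(0, src_vocab, (4, 10))   # source sentences
tgt = torch.randint(0, tgt_vocab, (4, 7))    # previously generated target words

_, enc_state = encoder(src_embed(src))       # whatever the encoder encodes
dec_out, _ = decoder(tgt_embed(tgt), enc_state)
logits = proj(dec_out)                       # (4, 7, tgt_vocab): next-word scores
```

Dropping the encoder (and initializing the decoder with zeros) turns the same decoder into an unconditional language model.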
I am not sure what you mean by deep model, but since you mention tensor2tensor, you probably mean the self-attentive Transformer network. As with recurrent LMs, there is no need for an encoder if you only predict the probabilities of the following words given the sequence generated so far.
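A minimal sketch of such a decoder-only setup, assuming a small causal Transformer built from standard PyTorch layers (positional encodings are omitted for brevity; all sizes are illustrative):

```python
import torch
import torch.nn as nn

# Decoder-only ("encoder-free") Transformer LM: next-word probabilities
# depend only on the tokens generated so far, enforced by a causal mask.
vocab, d_model, n_heads, n_layers = 1000, 128, 4, 2  # illustrative sizes

embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
blocks = nn.TransformerEncoder(layer, n_layers)
proj = nn.Linear(d_model, vocab)

tokens = torch.randint(0, vocab, (2, 12))                  # sequence so far
mask = nn.Transformer.generate_square_subsequent_mask(12)  # causal mask
hidden = blocks(embed(tokens), mask=mask)
next_word_logits = proj(hidden[:, -1])                     # next-word distribution
```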