For vanilla Transformer language models (Al-Rfou et al., 2018), you process [1 2 3 4] to predict 5, then process [2 3 4 5] to predict 6, and repeat — recomputing the entire context window for every new token.
For a Transformer-XL language model, you process [1 2 3 4] and predict [5 6 7 8], then predict [9 10 11 12], and repeat. Note that we don't have to reprocess [5 6 7 8] when predicting [9 10 11 12]: its hidden states were already computed during the previous step, so they are cached and reused as memory rather than recomputed.
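Here is a minimal sketch of the difference between the two evaluation loops. Everything in it is illustrative rather than from any real library: `encode_segment` is a toy stand-in for a Transformer layer stack, and the state counters just make the recomputation savings visible.

```python
def encode_segment(tokens, memory=()):
    """Toy stand-in for a Transformer layer stack: each 'hidden state'
    is just (token, context_length). A real model would attend over
    memory + tokens."""
    context = list(memory) + list(tokens)
    return [(t, len(context)) for t in tokens]

def vanilla_eval(tokens, window=4):
    """Al-Rfou-style evaluation: slide the window by one token and
    recompute the full context for every single prediction."""
    states_computed = 0
    for i in range(window, len(tokens)):
        hidden = encode_segment(tokens[i - window:i])  # full recompute
        states_computed += len(hidden)
        # ... predict tokens[i] from hidden[-1]
    return states_computed

def xl_eval(tokens, segment=4):
    """Transformer-XL-style evaluation: process one segment at a time,
    caching its hidden states as memory for the next segment."""
    states_computed = 0
    memory = ()
    for i in range(0, len(tokens), segment):
        chunk = tokens[i:i + segment]
        hidden = encode_segment(chunk, memory)  # reuse cached memory
        states_computed += len(hidden)
        memory = hidden  # cache; never recomputed
    return states_computed

tokens = list(range(1, 13))     # [1 .. 12]
print(vanilla_eval(tokens))     # 32 states: 8 predictions x window of 4
print(xl_eval(tokens))          # 12 states: each token encoded exactly once
```

The counts at the bottom show the point of the cache: the vanilla scheme re-encodes each token once per window it falls inside, while the XL scheme encodes every token exactly once and carries the result forward.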