Language models like GPT-3 have revolutionized modern deep learning applications for NLP, leading to widespread publicity and recognition. Interestingly, however, most of the technical novelty of GPT-3 was inherited from its predecessors GPT and GPT-2 . As such, a working understanding of GPT and GPT-2 is useful for gaining a better grasp of current approaches for NLP.