I am not familar with image super resolution, but in the field of image feature extraction and image representation, people usually simply use K-means clustering to build the dictionary.
In the field of audio-related researches, I saw many people use Non-negative Matrix Factorization (NMF) to build the dictionary and generate the sparse representation at the same time.
You can also google with keywords like "sparse coding", "dictionary", "codebook" etc.