As mentioned in the paper https://nlp.stanford.edu/pubs/glove.pdf, the authors learn two sets of word vectors: the word vectors W and the context vectors W~. Why are two separate sets of vectors required, and how are they learned?
You can think of them as "target" vectors and "context" vectors. Each word gets its own target vector, but it also serves as context for other words, so it gets a context vector as well. Because the co-occurrence matrix is symmetric, the cost function treats the two roles symmetrically, so in principle the two sets of vectors should end up equivalent; in practice they differ slightly because of random initialization and the stochastic optimization. At the end, the authors combine the two, using the sum W + W~ as the final word vectors (which differs from an average only by a constant factor).
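To make the two roles concrete, here is a minimal sketch of the GloVe objective with both parameter sets, trained by plain SGD over individual co-occurrence counts. The hyperparameters (vocab size, dimension, learning rate) and the single-pair update function are illustrative, not the paper's actual training setup (the reference implementation uses AdaGrad over all nonzero entries).

```python
import numpy as np

# Two parameter sets: W (target vectors) and W_tilde (context vectors),
# each with its own bias term, as in the GloVe objective.
vocab_size, dim = 1000, 50
rng = np.random.default_rng(0)

W = rng.normal(scale=0.1, size=(vocab_size, dim))         # target vectors
W_tilde = rng.normal(scale=0.1, size=(vocab_size, dim))   # context vectors
b = np.zeros(vocab_size)                                   # target biases
b_tilde = np.zeros(vocab_size)                             # context biases

def weight(x, x_max=100.0, alpha=0.75):
    # f(X_ij): down-weights rare co-occurrences, caps frequent ones at 1
    return np.minimum((x / x_max) ** alpha, 1.0)

def sgd_step(i, j, x_ij, lr=0.05):
    # One update on a single nonzero co-occurrence count X_ij.
    # Loss term: f(X_ij) * (w_i . w~_j + b_i + b~_j - log X_ij)^2
    diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(x_ij)
    g = 2.0 * weight(x_ij) * diff
    w_i_old = W[i].copy()          # keep the pre-update value for W_tilde's gradient
    W[i]       -= lr * g * W_tilde[j]
    W_tilde[j] -= lr * g * w_i_old
    b[i]       -= lr * g
    b_tilde[j] -= lr * g
    return weight(x_ij) * diff ** 2   # this pair's contribution to the loss

# e.g. word 3 appeared in word 7's context window 42 times in the corpus
loss = sgd_step(3, 7, 42.0)

# After training, combine the two sets as the paper does: final vectors = W + W~
word_vectors = W + W_tilde
```

Note that the update for W[i] depends on W_tilde[j] and vice versa, which is exactly why the two sets drift apart despite the symmetric objective: each is updated against a noisy, differently-initialized copy of the other.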