Suppose your 2D data has variance a along the X axis and variance b along the Y axis. Now rotate the axes so that all the points lie on or near the new X axis. The variance along the new Y axis, b', shrinks toward 0, while the variance along the new X axis, a', becomes larger than before. Rotation does not change the total variance, so a' + b' = a + b: whatever the new Y axis loses, the new X axis gains. So even if we ignore the values on the new Y axis, the information loss will be insignificant. That is the basic idea of how PCA works. The mathematics of PCA shows that keeping the principal components with the highest variance gives the smallest information loss for a given amount of dimensionality reduction.
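To make the rotation idea concrete, here is a minimal sketch in Python with NumPy. The synthetic data (points scattered around the line y = 0.5x), the noise levels, and the variable names are all assumptions made for illustration, not part of any particular dataset or library workflow. It finds the rotated axes from the eigenvectors of the covariance matrix, which is one common way to compute PCA, and compares the variances before and after.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice

# Hypothetical 2D data lying close to the line y = 0.5 * x
x = rng.normal(0.0, 3.0, size=500)
y = 0.5 * x + rng.normal(0.0, 0.3, size=500)
data = np.column_stack([x, y])
centered = data - data.mean(axis=0)

# Variances a and b along the original X and Y axes
a, b = centered.var(axis=0, ddof=1)

# The eigenvectors of the covariance matrix are the rotated axes
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues come back in ascending order
order = np.argsort(eigvals)[::-1]           # reorder so the largest variance is first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Coordinates of the points in the rotated axes, and the new variances a', b'
rotated = centered @ eigvecs
a_new, b_new = rotated.var(axis=0, ddof=1)

# Share of the total variance kept if the new Y axis is dropped
retained = a_new / (a_new + b_new)

print(f"original axes: a  = {a:.3f}, b  = {b:.3f}")
print(f"rotated axes:  a' = {a_new:.3f}, b' = {b_new:.3f}")
print(f"variance retained by the new X axis alone: {retained:.1%}")
```

With data like this, b' should come out far smaller than b while a' + b' stays equal to a + b, so keeping only the first rotated coordinate discards very little of the total variance.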