Shift invariance can be understood as invariance to translational movement, i.e. if it doesn't matter that the object in the image moves relative to the camera, or that the camera location(s) change relative to the object, the system can be considered shift invariant. Typically, if your images are not aligned (registered), you get some ghosting in your fusion results.
Transformations can be classified similarly. If a transformation is shift invariant, it does not matter whether the input is shifted in the input domain: the result in the output domain is the same. This can make, for example, comparison of signals easier. However, you have to be careful about whether you mean this kind of invariance or Linear Shift Invariance (as in linear time-invariant systems), where the definition differs in a subtle but significant way: in a linear shift-invariant system, shifting the input shifts the output by the same amount rather than leaving it unchanged.
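To make the distinction concrete, here is a minimal NumPy sketch. The choice of examples is mine, not something from the question: the DFT magnitude stands in for a "shift invariant" transformation (for circular shifts), and a circular 3-tap moving average stands in for a linear shift-invariant system. The helper name `circular_filter` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(64)   # arbitrary test signal (assumption)
shifted = np.roll(x, 5)       # the same signal, circularly shifted by 5 samples

# Invariance: the DFT magnitude ignores a circular shift of the input.
mag_equal = np.allclose(np.abs(np.fft.fft(x)), np.abs(np.fft.fft(shifted)))

def circular_filter(signal, kernel):
    """Circular convolution via the FFT (a simple linear shift-invariant system)."""
    spectrum = np.fft.fft(signal) * np.fft.fft(kernel, n=signal.size)
    return np.real(np.fft.ifft(spectrum))

kernel = np.ones(3) / 3       # 3-tap moving average
y = circular_filter(x, kernel)
y_shifted = circular_filter(shifted, kernel)

# Linear shift invariance: the output moves along with the input...
output_follows_shift = np.allclose(y_shifted, np.roll(y, 5))
# ...but it is not the same output as before the shift.
output_unchanged = np.allclose(y_shifted, y)

print(mag_equal, output_follows_shift, output_unchanged)
```

So the first property is the one that helps when comparing signals regardless of position, while the second only says the system behaves the same way wherever the input happens to be.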
One has to realize that it is often not enough to consider only translational movement; you should also worry about rotations, scaling, projections and so on. A decent book on the topic I'd recommend is "Multiple View Geometry in Computer Vision" by R. Hartley and A. Zisserman.
If you just want to align two images with different modalities and translational movement only, you might want to look at phase-correlation-based techniques...
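As a rough illustration of the phase correlation idea, here is a minimal NumPy sketch. It assumes a pure integer, circular translation between two images of the same size; the function name `phase_correlation_shift` and the small constant in the denominator are my own choices, and real images would usually need windowing and sub-pixel peak estimation on top of this.

```python
import numpy as np

def phase_correlation_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps `reference` onto `moved`."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)

    # Normalised cross-power spectrum: keep only the phase difference.
    cross_power = f_mov * np.conj(f_ref)
    cross_power /= np.abs(cross_power) + 1e-12  # small constant avoids division by zero

    # The inverse transform peaks at the translation between the images.
    correlation = np.real(np.fft.ifft2(cross_power))
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)

    # Peaks past the midpoint correspond to negative shifts (wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, correlation.shape))

rng = np.random.default_rng(1)
image = rng.standard_normal((32, 48))             # synthetic test image (assumption)
moved = np.roll(image, shift=(3, -5), axis=(0, 1))  # shift 3 rows down, 5 columns left
estimated = phase_correlation_shift(image, moved)
print(estimated)
```

Because only the phase of the spectrum is used, the estimate does not depend on the overall brightness or contrast of the two images, which is part of why the method is attractive for registration.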