Convolutional neural networks (CNNs) can tolerate some shift and scale variation. They are mostly used for invariant object recognition, but I think they are also used in other applications such as speech recognition.
There is an additional constraint that has to be mentioned. Any invariance approach, whether it uses a neural network or something else, has to be examined with regard to its invariance properties, that is, how well it can still separate (distinguish) different patterns. Seminal work describing invariance properties was published by Emmy Noether at the beginning of the 20th century.
Many publications on this topic were written by Hans Burkhardt (University of Freiburg, now emeritus) et al. Many of these papers are available online. One of his PhD students published some interesting approaches:
You can use the shift invariance of the Fourier power spectrum. A shift of the image only changes the phase of its Fourier coefficients, not their magnitude. So if you compute the FFT of the image and take the absolute value of each resulting complex value (or its square, which works just as well here), you get a shift-invariant representation (ignoring the wrapping at the borders).
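To make this concrete, here is a minimal NumPy sketch. The helper name `magnitude_spectrum` and the random test image are just my assumptions for illustration. The shift is done with `np.roll`, i.e. a circular shift, which is exactly the case where the invariance holds without border effects.

```python
import numpy as np

def magnitude_spectrum(image):
    """Return |FFT2(image)|; a circular shift changes only the phase."""
    return np.abs(np.fft.fft2(image))

# Hypothetical test image: random values, only for illustration.
rng = np.random.default_rng(0)
image = rng.random((32, 32))

# Circular shift by 5 rows and -7 columns (content wraps around the borders).
shifted = np.roll(image, shift=(5, -7), axis=(0, 1))

# The pixel data differ, but the magnitude spectra agree up to rounding.
same = np.allclose(magnitude_spectrum(image), magnitude_spectrum(shifted))
print(same)
```

With a real, non-circular shift, new content enters at the borders, so the two spectra will only be approximately equal.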
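You can also use the log-polar mapping trick to become robust to changes in scale and rotation (see the Fourier-Mellin transform). In log-polar coordinates, scaling and rotation about the origin turn into shifts along the two axes, so the same magnitude trick can be applied again.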
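A rough sketch of that idea, under my own assumptions: the function names `log_polar` and `fourier_mellin_descriptor`, the 64x64 grid, and the nearest-neighbour sampling are illustrative choices, not a reference implementation. The log-polar map is applied to the centred magnitude spectrum, so the result is already shift-invariant, and a second FFT magnitude then removes the shifts caused by rotation and scaling (only approximately, because of resampling and border effects).

```python
import numpy as np

def log_polar(image, center=None, n_r=64, n_theta=64):
    """Resample a 2-D array onto a log-polar grid (nearest neighbour)."""
    h, w = image.shape
    if center is None:
        center = ((h - 1) / 2.0, (w - 1) / 2.0)
    cy, cx = center
    # Largest radius that stays inside the array in every direction.
    r_max = min(cy, cx, h - 1 - cy, w - 1 - cx)
    # Radii spaced logarithmically from 1 pixel up to r_max.
    radii = np.exp(np.linspace(0.0, np.log(r_max), n_r))
    thetas = np.linspace(0.0, 2.0 * np.pi, n_theta, endpoint=False)
    rr, tt = np.meshgrid(radii, thetas, indexing="ij")
    ys = np.rint(cy + rr * np.sin(tt)).astype(int)
    xs = np.rint(cx + rr * np.cos(tt)).astype(int)
    return image[ys, xs]  # shape (n_r, n_theta)

def fourier_mellin_descriptor(image, n_r=64, n_theta=64):
    """|FFT2| of the log-polar map of the centred magnitude spectrum."""
    h, w = image.shape
    # fftshift moves the zero-frequency term to index (h // 2, w // 2).
    mag = np.fft.fftshift(np.abs(np.fft.fft2(image)))
    lp = log_polar(mag, center=(h // 2, w // 2), n_r=n_r, n_theta=n_theta)
    return np.abs(np.fft.fft2(lp))

# Hypothetical test image: random values, only for illustration.
rng = np.random.default_rng(1)
image = rng.random((64, 64))
descriptor = fourier_mellin_descriptor(image)
```

In practice you would probably use interpolation instead of nearest-neighbour sampling, and some windowing or high-pass weighting of the spectrum, but the structure stays the same.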