GPUs are not required for building deep learning models, but they make training substantially faster.
Deep learning models are distributed models of computation: each neuron's processing can be compartmentalized, and its computation is simple in nature. The GPU's SIMD (single instruction, multiple data) parallel paradigm is very much amenable to this type of architecture, since most neuron computations can be reduced to single-instruction blocks carried out in GPU kernels.
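To illustrate, a whole layer of neurons can be written as one matrix product followed by the same elementwise activation applied to every output, which is exactly the SIMD-friendly pattern described above. A minimal sketch in NumPy (the layer sizes and ReLU activation are made up for illustration):

```python
import numpy as np

# Hypothetical layer: 4 inputs -> 3 neurons. Every neuron performs the
# same instruction sequence (multiply-accumulate, then activation) on
# different data -- the pattern a GPU kernel exploits.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))   # one weight row per neuron
b = np.zeros(3)
x = rng.standard_normal(4)        # a single input vector

# All 3 neurons computed at once: one matrix-vector product plus one
# elementwise activation, with no per-neuron control flow.
z = W @ x + b
a = np.maximum(z, 0)              # ReLU applied uniformly to every neuron

# Equivalent per-neuron loop (the computation the matmul compartmentalizes):
a_loop = np.array([max(sum(W[i, j] * x[j] for j in range(4)) + b[i], 0.0)
                   for i in range(3)])
assert np.allclose(a, a_loop)
```

On a GPU, each row of `W @ x` would be handled by its own thread, all executing the same instructions.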
Also, the GPU's lower clock speed, combined with its efficiency in deploying hundreds of threads per cycle, makes it a more efficient and economical way to run such distributed architectures.
Note: GPU code needs careful design so that the program does not break parallelism. Forks due to decision logic (branches) can become bottlenecks in parallel execution, since threads that take different branches must be serialized.
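A minimal sketch of this caveat, using NumPy as a stand-in for a GPU kernel: a per-element `if` (which would cause divergence on a GPU) can often be rewritten as a uniform elementwise operation.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])

# Branchy version: each element takes its own path through the code.
# On a GPU, threads in the same group executing different branches
# serialize, which hurts parallel throughput.
branchy = np.array([v if v > 0 else 0.0 for v in x])

# Branch-free version: the same instruction (elementwise max) applied
# to every element, with no divergence -- ideal for SIMD execution.
branchless = np.maximum(x, 0.0)

assert np.array_equal(branchy, branchless)
```

The two versions compute the same result; only the branch-free form maps cleanly onto a SIMD execution model.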
When we train a deep learning model, a huge number of matrix multiplications are involved, and these are computationally expensive. A GPU lets us perform these matrix multiplications in parallel, which makes training much faster. The GPU is also well suited to handling the high-dimensional matrices found in deep learning models.
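As a concrete illustration (the sizes here are made up), both the forward pass and the gradient computations of a single dense layer reduce to matrix multiplications over the whole batch, which is the workload a GPU accelerates:

```python
import numpy as np

rng = np.random.default_rng(1)
batch, d_in, d_out = 64, 128, 32            # hypothetical sizes
X = rng.standard_normal((batch, d_in))      # input batch
W = rng.standard_normal((d_in, d_out))      # layer weights

# Forward pass: one (64 x 128) @ (128 x 32) matmul covers the entire batch.
Y = X @ W

# Backward pass: the gradients are matrix multiplications as well.
dY = rng.standard_normal((batch, d_out))    # upstream gradient (stand-in)
dW = X.T @ dY     # (128 x 64) @ (64 x 32): gradient w.r.t. the weights
dX = dY @ W.T     # (64 x 32) @ (32 x 128): gradient w.r.t. the inputs

assert Y.shape == (batch, d_out)
assert dW.shape == W.shape and dX.shape == X.shape
```

Every training step repeats this pattern across all layers, so parallelizing the matmuls dominates the overall speedup.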