The universal approximation theorem states, roughly: "For any continuous function $f$ on a compact domain $X$ and any threshold ε > 0, we can find a neural network $N$ with a single hidden layer that approximates $f$ to within ε everywhere on $X$."
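For concreteness, here is one standard formal version, in the style of Cybenko (1989); the sigmoidal activation $\sigma$ is one common hypothesis (other non-polynomial activations also work):

$$
\sup_{x \in X}\;\Bigl|\,f(x)-\sum_{i=1}^{N}\alpha_i\,\sigma\!\bigl(w_i^{\top}x+b_i\bigr)\Bigr| < \varepsilon,
$$

where $f$ is continuous on a compact $X \subset \mathbb{R}^n$, and the width $N$, the weights $w_i \in \mathbb{R}^n$, and the scalars $\alpha_i, b_i \in \mathbb{R}$ may all depend on $\varepsilon$.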
But we also know that Taylor and Fourier series expansions can be used for the same purpose. So why are deep neural networks considered more efficient on many tasks, for instance face recognition?
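To make the comparison concrete, here is a minimal sketch (a toy example of my own, assuming numpy and scikit-learn are available; the target sin(x) and the width 50 are arbitrary choices) in which a truncated Taylor polynomial and a single-hidden-layer network approximate the same smooth function:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Toy target: f(x) = sin(x) on [-pi, pi].
x = np.linspace(-np.pi, np.pi, 200)
f = np.sin(x)

# Taylor expansion of sin around 0, truncated at degree 7.
taylor = x - x**3 / 6 + x**5 / 120 - x**7 / 5040

# Single-hidden-layer network, the setting of the theorem.
net = MLPRegressor(hidden_layer_sizes=(50,), activation="tanh",
                   max_iter=5000, random_state=0)
net.fit(x.reshape(-1, 1), f)
nn = net.predict(x.reshape(-1, 1))

print("max |f - Taylor| :", np.max(np.abs(f - taylor)))
print("max |f - network|:", np.max(np.abs(f - nn)))
```

On a smooth one-dimensional target like this, both approaches reach small uniform error, which is exactly why the theorem alone does not explain the advantage of depth on high-dimensional targets such as images.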
The universal approximation theorem is very general and says little about the nature of the function being approximated. Is there a mathematical explanation for the success of deep learning?