Deep learning does perform better than other machine learning algorithms, as the empirical results suggest. The reason why is not yet known.
Some have suggested that it is because deep learning loosely mimics brain function, with multiple layers of neurons stacked one after another like the classical model of the brain. However, there is still no robust theoretical foundation for deep learning.
Usually in deep learning, a part of the deep network or a part of the training process is specialized in unsupervised feature extraction, so that noise is removed from the data. With "clean" data, the chances of capturing the real phenomenon that produced the data are greater.
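One common way to set this up is a denoising autoencoder. Here is a minimal sketch in Keras, just to illustrate the idea; the input size, layer widths, and noise level are arbitrary choices for this example, not prescriptions:

```python
# Denoising-autoencoder sketch (Keras): the network is trained to
# reconstruct the clean input from a noisy copy, so the bottleneck
# layer is pushed to keep the signal and discard the noise.
from tensorflow import keras
from tensorflow.keras import layers

input_dim = 784   # e.g. flattened 28x28 images (assumption for this sketch)
noise_std = 0.3   # arbitrary corruption level

inputs = keras.Input(shape=(input_dim,))
noisy = layers.GaussianNoise(noise_std)(inputs)        # corrupt the input
encoded = layers.Dense(64, activation="relu")(noisy)   # bottleneck "features"
decoded = layers.Dense(input_dim, activation="sigmoid")(encoded)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(x_train, x_train, epochs=10)   # target is the clean input
# encoder = keras.Model(inputs, encoded)         # reuse as a feature extractor
```

The encoder part can then feed its "clean" features to whatever model comes next.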
Here is my take on the subject. Deep learning models usually work better than traditional ML tools because they also learn the feature-extraction step.
In image recognition, for example, the traditional setup is to extract handcrafted features and then feed them to an SVM. By contrast, deep learning schemes also optimize the features that are extracted, which largely explains why they perform better.
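To make the contrast concrete, here is a rough sketch of the two pipelines; the HOG parameters and the tiny CNN are illustrative stand-ins, not the setup from any particular paper:

```python
# Two image-classification pipelines, sketched for contrast.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC
from tensorflow import keras
from tensorflow.keras import layers

# (1) Traditional: fixed handcrafted features -> SVM.
def hog_features(images):
    # HOG parameters are fixed up front and never adapted to the task.
    return np.array([hog(img, pixels_per_cell=(8, 8)) for img in images])

# svm = SVC(kernel="rbf").fit(hog_features(x_train), y_train)

# (2) Deep learning: the convolutional filters that play the role of the
# feature extractor are themselves optimized by backpropagation.
cnn = keras.Sequential([
    layers.Input(shape=(32, 32, 1)),
    layers.Conv2D(16, 3, activation="relu"),    # learned "features"
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),     # classifier head
])
cnn.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# cnn.fit(x_train, y_train)   # features and classifier trained jointly
```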
Chances are that if you fix the features (either handcrafted or learned, but not further optimized), the winner between a deep multi-layer perceptron and a kernel SVM will depend on your skill at tuning the hyperparameters of both methods.
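A minimal sketch of that comparison with scikit-learn, assuming a precomputed feature matrix X and labels y; the hyperparameter grids are placeholders you would tune per dataset:

```python
# Compare an MLP and a kernel SVM on the same fixed features.
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# X, y: a fixed feature matrix and labels (assumed to exist already).
mlp_search = GridSearchCV(
    MLPClassifier(max_iter=500),
    {"hidden_layer_sizes": [(64,), (64, 64)], "alpha": [1e-4, 1e-2]},
    cv=5,
)
svm_search = GridSearchCV(
    SVC(kernel="rbf"),
    {"C": [0.1, 1, 10], "gamma": ["scale", 0.01]},
    cv=5,
)
# mlp_search.fit(X, y); svm_search.fit(X, y)
# print(mlp_search.best_score_, svm_search.best_score_)  # winner depends on tuning
```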
The performance of DNNs is explained in detail in the recent book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, "Deep Learning" (Adaptive Computation and Machine Learning series), which is available online at http://www.deeplearningbook.org
Deep learning techniques learn by creating progressively more abstract representations of the data as the network grows deeper; as a result, the model automatically extracts features and yields higher accuracy. It is not uncommon for the higher-level features of a deep learning model to be reused for classification or regression, as was done in the 2015 FaceNet paper and the 2015 Faster R-CNN paper.
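As a rough sketch of that reuse pattern (not the actual FaceNet or Faster R-CNN setup), one can take the high-level features of a pretrained network and train a simple task head on top; the backbone choice and output size here are arbitrary:

```python
# Reuse higher-level features from a pretrained CNN for a new classifier.
from tensorflow import keras
from tensorflow.keras import layers

# Pretrained backbone with its original classification head removed;
# pooling="avg" turns the last feature maps into one feature vector.
backbone = keras.applications.ResNet50(
    include_top=False, weights="imagenet", pooling="avg"
)
backbone.trainable = False   # keep the learned features fixed

inputs = keras.Input(shape=(224, 224, 3))
features = backbone(inputs)                                # high-level features
outputs = layers.Dense(5, activation="softmax")(features)  # new task head
model = keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# model.fit(new_task_images, new_task_labels)
# (input preprocessing is omitted here for brevity)
```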
I think the best answer so far was Marwa's: we don't know. In my case, a thin deep network with two hidden layers also outperforms all traditional methods by a significant margin (2-5%) on handcrafted features. And those features were handcrafted for the traditional methods, yet a simple DNN used as a drop-in replacement still outperformed everything else. Honestly, I was skeptical about DNNs when they became a hype, but my own results convinced me in a short time that this is the beginning of a new era.
Parts of the following paper discuss what makes a representation good and how deep learning produces one, which speaks to why DL performs better than other traditional ML methods.