"How can unsupervised pretraining of deep neural networks using massive amounts of unlabeled data improve the accuracy and efficiency of supervised learning for specific tasks?"
Leveraging unlabeled data to aid training relies on some base assumptions about the data distribution: the smoothness assumption, the cluster assumption, and the manifold assumption. In particular, under the cluster (low-density separation) assumption, the decision boundary should lie in a low-density region of the input space, so that points in the same high-density cluster share a label.
Unsupervised pre-training initializes a discriminative neural network from one trained with an unsupervised criterion, such as a deep belief network or a deep autoencoder. This initialization can help with both optimization (it provides a better starting point for gradient descent) and overfitting (it acts as a data-dependent regularizer, pulling the weights toward regions that model the input distribution well).
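The two-phase recipe above can be sketched in a minimal NumPy example, with some hedging: instead of a deep belief network or a deep autoencoder, this toy uses a single-hidden-layer autoencoder with tied weights, synthetic two-cluster data, and plain gradient descent. The labels are deliberately ignored during the first phase and only used to fine-tune a classifier on top of the pretrained features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian clusters in 20 dimensions.
# Labels y are NOT used during the unsupervised phase.
X = np.vstack([rng.normal(-1.0, 1.0, (100, 20)),
               rng.normal(+1.0, 1.0, (100, 20))])
y = np.array([0] * 100 + [1] * 100)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# --- Phase 1: unsupervised pre-training -------------------------------
# One-layer autoencoder with tied weights: reconstruct X from tanh(X @ W).
d, h = X.shape[1], 5
W = rng.normal(0.0, 0.1, (d, h))          # encoder weights (decoder = W.T)
lr = 0.01
for _ in range(200):
    H = np.tanh(X @ W)                    # encode
    err = H @ W.T - X                     # reconstruction error
    # Gradient of the squared error w.r.t. W through both the decoder
    # (err.T @ H) and the encoder (backprop through tanh) paths.
    grad = err.T @ H + X.T @ ((err @ W) * (1.0 - H ** 2))
    W -= lr * grad / len(X)

# --- Phase 2: supervised fine-tuning ----------------------------------
# Train a logistic-regression head on the pretrained features.
H = np.tanh(X @ W)
w, b = np.zeros(h), 0.0
for _ in range(500):
    p = sigmoid(H @ w + b)
    g = p - y                             # gradient of the cross-entropy loss
    w -= 0.1 * (H.T @ g) / len(y)
    b -= 0.1 * g.mean()

acc = float(((sigmoid(H @ w + b) > 0.5) == y).mean())
print(f"training accuracy on pretrained features: {acc:.2f}")
```

In a full implementation one would also backpropagate the supervised loss through the encoder weights `W` (true fine-tuning) and stack several pretrained layers; this sketch only trains the classification head, which is enough to show that the unsupervised phase already yields label-relevant features.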