What is the best approach to hierarchical scalability of deep learning to maximize prediction accuracy (1) in the absence of large data, and (2) in the presence of large data?
Early neural networks, such as the first perceptrons, were shallow: they consisted of one input layer and one output layer, with at most one hidden layer in between. A network with more than three layers (counting input and output) qualifies as "deep" learning.
There is no strict threshold for a neural network to be considered a deep neural network. The most trivial network consists of three layers, viz. input, hidden, and output.
Building on Shafagat Mahmudova's answer: shallow neural networks have at most three layers (input, hidden, and output). For complex problems, the single hidden layer was made "fat", i.e., given many neurons, because training additional layers was not computationally feasible at the time. Deep neural networks appeared when computers became more powerful; they add more layers instead of making the one hidden layer fatter, as the sketch below illustrates.
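To make the contrast concrete, here is a minimal Keras sketch of the two designs. The input width (20 features), layer sizes, and binary output are illustrative assumptions, not recommendations:

```python
from tensorflow import keras
from tensorflow.keras import layers

# "Shallow but fat": one wide hidden layer carries all the capacity.
shallow = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(512, activation="relu"),   # single, wide hidden layer
    layers.Dense(1, activation="sigmoid"),  # output layer
])

# "Deep": capacity spread across several narrower hidden layers instead.
deep = keras.Sequential([
    keras.Input(shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

shallow.summary()
deep.summary()
```

Comparing the two `summary()` outputs shows that depth is a design choice about how parameters are arranged, not just how many there are.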
It is important to note that there is no one-size-fits-all approach with DNNs to maximize prediction accuracy. By trial and error, or by using optimization techniques such as genetic algorithms (GAs), you can identify suitable hyperparameters (meta-parameters) for your DNN model; a sketch of the GA approach follows.
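As one way to picture this, here is a toy genetic algorithm over three DNN hyperparameters (layer count, layer width, learning rate). The search space, population size, and the `evaluate` function are all assumptions for illustration; in particular, `evaluate` is a hypothetical surrogate that should be replaced with a real train-and-validate loop on your own data:

```python
import random

# Illustrative search space; adjust to your problem.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3, 4, 5],
    "width": [16, 32, 64, 128, 256],
    "lr": [1e-1, 1e-2, 1e-3, 1e-4],
}

def random_individual():
    return {k: random.choice(v) for k, v in SEARCH_SPACE.items()}

def evaluate(ind):
    # Hypothetical stand-in fitness so the script runs on its own.
    # In practice: train a model with these hyperparameters and
    # return its validation accuracy.
    return (-abs(ind["num_layers"] - 3)
            - abs(ind["width"] - 64) / 64
            - abs(ind["lr"] - 1e-3) * 100)

def crossover(a, b):
    # Child takes each hyperparameter from one parent at random.
    return {k: random.choice([a[k], b[k]]) for k in SEARCH_SPACE}

def mutate(ind, rate=0.2):
    # Each gene is resampled from the search space with some probability.
    return {k: (random.choice(v) if random.random() < rate else ind[k])
            for k, v in SEARCH_SPACE.items()}

population = [random_individual() for _ in range(10)]
for generation in range(20):
    population.sort(key=evaluate, reverse=True)
    parents = population[:4]  # keep the fittest individuals
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print("best hyperparameters:", max(population, key=evaluate))
```

The same skeleton covers plain trial and error as a special case: with no crossover or selection, it degenerates to random search, which is often a reasonable baseline before reaching for a full GA.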