This is a very challenging task, and the answer depends heavily on the data you feed to the deep network. It also depends on the number of layers your network has, the size of each layer, the learning rate, the number of iterations, etc.
Actually, it is a very challenging issue, and it depends largely on the data of your problem, as well as on the number of layers you plan for your network, the size of each layer, the learning rate, the number of iterations, etc. That said, Deep Learning has revolutionised Pattern Recognition and Machine Learning. It is about credit assignment in adaptive systems with long chains of potentially causal links between actions and consequences. The difficulty of a problem may have little to do with its depth: some NNs can quickly learn to solve certain deep but simple problems through random weight guessing. In general, however, finding an NN that precisely models a given training set (of input patterns and corresponding labels) is an NP-complete problem, and this holds in the deep case as well.
There are different approaches to improve the generalization capacity of your DNN, but they mostly rely on decreasing the capacity of your model (in the VC sense) without increasing the empirical risk. To do so, one has to find the model whose inductive bias best matches the data. Given a fixed DNN, you can automatically "cut" some synaptic weights during training by using L1 regularization (LASSO), available in all the DL libraries, or even use maximum-margin training for DNNs. In that case you can use Eigenvalue Decay regularization and the categorical hinge loss function. See these papers:
- Deep Learning with Eigenvalue Decay Regularizer
- Eigenvalue decay: A new method for neural network regularization
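To make the first part concrete, here is a minimal Keras sketch of L1-regularized dense layers trained with the categorical hinge loss. The layer sizes, input shape and regularization strength are illustrative assumptions, and the Eigenvalue Decay regularizer from the papers above is not part of stock Keras, so plain L1 stands in for the "weight cutting" idea here:

```python
from tensorflow.keras import layers, models, regularizers

# L1 (LASSO-style) penalties push some synaptic weights exactly to zero,
# effectively cutting connections; categorical hinge gives a maximum-margin
# flavour to the objective.
model = models.Sequential([
    layers.Dense(128, activation='relu', input_shape=(784,),
                 kernel_regularizer=regularizers.l1(1e-4)),
    layers.Dense(64, activation='relu',
                 kernel_regularizer=regularizers.l1(1e-4)),
    layers.Dense(10, activation='linear'),  # hinge losses expect raw class scores
])

# categorical_hinge expects one-hot labels (internally mapped to +/-1 targets).
model.compile(optimizer='adam', loss='categorical_hinge', metrics=['accuracy'])
```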
Since your question is about dropout regularization, there are multiple schools of thought. One is that you look at the neurons and kill those that have small outputs or inputs, thus simplifying your network. You can also choose to randomly kill some neurons, which again simplifies the network. How you implement this depends on whether you are writing the code yourself or using a framework. It also depends on what you are doing in each layer (e.g., for logistic regression, regularization is often done by adding a simple lambda / (2n) * squared Frobenius norm of the weights to the cost, where "n" is the number of training examples). This is considered L2 regularization.
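If you are writing the code yourself rather than using a framework, both ideas fit in a few lines of numpy. This is only a sketch under assumed shapes and a keep probability of 0.8 (inverted dropout, so no rescaling is needed at test time):

```python
import numpy as np

def dropout_forward(activations, keep_prob=0.8, training=True):
    """Randomly zero out units during training and scale the survivors
    so the expected activation stays the same (inverted dropout)."""
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) < keep_prob
    return activations * mask / keep_prob

def l2_penalty(weights, lam, n):
    """lambda / (2n) * squared Frobenius norm, summed over all weight matrices,
    where n is the number of training examples."""
    return (lam / (2 * n)) * sum(np.sum(W ** 2) for W in weights)
```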
I note that I need to read the papers that Oswald recommended and evaluate the technique on some of my problems. Just know that there are many ways to perform regularization (dropout, L1/L2, various decay approaches, etc.). Search here or on scholar.google.com if you don't have a research library -- the search phrase is "Deep Learning Regularization". Good luck with your analysis.
I completely agree with Kyle; dropout is another good and common option to decrease the size (and thus the complexity) of your DNN. This method can be understood as a low-cost way of performing model averaging/bagging ( https://en.wikipedia.org/wiki/Bootstrap_aggregating ) with smaller neural networks, since it drops out units/neurons from the original DNN (while L1 regularization shrinks some synaptic weights to zero, i.e. it "cuts" some connections between neurons). Regarding the example on image classification, here is an example of a convnet applied to human action recognition with ED regularization, besides some other tricks to improve the generalization capacity of the model (dropout can also be combined with ED with good results): https://github.com/oswaldoludwig/Human-Action-Recognition-with-Keras
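As a rough illustration of dropping units from a convnet, here is a small Keras sketch. The filter counts, dropout rates and input shape are illustrative assumptions, and the Eigenvalue Decay regularizer from the linked repository is not reproduced here; only standard Dropout layers are shown:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(64, 64, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),            # randomly drop feature-map units during training
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),             # heavier dropout on the dense layer
    layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```

At inference time Keras disables the Dropout layers automatically, which is what makes the technique behave like cheap averaging over the many thinned sub-networks sampled during training.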