This is a very challenging task. It depends largely on the data you feed the deep network, and also on the number of layers the network has, the size of each layer, the learning rate, the number of iterations, and so on.
Actually, it's a very challenging issue. It depends largely on the data of your problem, and also on the number of layers you plan for the network, the size of each layer, the learning rate, the number of iterations, and so on. That said, Deep Learning has revolutionised Pattern Recognition and Machine Learning. It is about credit assignment in adaptive systems with long chains of potentially causal links between actions and consequences. The difficulty of a problem may have little to do with its depth: some NNs can quickly learn to solve certain deep but simple problems through random weight guessing. In general, however, finding an NN that precisely models a given training set (of input patterns and corresponding labels) is an NP-complete problem, and this holds for deep networks as well.
There are different approaches to improving the generalization capacity of your DNN, but they mostly rely on decreasing the capacity of your model (in the VC sense) without increasing the empirical risk. To do so, one has to find the model whose inductive bias best matches the data. Given a fixed DNN, you can automatically "cut" some synaptic weights during training by using L1 regularization (LASSO), available in all the DL libraries, or even use maximum-margin training for the DNN. In that case you can use Eigenvalue Decay regularization together with the categorical hinge loss function; a short sketch follows the references below. See these papers:
Deep Learning with Eigenvalue Decay Regularizer
Eigenvalue decay: A new method for neural network regularization
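To make the L1 idea concrete, here is a minimal sketch assuming Keras/TensorFlow (other DL libraries expose equivalent options); the layer sizes, the input dimension of 20, and the penalty strength 1e-4 are placeholder values, not recommendations. Keras does not ship an Eigenvalue Decay regularizer out of the box, so the sketch only combines the built-in L1 penalty with the categorical hinge loss mentioned above.

```python
import tensorflow as tf

# Toy model: the L1 penalty pushes many weights to exactly zero during
# training, effectively "cutting" those synaptic connections.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        128, activation='relu', input_shape=(20,),           # 20 input features is a placeholder
        kernel_regularizer=tf.keras.regularizers.l1(1e-4)),   # LASSO-style penalty
    tf.keras.layers.Dense(
        64, activation='relu',
        kernel_regularizer=tf.keras.regularizers.l1(1e-4)),
    tf.keras.layers.Dense(10, activation='linear'),           # linear outputs suit a margin loss
])

# The categorical hinge loss gives training a maximum-margin flavour,
# in the spirit of the suggestion above (labels should be one-hot encoded).
model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalHinge(),
              metrics=['accuracy'])
```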
Since your question is about dropout regularization, there are multiple schools of thought. One is to look at the neurons and prune those with small outputs or inputs, thus simplifying your network. You can also choose to randomly drop some neurons during training, which again simplifies the network. How you implement this depends on whether you are writing the code yourself or using a framework. It also depends on what you are doing in each layer (e.g., for logistic regression, regularization is often done by adding a simple lambda / (2n) times the squared Frobenius norm of the weights to the cost, where n is, in the usual formulation, the number of training examples). This is considered L2 regularization.
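As a hedged illustration of those two ideas, here is a small Keras/TensorFlow sketch combining dropout (randomly deactivating neurons during training) with an L2 weight penalty; the dropout rate of 0.5, the value of lam, the 784-dimensional input, and the layer sizes are placeholder choices, and Keras folds the 1/(2n) factor from the formula above into the single l2 coefficient rather than exposing it separately.

```python
import tensorflow as tf

lam = 1e-3  # placeholder regularization strength, not a recommendation

model = tf.keras.Sequential([
    # L2 penalty on the weight matrix; Keras adds lam * ||W||_F^2 to the loss.
    tf.keras.layers.Dense(256, activation='relu', input_shape=(784,),
                          kernel_regularizer=tf.keras.regularizers.l2(lam)),
    # Dropout randomly deactivates 50% of these units at each training step;
    # Keras uses inverted dropout, so the layer is an identity at test time.
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
```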
I note that I need to read the papers that Oswald recommended and evaluate the technique on some of my problems. Just know that there are many ways to perform regularization (dropout, L1/L2, various decay approaches, etc.). Search here or on scholar.google.com if you don't have access to a research library, using the search phrase "Deep Learning Regularization". Good luck with your analysis.