How do I select the optimum values for the number of batches, number of epochs, number of hidden layers, and number of steps for classification using a deep learning algorithm?
This depends largely on the type of model you want to build; by examining the various training graphs you can decide on these hyperparameters. It is well explained in this single link.
There is no general rule for it. To select the optimum hyperparameters and network architecture, several different networks should initially be trained on a small portion of the data. Then compare the accuracy of all the networks; the network with the highest accuracy has the best architecture. You should then apply the selected architecture to the whole data set. The network can be further tuned with dropout regularization. Regarding the number of epochs, the best approach is to assign a large number of epochs (e.g., 1000) and then use early-stopping regularization. This technique prevents over-fitting by stopping the training procedure once the model's performance on the validation subset has not improved for a certain number of epochs.
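A minimal sketch of that early-stopping idea, assuming TensorFlow/Keras (the thread does not name a framework); the data, network, and patience value below are placeholders, not anything from the original answer:

```python
# Early-stopping sketch: large epoch budget, training halts when validation loss stalls.
import numpy as np
from tensorflow import keras

# Toy data standing in for a real classification set.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once val_loss has not improved for `patience` epochs and keep the best weights.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=20, restore_best_weights=True
)
history = model.fit(
    X, y,
    epochs=1000,            # large budget; early stopping decides the actual count
    validation_split=0.2,
    callbacks=[early_stop],
    verbose=0,
)
print("Stopped after", len(history.history["val_loss"]), "epochs")
```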
Since stochastic gradient descent is the default optimization technique, ideally you would like to make your batch size as large as you can, given that the only reason we split the data into batches is computational memory.
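One quick way to see that trade-off is to time an epoch at several batch sizes; this is only an illustrative sketch, and the Keras model, synthetic data, and batch sizes are my own assumptions:

```python
# Sketch: timing one epoch at different batch sizes (Keras assumed; data is synthetic).
import time
import numpy as np
from tensorflow import keras

X = np.random.rand(10000, 20)
y = np.random.randint(0, 2, size=10000)

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="sgd", loss="binary_crossentropy", metrics=["accuracy"])
    return model

for batch_size in [32, 128, 512, 2048]:
    model = build_model()
    start = time.time()
    model.fit(X, y, epochs=1, batch_size=batch_size, verbose=0)
    print(f"batch_size={batch_size}: {time.time() - start:.2f} s per epoch")
```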
Grid search used to be the go-to technique for selecting hyperparameters. However, it has been shown that random search over your domain space is considerably more efficient: http://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf
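In that spirit, here is a small random-search sketch; the scikit-learn MLPClassifier, the parameter ranges, and the budget of 25 configurations are illustrative assumptions, not part of the original answer:

```python
# Random search over a hypothetical hyperparameter space (scikit-learn assumed).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# Ranges to sample from; these are placeholders for a real search space.
param_distributions = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32), (128, 64), (64, 64, 32)],
    "alpha": np.logspace(-5, -1, 50),            # L2 regularization strength
    "learning_rate_init": np.logspace(-4, -1, 50),
    "batch_size": [32, 64, 128, 256],
}

search = RandomizedSearchCV(
    MLPClassifier(max_iter=300, early_stopping=True, random_state=0),
    param_distributions,
    n_iter=25,      # number of random configurations to try
    cv=3,
    n_jobs=-1,
    random_state=0,
)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))
```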
You can also study the effect of each of the hyperparameters through a Design of Experiments. Article: Design of Experiments and Response Surface Methodology to Tu...
In my experience, when you have a small number of features, many layers with few units usually work best, and when you have many features, few layers with many units. Obviously this depends on the problem and the data you have, but it may give you a head start.
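A rough illustration of those two styles, again assuming Keras; the layer counts and unit sizes are arbitrary placeholders chosen only to make the contrast visible:

```python
# Sketch of the two architecture styles described above (Keras assumed; sizes are illustrative).
from tensorflow import keras

def deep_narrow_model(n_features):
    """Many layers with few units -- suggested for a small number of features."""
    return keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(16, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

def shallow_wide_model(n_features):
    """Few layers with many units -- suggested for many features."""
    return keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(256, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

deep_narrow_model(10).summary()
shallow_wide_model(500).summary()
```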
Thank you all for responding and sharing useful links. I also ran an optimization on each of the aforementioned parameters by adding a simple loop and testing different values for each one. For example, in the attached figure, I calculated the error rate for different numbers of epochs and finally selected the value that resulted in the minimum error rate.
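For readers who want to try the same thing, here is a rough reconstruction of that kind of loop; the Keras model, synthetic data, and epoch values are placeholders and not the actual setup behind the attached figure:

```python
# Sketch of a simple epoch sweep: train at several epoch counts, keep the one with lowest error.
import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

X = np.random.rand(2000, 20)
y = np.random.randint(0, 2, size=2000)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

def build_model():
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

error_rates = {}
for n_epochs in [10, 25, 50, 100, 200]:
    model = build_model()
    model.fit(X_train, y_train, epochs=n_epochs, batch_size=64, verbose=0)
    _, acc = model.evaluate(X_val, y_val, verbose=0)
    error_rates[n_epochs] = 1.0 - acc     # error rate on the validation set

best_epochs = min(error_rates, key=error_rates.get)
print("Error rates:", error_rates)
print("Epoch count with minimum error:", best_epochs)
```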