Conventional gradient descent is very slow for deep learning training. While investigating alternative methods to train deep neural networks faster, I came across algorithms such as Stochastic Gradient Descent, Contrastive Divergence, and various optimization heuristics. I am looking for resources to explore these methods for speeding up deep learning training and parameter optimization.
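To make my question concrete, here is a rough toy sketch (my own NumPy example, not from any particular library or paper) of what I mean by conventional full-batch gradient descent versus Stochastic Gradient Descent, both fitting a simple linear regression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 3*x + 1 plus a little noise
X = rng.uniform(-1, 1, size=(1000, 1))
y = 3 * X[:, 0] + 1 + 0.1 * rng.standard_normal(1000)

def grad(w, b, X, y):
    # Gradient of the mean squared error with respect to w and b
    err = X[:, 0] * w + b - y
    return (2 * err @ X[:, 0]) / len(y), 2 * err.mean()

# Conventional (full-batch) gradient descent: one update per pass over ALL the data
w, b, lr = 0.0, 0.0, 0.1
for epoch in range(100):
    gw, gb = grad(w, b, X, y)
    w, b = w - lr * gw, b - lr * gb

# Stochastic Gradient Descent: many cheap updates per epoch, one mini-batch each
w_sgd, b_sgd = 0.0, 0.0
for epoch in range(10):
    idx = rng.permutation(len(y))
    for start in range(0, len(y), 32):          # mini-batches of 32 examples
        batch = idx[start:start + 32]
        gw, gb = grad(w_sgd, b_sgd, X[batch], y[batch])
        w_sgd, b_sgd = w_sgd - lr * gw, b_sgd - lr * gb

print("full-batch GD:", w, b)
print("SGD          :", w_sgd, b_sgd)
```

My (possibly naive) understanding is that SGD trades exact gradients for many cheap, noisy updates, which is why it trains faster in practice; I am unsure where Contrastive Divergence fits into this picture.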
I'd appreciate any leads on such resources, as well as some clarity about the Contrastive Divergence algorithm: is it an approximation to gradient descent, or a different algorithm altogether? Thanks.