I was implementing dropout and using the conjugate gradient method to update the weights. I was not getting good results, and Professor Hinton said that "conjugate gradient is no good with a stochastic method of getting gradients". Can anyone please explain why? A denoising autoencoder is also a stochastic method. Is conjugate gradient suitable for training a denoising autoencoder?
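
For concreteness, here is a minimal sketch of where the stochasticity enters. It assumes a toy linear least-squares model with input dropout and the Polak-Ribière formula for the conjugate gradient coefficient. Both are illustrative choices, not necessarily the setup described above, and `dropout_gradient` and `polak_ribiere_beta` are hypothetical helper names. The coefficient is computed from the difference between two successive gradients, which conjugate gradient treats as curvature information. With a fresh dropout mask on every evaluation, that difference is nonzero even when the weights have not moved at all.

```python
import numpy as np

def dropout_gradient(w, X, y, rng, keep_prob=0.5):
    """Gradient of 0.5 * mean((X_masked @ w - y)**2) under a freshly sampled input-dropout mask."""
    mask = rng.random(X.shape) < keep_prob   # new random mask on every call
    X_masked = X * mask / keep_prob          # inverted-dropout scaling (an assumption)
    residual = X_masked @ w - y
    return X_masked.T @ residual / len(y)

def polak_ribiere_beta(g_new, g_old):
    """Polak-Ribiere coefficient used to build the next conjugate search direction."""
    return float(g_new @ (g_new - g_old)) / float(g_old @ g_old)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 5))   # toy data, purely illustrative
y = X @ np.ones(5)
w = np.zeros(5)

# Deterministic gradient (no dropout), evaluated twice at the same weights:
# the difference is exactly zero, so beta is zero.
g_exact = X.T @ (X @ w - y) / len(y)
beta_exact = polak_ribiere_beta(g_exact, g_exact)

# Dropout gradient, evaluated twice at the same weights with different masks:
# the difference comes only from the masks, yet beta treats it as curvature.
g_a = dropout_gradient(w, X, y, rng)
g_b = dropout_gradient(w, X, y, rng)
gradient_gap = float(np.linalg.norm(g_b - g_a))
beta_noisy = polak_ribiere_beta(g_b, g_a)

print("beta with exact gradients:", beta_exact)
print("gap between two dropout gradients at the same w:", gradient_gap)
print("beta with dropout gradients:", beta_noisy)
```

A denoising autoencoder that resamples its corruption on every gradient evaluation would show the same kind of gap. Whether fixing the mask or the corruption for the duration of a conjugate gradient run is enough to avoid the problem is part of what is being asked here.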