"A desirable learning rate is low enough for the network to converge on something useful while yet being high enough to train in a reasonable length of time. Smaller learning rates necessitate more training epochs because of the fewer changes. On the other hand, larger learning rates result in faster changes."