10 October 2017

Dear RGs,

During model training, we can log several indicators: the loss, the range of the weights, and so on. One indicator of interest is the L2 norm of the gradient, along with its maximal/minimal value, its distribution, and so forth.

My question is: if I find that 0.5% of the variables have gradient L2 norms below 1e-3 during the iterations of an epoch, can I claim the model is not learning efficiently? Another phenomenon I observe is that the accuracy oscillates rather than ascending across iterations; my batch size is about 1/5 of the training set.
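For concreteness, the kind of check described above can be sketched as follows. This is a minimal NumPy-only illustration (the dictionary of gradient arrays and the 1e-3 threshold are stand-ins for whatever your framework exposes):

```python
import numpy as np

def gradient_norm_stats(grads, tol=1e-3):
    """Compute per-parameter L2 gradient norms and the fraction below tol.

    grads: dict mapping parameter name -> gradient array.
    Returns (norms dict, fraction of parameters with norm < tol).
    """
    norms = {name: float(np.linalg.norm(g)) for name, g in grads.items()}
    frac_small = sum(n < tol for n in norms.values()) / len(norms)
    return norms, frac_small

# Toy "gradients" for two parameters, one of which has nearly vanished.
grads = {
    "w1": np.array([0.3, -0.4]),     # norm 0.5
    "w2": np.array([1e-5, 2e-5]),    # norm well below 1e-3
}
norms, frac = gradient_norm_stats(grads)
# Here half the parameters fall below the threshold.
```

Tracking `frac_small` per epoch makes the "0.5% of variables" figure easy to monitor over time, rather than inspecting raw gradient dumps by hand.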

Any hints are appreciated. Thanks in advance!

Lu
