While applying back propagation for training neural network, how the weights are updated after finding the error between the desired and current output? Will the weights be replaced completely or will they be changed with some factor?
I feel it is beneficial to clearly distinguish backpropagation and optimization methods:
Backpropagation (backward propagation of errors) is a method which efficiently computes gradients of complex functions, such as multi-layer neural networks.
The computed gradients can be used in a number of optimisation methods. What Ingo described is gradient descent (in batch mode) or stochastic gradient descent (updates on small subsets of training data).
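To make this separation concrete, here is a minimal sketch (not taken from any of the answers; names such as compute_gradient are hypothetical). For brevity it uses a single linear layer with squared error, where the gradient has a closed form; in a multi-layer network, backpropagation would supply this gradient instead, and the optimiser (batch gradient descent or SGD) only consumes it:

```python
import numpy as np

def compute_gradient(w, X, y):
    """Gradient of the mean squared error for a single linear layer.
    In a multi-layer network, backpropagation would return this quantity."""
    n = X.shape[0]
    return X.T @ (X @ w - y) / n

def batch_gradient_descent(w, X, y, lr=0.1, epochs=100):
    for _ in range(epochs):
        grad = compute_gradient(w, X, y)    # gradient over the full training set
        w = w - lr * grad                   # one update per epoch
    return w

def stochastic_gradient_descent(w, X, y, lr=0.1, epochs=100, batch_size=8):
    n = X.shape[0]
    for _ in range(epochs):
        idx = np.random.permutation(n)
        for start in range(0, n, batch_size):
            b = idx[start:start + batch_size]                  # small subset of the data
            w = w - lr * compute_gradient(w, X[b], y[b])       # one update per mini-batch
    return w

# toy usage
X = np.random.randn(100, 3)
y = X @ np.array([1.0, -2.0, 0.5])
w = stochastic_gradient_descent(np.zeros(3), X, y)
```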
I found this paper from Andrew Ng's group understandable, and it gives good insight into the behaviour of classic optimisation methods:
Quoc V. Le et al.: On Optimization Methods for Deep Learning. ICML 2011.
Anyway, stochastic gradient descent with a reasonable momentum tends to be the method of choice for large models on large data.
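A minimal sketch of the momentum variant (classical momentum; the 0.9 coefficient and 0.01 learning rate are just common illustrative defaults, and compute_gradient refers to the hypothetical helper from the previous sketch):

```python
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update (classical momentum formulation).

    velocity accumulates an exponentially decaying sum of past gradients,
    so the update direction is smoothed over successive mini-batches.
    """
    velocity = momentum * velocity - lr * grad
    w = w + velocity
    return w, velocity

# usage, with mini-batch gradients as in the previous sketch:
# velocity = np.zeros_like(w)
# for each mini-batch b:
#     grad = compute_gradient(w, X[b], y[b])
#     w, velocity = sgd_momentum_step(w, grad, velocity)
```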
Krizhevsky, Sutskever, Hinton: ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012.
I like the UFLDL tutorials http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial which explain backpropagation and optimization in an accessible form.
The backpropagation algorithm computes a modifier which is added to the current weight. The update involves a constant learning rate (\gamma), which specifies the step size for learning. That is, the new weights \omega_{ij} are computed as follows:
\omega_{ij} = \omega_{ij} - \gamma \cdot o_i \cdot \delta_j
where o_i is the stored output of node i and \delta_j is the backpropagated error at node j.
Thus, if your algorithm computes the modifier \Delta\omega_{ij} = -\gamma \cdot o_i \cdot \delta_j, you have to add it to the weight ...
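As a minimal sketch of that update rule, vectorised over one layer (variable names are illustrative, not from the answer above):

```python
import numpy as np

def update_layer_weights(W, o, delta, gamma=0.1):
    """Apply omega_ij <- omega_ij - gamma * o_i * delta_j for one layer.

    W     : weight matrix, W[i, j] connects node i to node j
    o     : stored outputs o_i of the nodes feeding into the layer
    delta : backpropagated errors delta_j at the layer's nodes
    gamma : constant learning rate (step size)
    """
    delta_W = -gamma * np.outer(o, delta)   # the modifier computed by backpropagation
    return W + delta_W                      # ... which is added to the current weights
```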
Due to the generality of your question, I would suggest reading a basic introduction to neural network training to get a grasp of how backpropagation works. As an example, I found this ebook by David Kriesel to be a simple (yet satisfying) read.
Generally speaking, as the previous answer pointed out, classical backpropagation updates the weights by adding a term which is (more or less) proportional to the gradient of the error function. However, there are many (many) variants of this, which result in different update rules. For example, in the RProp algorithm the magnitude of the computed gradient does not influence the weight change: only its sign is used, together with a per-weight step size (see the sketch below). There are also algorithms where the weights are computed in a single step (e.g. RBF networks, Extreme Learning Machines, etc.).
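For illustration, a stripped-down sketch of the RProp idea (roughly the "Rprop-" variant); the increase/decrease factors and step-size bounds are commonly quoted defaults, not values given in the answer above:

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One simplified RProp iteration.

    Only the sign of the gradient determines the weight change; the
    per-weight step size is adapted based on whether successive
    gradients agree in sign.
    """
    agreement = grad * prev_grad
    step = np.where(agreement > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(agreement < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(agreement < 0, 0.0, grad)   # skip adaptation right after a sign flip
    w = w - np.sign(grad) * step                # gradient magnitude does not matter
    return w, grad, step
```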
Updating the weights is the most crucial part of neural-network training; it determines the quality of learning and the classification performance. For non-linear, multi-dimensional input, each neuron creates a hyperplane that separates its class from the remaining classes. The ratio of these weights determines the slope of the hyperplane.
The beauty of backpropagation is that any kind of non-linear problem can be modeled in a multi-layer neural net using this algorithm, provided the chosen activation function is non-linear (obviously) and differentiable, e.g. sigmoid, tanh, etc.
If you choose the sigmoid S as the activation function, its derivative is S(1 - S).
New weight = old weight + \Delta\omega_{jk}, where
\Delta\omega_{jk} = \eta \cdot e_j \cdot S_j (1 - S_j) \cdot z_k
with z_k the input to node j from node k, e_j the error at node j, and \eta the learning rate.
Technically, as the previous answer pointed out, we update the weights by a delta that is directly proportional to the gradient of the error with respect to the weights.
For more clarity, refer to the attached piece of code I wrote for the neural-net weight update.
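The original attachment is not reproduced here; the following is a minimal sketch consistent with the update rule above, assuming a single sigmoid layer and illustrative variable names:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_weights(W, z, target, eta=0.5):
    """One weight update for a single sigmoid layer, following
    Delta w_jk = eta * e_j * S_j * (1 - S_j) * z_k.

    W      : weight matrix, W[j, k] connects input k to node j
    z      : input vector z_k to the layer
    target : desired outputs for the layer's nodes
    eta    : learning rate
    """
    S = sigmoid(W @ z)                      # current outputs S_j
    error = target - S                      # e_j = desired output - current output
    delta = eta * error * S * (1.0 - S)     # per-node update factor
    return W + np.outer(delta, z)           # new weight = old weight + Delta w_jk

# toy usage
W = np.random.randn(2, 3) * 0.1
z = np.array([0.5, -1.0, 0.25])
target = np.array([1.0, 0.0])
W = update_weights(W, z, target)
```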