Algorithms such as Backpropagation are essentially gradient descent optimization algorithms that minimize an error (cost) function. As a result, they tend to converge to the local optimum of the basin of attraction in which the weights are initialized, and they may have difficulty finding the global optimum on highly multi-modal cost functions. The idea behind derivative-free heuristics such as GAs is to provide a more global search of the cost function.
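To make the local-optimum behaviour concrete, here is a minimal sketch of plain gradient descent on a one-dimensional multi-modal cost, where the minimum reached depends entirely on the initial value. The particular function, step size, and starting points are illustrative assumptions:

```python
# Gradient descent on a multi-modal cost f(w) = w**4 - 3*w**2 + w.
# The basin of attraction of the starting point decides which local
# minimum is found; the function and settings here are only illustrative.

def f(w):
    return w**4 - 3*w**2 + w

def grad_f(w):
    return 4*w**3 - 6*w + 1

for w0 in (-2.0, 2.0):            # two different initializations
    w = w0
    for _ in range(1000):
        w -= 0.01 * grad_f(w)     # plain gradient descent step
    print(f"start {w0:+.1f} -> minimum at w = {w:+.3f}, cost = {f(w):+.3f}")
```

Starting from -2.0 the descent reaches the global minimum, while starting from +2.0 it settles in the shallower local minimum and never leaves it.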
In an ANN, we have input values and their corresponding target output values. The network starts with some initial weights and produces an estimated output for each input. The difference between the actual and estimated outputs defines the error function. The Backpropagation algorithm follows the gradient of this function and adjusts the weights accordingly. But for a complex network with a large number of input, hidden, or output nodes, the error surface becomes highly complex and multi-modal, so it becomes difficult for the algorithm to converge to the optimum weights.
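For a concrete picture of the gradient-based update, here is a minimal sketch of a single-layer network with a sigmoid activation trained by gradient descent on a squared-error cost; the toy data, network size, and learning rate are illustrative assumptions:

```python
import numpy as np

# Gradient descent on the squared error of a single-layer sigmoid network.
# Data, network size, and learning rate below are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # 100 samples, 3 input features
y = (X.sum(axis=1) > 0).astype(float)   # toy target values

w = rng.normal(size=3)                  # initial weights
lr = 0.5                                # learning rate

for epoch in range(200):
    y_hat = sigmoid(X @ w)              # estimated output
    error = y_hat - y                    # difference from actual output
    # Gradient of the mean squared-error cost with respect to the weights
    grad = X.T @ (error * y_hat * (1 - y_hat)) / len(y)
    w -= lr * grad                      # move against the gradient

print("trained weights:", w)
```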
GAs and other heuristic algorithms are designed for exactly such problems. Here, too, weights are initialized for each input and hidden node, but not as a single set: instead, a whole population of weight vectors is initialized. The error function is evaluated for each member of the population, and the algorithm searches for the optimum weights using its selection, crossover, and mutation operators, as in the sketch below.
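Here is a minimal sketch of such a GA searching over a population of weight vectors for the same kind of single-layer network; the population size, tournament selection, uniform crossover, and Gaussian mutation are illustrative choices rather than a specific published variant:

```python
import numpy as np

# GA sketch for optimizing the weight vector of a single-layer sigmoid network.
# Population size, tournament selection, uniform crossover, and Gaussian
# mutation are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, X, y):
    # Error function: mean squared difference between estimated and actual output
    return np.mean((sigmoid(X @ w) - y) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # toy inputs
y = (X.sum(axis=1) > 0).astype(float)   # toy targets

pop_size, n_weights, generations = 50, 3, 100
population = rng.normal(size=(pop_size, n_weights))  # a population of weight vectors

def select(population, costs):
    # Tournament selection: of two random members, keep the one with lower error
    a, b = rng.integers(len(population), size=2)
    return population[a] if costs[a] < costs[b] else population[b]

for _ in range(generations):
    costs = np.array([cost(w, X, y) for w in population])  # evaluate every member
    children = []
    for _ in range(pop_size):
        p1, p2 = select(population, costs), select(population, costs)
        mask = rng.random(n_weights) < 0.5                       # uniform crossover
        child = np.where(mask, p1, p2)
        child = child + rng.normal(scale=0.1, size=n_weights)    # Gaussian mutation
        children.append(child)
    population = np.array(children)

best = min(population, key=lambda w: cost(w, X, y))
print("best weights:", best, "error:", cost(best, X, y))
```

Because the population is spread across the weight space and mutation keeps injecting new candidates, the search is not tied to a single basin of attraction the way a gradient step is, at the price of many more error-function evaluations per generation.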