Gradient-based methods are not considered heuristics or metaheuristics.
Wikipedia points out that "In computer science, artificial intelligence, and mathematical optimization, a heuristic (from Greek εὑρίσκω "I find, discover") is a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution." Metaheuristics "are higher-level procedures or heuristics designed to find, generate, or select a heuristic (partial search algorithm) that may provide a sufficiently good solution to an optimization problem, especially with incomplete or imperfect information or limited computation capacity."
Heuristics can use deterministic or stochastic rules to avoid the problems of classic methods, such as the zigzagging behavior of the gradient descent method as it approaches the minimum.
Metaheuristics commonly use stochastic heuristics to solve a wide variety of complex problems, and they usually simulate intelligent processes and behaviors observed in nature and other disciplines.
While reading on the topic of meta-heuristics, I found that not all meta-heuristic methods are stochastic (e.g. Tabu Search), so the difference between heuristics and meta-heuristics is not the use of randomness.
If I understand you well, Rafael Rivera-Lopez: gradient-based methods are only heuristic methods, not meta-heuristic methods, because there are classic methods that can be slow. But we could argue that meta-heuristic methods are also slow (e.g. genetic algorithms). And we could also argue that gradient-based optimisers are inspired by higher-level optimisation principles: iterative local search that considers the local topology of the objective/constraint functions. I am still confused.
Gradient-based methods in unconstrained optimisation are based on the stationarity condition, which says that a local minimum of a differentiable function (called f, for clarity) must necessarily be a stationary point (i.e., the gradient of f at that point equals the zero vector).
The most natural tool - if we do not know much at all - is to calculate the gradient of f at the starting point - call it x_0 - and let the first search direction be the negative of that gradient in R^n. The first iteration is done once you try to minimise the value of f(x_0 - lambda * gradient f(x_0)) with respect to the positive number lambda, which yields the step length lambda_0. There are plenty of inexact line searches - such as the Armijo rule - and since we need to do this over and over again, there is no point in trying to find an exact value for the minimiser lambda_0.
Just repeat this, now from x_1, which is defined as x_1 = x_0 - lambda_0 * gradient f(x_0); then from x_2 = x_1 - lambda_1 * gradient f(x_1), and so on.
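To make the recipe above concrete, here is a minimal sketch of steepest descent with an Armijo-type backtracking line search in Python/NumPy; the test function f, its gradient, the starting point x_0 and all parameter values (beta, sigma, the tolerance) are illustrative assumptions of mine, not part of the discussion above.

```python
import numpy as np

def armijo_step(f, grad_f, x, d, lam0=1.0, beta=0.5, sigma=1e-4):
    """Backtracking (Armijo) line search: shrink lam until the sufficient-decrease
    condition f(x + lam*d) <= f(x) + sigma*lam*<grad f(x), d> holds."""
    lam = lam0
    fx, gx = f(x), grad_f(x)
    while f(x + lam * d) > fx + sigma * lam * gx.dot(d):
        lam *= beta
    return lam

def steepest_descent(f, grad_f, x0, tol=1e-6, max_iter=1000):
    """Steepest descent: move along the negative gradient until it is (nearly) zero,
    i.e. until an approximate stationary point is reached."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:      # stationarity condition (approximately) met
            break
        d = -g                           # search direction: negative gradient
        lam = armijo_step(f, grad_f, x, d)
        x = x + lam * d                  # x_{k+1} = x_k - lam_k * grad f(x_k)
    return x

# Illustrative use on a simple smooth function.
f = lambda x: (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2
grad_f = lambda x: np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])
print(steepest_descent(f, grad_f, x0=[5.0, 5.0]))   # should approach (1, -2)
```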
It's one of the first attempts to describe descent methods in nonlinear optimisation. It's definitely not an ideal method, as it only uses the first derivative; Newton, or quasi-Newton, methods are rather a lot better to use. But that you can read anywhere, such as here:
Among Non-linear Programming methods there is a group of gradient-based methods. There also exist gradient-based heuristic methods.
Optimization techniques include methods for models that have fixed mathematical structures, such as Linear, Quadratic, Geometric and Integer Programming. There also exist methods that can be applied to tasks whose objective and constraint functions satisfy certain conditions; in this case, methods based on the Kuhn-Tucker theorem can be applied.
For tasks of a discrete nature, the Branch and Bound method is frequently used. This requires defining a branching procedure and an algorithm for computing bounds on the value of the objective function.
There is also the class of Non-linear and Stochastic Programming methods. Stochastic Programming has a very wide set of possible solution methods. One of its variants consists of solving the problem as a Non-linear Programming task while performing internal Monte Carlo experiments in each iteration, in such a way as to obtain the stochastic parameters of each intermediate solution, with the particularity that calculation procedures (simulations) can be used in place of an explicit mathematical model.
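As a concrete (and hedged) illustration of those two ingredients, here is a minimal Branch and Bound sketch for a small 0/1 knapsack problem in Python; the branching rule, the fractional bound and all the data are illustrative assumptions only.

```python
# Minimal Branch and Bound sketch for a 0/1 knapsack (maximisation) problem.
# Branching: fix the next item to either "take" or "skip".
# Bounding: a fractional relaxation of the remaining items gives an upper bound.

values   = [60, 100, 120]     # illustrative data
weights  = [10, 20, 30]
capacity = 50

# Sort items by value density so the fractional bound is easy to compute greedily.
items = sorted(range(len(values)), key=lambda i: values[i] / weights[i], reverse=True)

def upper_bound(idx, cap_left, value_so_far):
    """Upper bound: greedily add remaining items, allowing a fraction of the last one."""
    bound = value_so_far
    for i in items[idx:]:
        if weights[i] <= cap_left:
            cap_left -= weights[i]
            bound += values[i]
        else:
            bound += values[i] * cap_left / weights[i]
            break
    return bound

best = 0

def branch(idx, cap_left, value_so_far):
    """Branching: at each node, either take item items[idx] (if it fits) or skip it."""
    global best
    if value_so_far > best:
        best = value_so_far
    if idx == len(items) or upper_bound(idx, cap_left, value_so_far) <= best:
        return                                    # prune: the bound cannot beat the incumbent
    i = items[idx]
    if weights[i] <= cap_left:                    # branch 1: take the item
        branch(idx + 1, cap_left - weights[i], value_so_far + values[i])
    branch(idx + 1, cap_left, value_so_far)       # branch 2: skip the item

branch(0, capacity, 0)
print(best)   # 220 for this data (the items with values 100 and 120)
```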
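To illustrate that variant, here is a small hedged sketch in Python: a plain gradient-type search in which, inside every iteration, an internal Monte Carlo experiment estimates the stochastic behaviour of the objective, so the simulation plays the role of the mathematical model; the noise distribution, sample size and step rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_objective(x, xi):
    """Objective whose parameter xi is random; only a simulation (a 'calculation
    procedure') is available, not a closed-form expectation (illustrative)."""
    return (x - xi) ** 2

def stochastic_search(x0, step=0.1, n_iter=200, h=1e-2, n_samples=200):
    """Treat min_x E[f(x, xi)] as a non-linear programming task and, inside every
    iteration, run an internal Monte Carlo experiment (here: common random samples
    and a finite-difference gradient estimate of the expected objective)."""
    x = x0
    for _ in range(n_iter):
        xi = rng.normal(loc=3.0, scale=1.0, size=n_samples)   # assumed distribution of xi
        f_plus = noisy_objective(x + h, xi).mean()             # Monte Carlo estimate at x + h
        f_minus = noisy_objective(x - h, xi).mean()            # Monte Carlo estimate at x - h
        grad_est = (f_plus - f_minus) / (2 * h)                # stochastic gradient estimate
        x -= step * grad_est
    # Final Monte Carlo estimate of the mean objective and its standard error.
    xi = rng.normal(loc=3.0, scale=1.0, size=10 * n_samples)
    samples = noisy_objective(x, xi)
    return x, samples.mean(), samples.std(ddof=1) / np.sqrt(samples.size)

x_star, mean, stderr = stochastic_search(x0=0.0)
print(x_star, mean, stderr)   # x_star should approach E[xi] = 3
```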
Heuristics are a powerful means for solving tasks for which no good exact solution method exists. There are also gradient-based heuristics, but this is a special topic. If you are interested, we could interact on this problem.
I began with a brief classification of optimization methods to clarify the problem, because gradient-based methods fall into the Non-linear Programming methods, but the same problems can also be solved by heuristics.
I do not see the point of doing that - when we are talking about nonlinear optimisation. It does not make it clearer at all - instead it confuses the matter. Stick to the topic of the question instead.
As mentioned by Professor Michael, gradient-based optimization algorithms work based on stationary points and the relevant information from the derivatives/gradient. Some examples are the Newton method and quasi-Newton methods. These methods fall under deterministic optimization techniques and do not fall under heuristic optimization methods.
By contrast, GA, PSO and DE are called evolutionary algorithms or metaheuristics. These algorithms are stochastic by nature: they work by generating random populations across the search space and evolving them according to the cost function to generate better solutions. The term heuristic means there is no guarantee that you will always find the global optimum, but you can find a near-globally optimal solution.
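For completeness, here is a minimal sketch of the (pure) Newton method in Python; it looks for a stationary point using gradient and Hessian information. The Rosenbrock test function, its Hessian formula and the starting point are illustrative choices of mine, not something taken from the answer above.

```python
import numpy as np

def newton_minimise(grad_f, hess_f, x0, tol=1e-8, max_iter=50):
    """Newton's method for unconstrained minimisation: seek a stationary point of f
    by repeatedly solving H(x) p = -grad f(x) and updating x <- x + p."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:          # approximate stationarity reached
            break
        p = np.linalg.solve(hess_f(x), -g)   # Newton direction
        x = x + p
    return x

# Illustrative test: the Rosenbrock function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2.
grad = lambda v: np.array([
    -2 * (1 - v[0]) - 400 * v[0] * (v[1] - v[0] ** 2),
    200 * (v[1] - v[0] ** 2),
])
hess = lambda v: np.array([
    [2 - 400 * (v[1] - 3 * v[0] ** 2), -400 * v[0]],
    [-400 * v[0], 200.0],
])
print(newton_minimise(grad, hess, x0=[-1.2, 1.0]))   # converges to the minimiser (1, 1)
```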
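To show what "generating random populations and evolving them according to the cost function" looks like in code, here is a minimal differential evolution (DE/rand/1/bin) sketch in Python; the cost function (Rastrigin), the population size and the control parameters F and CR are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(1)

def differential_evolution(cost, bounds, pop_size=30, F=0.8, CR=0.9, n_gen=200):
    """Minimal DE/rand/1/bin: a random population is evolved by mutation, crossover
    and greedy selection, driven only by evaluations of the cost function."""
    dim = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pop = rng.uniform(lo, hi, size=(pop_size, dim))         # random initial population
    costs = np.array([cost(x) for x in pop])
    for _ in range(n_gen):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)        # mutation
            cross = rng.random(dim) < CR                     # binomial crossover mask
            cross[rng.integers(dim)] = True                  # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_cost = cost(trial)
            if trial_cost <= costs[i]:                       # greedy selection
                pop[i], costs[i] = trial, trial_cost
    best = np.argmin(costs)
    return pop[best], costs[best]

# Illustrative cost: the Rastrigin function, a standard multimodal test problem.
rastrigin = lambda x: 10 * len(x) + np.sum(x ** 2 - 10 * np.cos(2 * np.pi * x))
x_best, f_best = differential_evolution(rastrigin, bounds=[(-5.12, 5.12)] * 2)
print(x_best, f_best)   # usually near the global optimum at the origin, but with no guarantee
```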
Typically not even a near-optimum, Rasel, in a large-scale case - which is really the only case that is of interest these days.
The meta stuff is clearly over-represented at RG - much too much is discussed about it, and it is undeserved, as there is very little theory behind it, and the practice is sub-par, too - at least in almost every paper I have seen that pits mathematical optimisation against the meta stuff.
Mathematical optimisation is the way to go, almost always, and mathematical optimisation is still under development. I just don't fathom why there is so much written about meta stuff, when mathematical optimisation offers so much more!
I know that part of it is due to the fact that most of those who work on it have a very primitive knowledge of mathematical optimisation - it is very sparse and primitive - but there is every chance to learn it. Buy a book on mathematical optimisation, check out all the tutorials that you can find on the web, and make a comparison. Come back in a month and let me know what you have learnt. :-)