As in - now we can finally talk about guaranteed convergent descent methods.
You see - I have had the hope that someone would see the light and, instead of using poor methods (i.e., metaheuristics) on a standard nonlinear program, actually apply convergent methods, some of which were known already in the 1950s and 60s. Among them is steepest descent with a line search, and more than 100 variations on it. It works with or without constraints.
The idea behind steepest descent is the use of an approximate problem derived from a Taylor expansion, right? Newton's method with a line search is a faster variant, provided you can compute second derivatives. And there are plenty of such variations.
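To make the idea concrete, here is a minimal sketch of steepest descent with a backtracking (Armijo) line search. The objective f, its gradient grad_f, and the quadratic test problem are illustrative placeholders, not anything from the attached problem sheet; the only dependency is numpy.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, d, alpha=1.0, rho=0.5, c=1e-4):
    """Shrink the step length alpha until the Armijo sufficient-decrease condition holds."""
    fx, gx = f(x), grad_f(x)
    while f(x + alpha * d) > fx + c * alpha * gx.dot(d):
        alpha *= rho
    return alpha

def steepest_descent(f, grad_f, x0, tol=1e-8, max_iter=1000):
    """Iterate x_{k+1} = x_k + alpha_k * (-grad f(x_k)) until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad_f(x)
        if np.linalg.norm(g) < tol:
            break
        d = -g                                    # steepest-descent direction
        alpha = backtracking_line_search(f, grad_f, x, d)
        x = x + alpha * d
    return x

# Example: minimize a simple convex quadratic f(x) = 0.5*x'Ax - b'x.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
f = lambda x: 0.5 * x.dot(A).dot(x) - b.dot(x)
grad_f = lambda x: A.dot(x) - b
print(steepest_descent(f, grad_f, np.zeros(2)))   # approaches the solution of A x = b
```

Newton's method with a line search follows the same loop; only the direction changes, from -g to the solution of H d = -g, where H is the Hessian.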
If you show me the problem you have I can give you hints. Or even go to Wikipedia and search for steepest descent and Newton's method.
And no, this is not a special metaheuristic. Steepest descent and most variations of Newton's method converge to an optimum for convex problems, which metaheuristics NEVER will.
Thanks and thanks. Yes, I enjoy deliberating with you.
I have an idea of Newton's step. I have implemented and coded it before.
What about cases where your optimization problem is not convex? How do you apply steepest descent and Newton's step then?
I have attached my problem's sheet. It should not be too difficult to locate the optimization problem.
By limiting the coefficient set, I could reduce the residual from 1e-7 to 1e-10. It took more than 300 iterations or reruns. It may not sound significant to you, but in the process the coefficient set improved by more than a factor of 2. And this is from an inferior initial solution.
I am still determining and testing how far this approach can 'walk', and whether it can walk right up to the right answer, meaning the initial solution is less critical, though still helpful.
My problem, too, is that a bad initial solution can have a good residual. It looks like this is something this approach can help work around.
I would appreciate suggestions as to how to improve it or speed it up.
So the problem is probably non-convex in some regions. There are tricks for this. One very simple one: if you are using second-order methods (Newton), you can - without violating convergence - add k times the unit matrix (i.e., the square matrix with the value k on the diagonal and zeros elsewhere) to the Hessian, and then increase k until the direction you get from solving the Newton system is guaranteed to be a descent direction. There are very many tricks like this.
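A minimal sketch of that k*I trick, assuming the gradient and Hessian are available as numpy arrays; the starting value k0, the growth factor, and the fallback to steepest descent are illustrative choices, not part of the advice above.

```python
import numpy as np

def regularized_newton_direction(grad, hess, k0=1e-4, growth=10.0, max_tries=20):
    """Solve (H + k*I) d = -g, growing k until d is a descent direction (g'd < 0)."""
    n = grad.size
    k = 0.0
    for _ in range(max_tries):
        H_mod = hess + k * np.eye(n)
        try:
            d = np.linalg.solve(H_mod, -grad)
            if grad.dot(d) < 0:           # descent direction found
                return d
        except np.linalg.LinAlgError:
            pass                           # singular matrix: keep increasing k
        k = k0 if k == 0.0 else k * growth
    return -grad                           # fallback: plain steepest-descent direction

# Example with an indefinite Hessian, where the plain Newton direction points uphill.
g = np.array([1.0, -2.0])
H = np.array([[1.0, 0.0], [0.0, -3.0]])    # negative eigenvalue -> locally non-convex
d = regularized_newton_direction(g, H)
print(d, g.dot(d))                         # g'd comes out negative
```

For large enough k the modified matrix is positive definite and the resulting direction behaves more and more like steepest descent, so convergence of the line-search method is preserved.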
I think I asked this in the previous post, but can the method you suggest - your method, in short - get out of valleys? Or is the goal not to get into valleys in the first place?
Also, can you outline the basic steps or logic of your method briefly, and in simple language - the 1, 2, 3... If you can, please don't (only) refer to a paper or book, because these hardly ever state the simple steps in common language.
I can read the paper or book after I understand the basic principle of operation.
We do want to end up at the bottom of a valley - but not just any bottom, the best one. That's why it might make sense to restart your method from very many starting points - grid points in the thousands. You will not know whether you have found the global optimum, but the more grid points, the more likely it is to be a very good point. And - let's face it - if we are 2% off the best, we may not need to care. (Of course we will not KNOW for sure, but the more grid points, the more likely it is that we have found a near-optimal one.)
If you know roughly what the feasible set looks like, you can get a set of grid points "filling" the set. Calculate the objective function at those points, and then restart your method at some of the best ones, but also at some of the worst. You do want to go down into valleys, but there may be many of them - that's why a restart elsewhere is needed.
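A minimal sketch of that multi-start scheme, assuming the feasible set can be covered by a box and using scipy.optimize.minimize as a stand-in for "your method"; the box bounds, grid density, and the counts of best/worst starting points are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def multistart(f, lower, upper, pts_per_dim=20, n_best=5, n_worst=5):
    """Evaluate f on a grid, then run a local solver from the best and worst grid points."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    axes = [np.linspace(lo, hi, pts_per_dim) for lo, hi in zip(lower, upper)]
    grid = np.array(np.meshgrid(*axes)).reshape(len(lower), -1).T
    values = np.apply_along_axis(f, 1, grid)
    order = np.argsort(values)
    starts = np.vstack([grid[order[:n_best]], grid[order[-n_worst:]]])
    best = None
    for x0 in starts:                         # local descent from each selected start
        res = minimize(f, x0, method="BFGS")
        if best is None or res.fun < best.fun:
            best = res
    return best

# Example: a multimodal function with several valleys.
f = lambda x: np.sin(3 * x[0]) + (x[0] - 0.5) ** 2 + (x[1] + 0.2) ** 2
print(multistart(f, [-3, -3], [3, 3]).x)
```

The more grid points you spend, the more of the valleys you will have sampled, which is exactly the "more likely it is a very good point" argument above.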
I have located the source, or origin, of the nonlinearity in my problem. Or at least I understand it better now.
I will have to run nested optimizations - optimizations at two levels, or optimization subproblems - in order to overcome it. Whatever you want to call it.
It is similar to your grid points, but something that must be done manually.
I am still trying to determine how far or how extensive an optimization result at the top level must be for the optimization at the next level to latch onto it and run with it - how much nonlinearity must be removed at the first level for the second level not to trip up and to succeed.
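Not knowing the actual problem, here is only a structural sketch of what such a two-level (nested) setup can look like: the outer level fixes a small set of "difficult" parameters p, and the inner level solves for the remaining coefficients c with p held fixed. The split, the hypothetical objective, and the candidate range for p are all assumptions for illustration.

```python
import numpy as np
from scipy.optimize import minimize

def inner_solve(objective, p, c0):
    """Inner level: optimize the coefficients c for a fixed outer parameter p."""
    res = minimize(lambda c: objective(p, c), c0, method="BFGS")
    return res.x, res.fun

def outer_solve(objective, p_candidates, c0):
    """Outer level: try each candidate p and keep the one whose inner solve is best."""
    best = None
    for p in p_candidates:
        c, val = inner_solve(objective, p, c0)
        if best is None or val < best[2]:
            best = (p, c, val)
    return best

# Hypothetical objective: p enters nonlinearly, c enters (nearly) quadratically.
objective = lambda p, c: (np.exp(p * c[0]) - 2.0) ** 2 + (c[1] - p) ** 2
p_best, c_best, val = outer_solve(objective, np.linspace(0.1, 2.0, 20), np.zeros(2))
print(p_best, c_best, val)
```

The point of the split is that for each fixed p the inner problem is much better behaved, so the inner solver has a chance to "latch on" regardless of how rough the outer sweep is.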
Take shorter steps in the descent directions, perhaps? Your problem is a sort of general problem in nonlinear optimization; the snag for someone like me is that I do not know what the problem looks like.