I'm using the steepest descent method to find a solution to an optimization problem. Can someone please guide me to the best method for choosing the step size to speed up the convergence rate?
The textbook answer is: use the Wolfe condition in combination with the Armijo rule. The Wolfe (curvature) condition accepts only steps that are large enough, while the Armijo rule bounds the step from above. As the world of unconstrained optimization problems is so vast, there is probably no rule that will work for all problems, and the parameters within these rules also need to be fine-tuned. One way to start is to minimize a simple strictly convex quadratic approximation of the original objective along the line you are searching (a kind of cheap (quasi-)Newton step for the line-search subproblem), followed by a step forward or backward according to the two criteria mentioned above.
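As a minimal sketch of these two criteria, here is a simple bisection-based line search that returns a step satisfying both the Armijo (sufficient decrease) and Wolfe (curvature) conditions. The parameter values c1, c2 and the bisection/doubling scheme are common textbook defaults, not anything specific to one library; production implementations add interpolation for robustness.

```python
import numpy as np

def wolfe_line_search(f, grad, x, d, alpha0=1.0, c1=1e-4, c2=0.9, max_iter=50):
    """Find a step alpha along descent direction d satisfying both
    Wolfe conditions. Simple bracketing sketch, not production code."""
    f0 = f(x)
    g0 = grad(x) @ d          # directional derivative, must be < 0
    lo, hi = 0.0, np.inf
    alpha = alpha0
    for _ in range(max_iter):
        if f(x + alpha * d) > f0 + c1 * alpha * g0:
            hi = alpha        # Armijo fails: step too long, shrink
        elif grad(x + alpha * d) @ d < c2 * g0:
            lo = alpha        # curvature fails: step too short, grow
        else:
            return alpha      # both conditions hold
        alpha = 0.5 * (lo + hi) if hi < np.inf else 2.0 * alpha
    return alpha

# Usage on a simple quadratic f(x) = 0.5 * ||x||^2:
f = lambda x: 0.5 * (x @ x)
grad = lambda x: x
x = np.array([1.0, 0.0])
alpha = wolfe_line_search(f, grad, x, -grad(x))
```
For this quadratic the full step alpha = 1 already satisfies both conditions, so the search accepts it immediately.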
Yuan Ya-xiang wrote a nice paper in 1999 called "Step-sizes for the gradient method," published in AMS/IP Studies in Advanced Mathematics 42 (2): 785, which you should find very helpful. The preprint is available online here:
In the article below, two gradient methods are proposed that choose either a small step size or a large one at every iteration. The small step size serves to induce a favorable descent direction for the next iteration, while the large one serves to produce a sufficient reduction in the objective:
Zhou, B., Gao, L., & Dai, Y. H. (2006). Gradient methods with adaptive step-sizes. Computational Optimization and Applications, 35(1), 69-86.
As mentioned already by others, there are several choices. (First of all, I wonder why you are using the gradient method at all. If you are able to compute the gradient to sufficient precision, then there are definitely better choices, e.g. BFGS, limited-memory BFGS, or CG.) But still: what you should _not_ do is use the first local minimizer along the line (exact line search), for which the theoretical convergence rate for the function values is known to be ((cond(H)-1)/(cond(H)+1))^2, where H is the Hessian. Why? Once "in the bottom" of a steep curved valley, you are forced to follow the valley, hence to take small steps.
A method which may be useful for large-scale problems is Barzilai-Borwein descent, which uses information from the previous iterate to compute the step size. (Google for it.)