Optimization is the study of methods, procedures and algorithms for finding the minimizers or maximizers of a function. These points are, roughly speaking, the solutions obtained by setting the first derivative (the gradient or Jacobian) of the function equal to zero; classifying them further requires the second derivative, i.e. the Hessian. These gradients and Hessians require the function to be differentiable once or twice, as the case may be, and such problems fall under smooth (differentiable) optimization. On the other hand, there is a whole class of functions that are not differentiable; their study falls under non-differentiable (nonsmooth) optimization, which still relies on derivative-like concepts such as subdifferentials and subgradients.
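As a small concrete illustration of the nonsmooth side (my own example, though it is the standard one): the absolute value function f(x) = |x| is convex but not differentiable at 0, yet it has a well-defined subdifferential there,

$$\partial f(0) = [-1, 1],$$

and the statement 0 ∈ ∂f(0) plays exactly the role that "derivative equals zero" plays in the smooth case: it certifies that x = 0 is the minimizer.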
Simply put, optimization and differentiability are two faces of the same coin: differentiability without its applications in optimization is almost useless, and optimization without the use of differentiability is incomplete.
Suppose you have an optimization problem --- for example, find the value of x that corresponds to a (local) minimum of a function. What would that minimum point look like? Well, if the function is (twice) differentiable, you get some nice guarantees, and you can be sure the minimum has a characteristic shape: to the left of x, the point we are trying to find, the function is decreasing towards the minimum; at the minimum the function has zero slope; and to the right of x the function is increasing again. So the sign of the first derivative changes from negative to zero to positive, and the second derivative (the slope of the first derivative) should be positive there. We can therefore use f'(x) = 0 as a first-order test, solve for x (i.e. it helps us to FIND x), and then check f''(x) > 0 to confirm we have a minimum. But notice what I said first --- the function is assumed to be twice differentiable here; if that is NOT true, nothing I just said is guaranteed. So we can use the derivatives to isolate and confirm the optimal point (and the same idea carries over to vector arguments).
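A tiny worked example (my own, just to make the two tests concrete): take f(x) = x^2 - 4x + 1. Then

$$f'(x) = 2x - 4 = 0 \;\Longrightarrow\; x = 2, \qquad f''(2) = 2 > 0,$$

so x = 2 passes both the first-order and the second-order test, and is indeed a local (here in fact global) minimizer.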
This discussion is about finding a local minimum. If you have some other guarantees (for example convexity) the local minimum is also a global minimum. It is a useful exercise to rewrite all the above for the case of finding a maximum.
There are many aspects of the connection between differentiability and convexity. One is that a convex function f which is also differentiable attains its minimum exactly where the derivative of f equals zero (in the case of dimension n = 1); in the case of n > 1 the criterion is that the gradient of f at x must equal the zero vector.
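For instance (my own toy example), take the convex function f(x, y) = x^2 + y^2. Then

$$\nabla f(x, y) = (2x, 2y) = (0, 0) \iff (x, y) = (0, 0),$$

and by convexity that single stationary point is the global minimizer.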
This - together with a look at the first-order Taylor expansion around x - also immediately implies that if the gradient is not equal to zero, then the negative of that vector points in a direction that will decrease the value of f, at least for a small enough step. You have then already devised an important building block of your first algorithm for the problem of minimizing f over R^n. The next worry is how far you should travel along the vector "- gradient of f at x", and this is where a "line search" (trying to locate a minimum of f along that one-dimensional ray) is a means of finding a better point than the one you just left.
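If it helps to see that building block spelled out, here is a minimal sketch in code of gradient descent with a backtracking line search; the test function, starting point, step-halving rule and tolerances are just my own illustrative choices:

```python
import numpy as np

def gradient_descent(f, grad, x0, tol=1e-6, max_iter=10_000):
    """Minimize f over R^n by stepping along -grad(x), with a backtracking line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:        # (approximate) first-order condition: gradient ~ 0
            break
        d = -g                             # the negative gradient is a descent direction
        t = 1.0
        # Backtracking (Armijo) line search: halve t until f decreases sufficiently.
        while f(x + t * d) > f(x) - 1e-4 * t * np.dot(g, g):
            t *= 0.5
        x = x + t * d
    return x

# Toy problem: a convex quadratic whose unique minimizer is (1, -2).
f    = lambda x: (x[0] - 1.0)**2 + 2.0 * (x[1] + 2.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])

print(gradient_descent(f, grad, x0=[5.0, 5.0]))   # approximately [ 1. -2.]
```

In practice the exact one-dimensional minimization is usually replaced by a cheap inexact rule like the backtracking (Armijo) condition used here.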
Now, the negative gradient is a rather poor choice of search direction, since it uses only first-order information. There is then a myriad of methods that try to mimic Newton's method, in which the main building block is the solution of a system of linear equations, based on a second-order Taylor approximation of the original minimization problem centred at the current point. Quasi-Newton methods are modifications of the original Newton method in which the n-by-n Hessian matrix at x is approximated by some simpler matrix that is also guaranteed to be symmetric and positive definite (which the Hessian of f need not always be). The simplest choice would be the identity matrix, but then the method collapses to the steepest descent algorithm! So you need to be somewhat clever and bring in something that carries more information from the second derivatives; such methods are usually called quasi-Newton or conjugate gradient algorithms. There are myriads of these, and it is advisable to read a little about them in a basic textbook in order to understand what is used in practice. Even the corresponding articles on Wikipedia provide some of the most basic formulas centred on what I have just explained.
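And for comparison, here is an equally minimal sketch of the Newton iteration on the same convex quadratic (again just my own illustration, not a production implementation); the main building block is exactly the linear solve mentioned above:

```python
import numpy as np

def newton_method(grad, hess, x0, tol=1e-8, max_iter=100):
    """Newton's method: at each iterate solve H(x) p = -grad(x), then step to x + p."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        H = hess(x)
        p = np.linalg.solve(H, -g)   # the linear system coming from the second-order Taylor model
        x = x + p
    return x

# Same convex quadratic as before; Newton's method hits the minimizer (1, -2) in one step.
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * (x[1] + 2.0)])
hess = lambda x: np.array([[2.0, 0.0], [0.0, 4.0]])

print(newton_method(grad, hess, x0=[5.0, 5.0]))   # [ 1. -2.]
```

A quasi-Newton method such as BFGS keeps this structure but replaces hess(x) with a symmetric positive definite approximation built up from successive gradient differences, which is how it brings in second-order information without ever forming the true Hessian. I hope you have use for these. Get back if there is anything unclear.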