Types and Recommendation: There are many optimization algorithms you can use, and which one is "best" depends on your particular project. It's generally recommended to pick one through a trial-and-error process.
The gradient descent algorithm is the basic one from which the more advanced algorithms are derived. It comes in stochastic, batch, and mini-batch variants, chosen according to the size of your dataset.
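For intuition, here is a tiny NumPy sketch (linear regression on synthetic data; all values are illustrative, not recommendations) where only the batch size distinguishes the three variants:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)   # true weight is 3

w, lr, batch_size = 0.0, 0.1, 16     # batch_size=100 -> batch GD, 1 -> SGD, else mini-batch
for epoch in range(20):
    order = rng.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        b = order[start:start + batch_size]
        grad = 2 * np.mean((w * X[b, 0] - y[b]) * X[b, 0])   # dMSE/dw on this batch
        w -= lr * grad
print(w)   # approaches 3.0
```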
Other, more advanced algorithms include gradient descent with momentum, Adagrad, RMSprop, and Adam. For each of them you need to select a good value of the learning rate and the other relevant parameters, which is another trial-and-error process. So I would suggest: start with any one of them, record the performance, and then try another.
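As a sketch of that record-and-compare loop in Keras (the tiny CNN and the three-epoch budget below are placeholders, just enough to make the comparison runnable):

```python
import tensorflow as tf

def build_model():
    # tiny placeholder CNN for 28x28 grayscale inputs, 10 classes
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])

(X_train, y_train), _ = tf.keras.datasets.mnist.load_data()
X_train = X_train[..., None] / 255.0   # add channel axis, scale to [0, 1]

results = {}
for name in ["sgd", "rmsprop", "adam"]:
    model = build_model()               # fresh weights for each optimizer
    model.compile(optimizer=name, loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(X_train, y_train, validation_split=0.2,
                        epochs=3, batch_size=128, verbose=0)
    results[name] = max(history.history["val_accuracy"])
print(results)   # keep whichever optimizer scored best, then tune its learning rate
```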
How to use: Although it's better to first understand how the optimization works, you can start using it without the deep background know-how if the situation doesn't permit. I believe you are already using a language like Python, R, or MATLAB to implement your CNN model. Search for how to use the optimizer specifically in that language/package; you just need to know the proper functions/methods and their arguments. It will be easier than you think, don't worry much.
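For instance, in TensorFlow/Keras the whole "how to use" question comes down to one constructor call and its arguments (the values below are common starting points, not tuned recommendations):

```python
import tensorflow as tf

# Each optimizer is a single constructor; the learning rate and the other
# relevant parameters are just keyword arguments.
opt = tf.keras.optimizers.Adam(learning_rate=0.001)
opt = tf.keras.optimizers.RMSprop(learning_rate=0.001)
opt = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # GD with momentum
```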
The choice of optimization algorithm depends on the characteristics and size of your dataset, so it is not easy to generalize.
Specifically, I think you can optimize from three directions: first, the gradient descent method itself, where you can try a variety of variants such as SGD, Adam, and RMSprop; second, the network structure: when you build a DCNN model, try adding batch normalization layers and dropout layers (the effect is great) or using residual blocks; third, hyperparameter search to determine the number of convolutional layers, the learning rate, and so on.
In practice, good frameworks reduce the above optimizations to only a few lines of code, such as the commonly used PyTorch and TensorFlow frameworks.
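For example, a rough PyTorch sketch of the first two points; the layer sizes are placeholder values for 3-channel 32x32 inputs and 10 classes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),          # batch normalization after the convolution
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(p=0.5),           # dropout before the classifier
    nn.Linear(16 * 16 * 16, 10),
)

# Swapping the gradient-descent variant really is one line:
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
```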
Genetic Algorithms (GA) and Simulated Annealing (SA).
GA is a random sampling method, so you should define good crossover and mutation operations to get good efficiency. It is the most flexible optimization technique, but it is harder to use than other techniques if those operations are not well defined.
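A toy sketch of that loop in Python, tuning a (learning rate, dropout) pair; the fitness function here is a stand-in you would replace with, e.g., the validation accuracy of your CNN:

```python
import random

def fitness(genome):
    lr, dropout = genome
    # placeholder surrogate with a peak near lr=1e-3, dropout=0.5 (purely illustrative)
    return -((lr - 1e-3) ** 2) * 1e6 - (dropout - 0.5) ** 2

def crossover(a, b):
    # uniform crossover: each gene comes from either parent
    return tuple(random.choice(pair) for pair in zip(a, b))

def mutate(genome, rate=0.2):
    lr, dropout = genome
    if random.random() < rate:
        lr *= random.uniform(0.5, 2.0)        # jitter the learning rate
    if random.random() < rate:
        dropout = min(0.9, max(0.1, dropout + random.uniform(-0.1, 0.1)))
    return (lr, dropout)

population = [(10 ** random.uniform(-4, -1), random.uniform(0.1, 0.9))
              for _ in range(10)]
for generation in range(20):
    population.sort(key=fitness, reverse=True)
    parents = population[:4]                   # selection: keep the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(len(population) - len(parents))]
    population = parents + children

print(max(population, key=fitness))   # best genome found
```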
Gradient-based optimization techniques:
First order: batch gradient descent and stochastic gradient descent.
Second order: Newton's method, conjugate gradient, and scaled conjugate gradient (a toy comparison of the two orders follows below).
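This sketch contrasts a first-order step with a Newton step on the toy function f(x) = x^4 + x^2; for CNNs, exact second-order steps are rarely practical because the Hessian is huge, which is why conjugate-gradient variants approximate the curvature instead:

```python
def f_prime(x):
    return 4 * x**3 + 2 * x       # first derivative of x^4 + x^2

def f_double_prime(x):
    return 12 * x**2 + 2          # second derivative

x_gd, x_newton = 2.0, 2.0
for _ in range(10):
    x_gd -= 0.01 * f_prime(x_gd)                               # fixed step size
    x_newton -= f_prime(x_newton) / f_double_prime(x_newton)   # curvature-scaled step

print(x_gd, x_newton)   # the Newton iterate approaches the minimum at 0 much faster
```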
Go through this paper on CNNs and GA:
https://ieeexplore.ieee.org/document/1330900
Otherwise, you can combine GA and SA; try this as well.
Many of the most popular machine learning methods are optimization problems at heart: support vector machines (via kernel or clustering tools), principal component analysis, neural networks, convolutions, etc. Among the best frameworks are Bayesian maximum likelihood estimation and variational approaches such as expectation-maximization (EM) methods. One of the most recent books on handling nonlinear data via optimization on manifolds is here: https://press.princeton.edu/absil
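To make the EM idea concrete, here is a minimal sketch for a two-component 1D Gaussian mixture on synthetic data (variances held fixed at 1 for brevity):

```python
import math
import random

random.seed(0)
data = [random.gauss(0, 1) for _ in range(200)] + \
       [random.gauss(5, 1) for _ in range(200)]

def pdf(x, mu, sigma=1.0):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

mu1, mu2, pi1 = -1.0, 1.0, 0.5   # rough initial guesses
for _ in range(50):
    # E-step: responsibility of component 1 for each point
    r = [pi1 * pdf(x, mu1) / (pi1 * pdf(x, mu1) + (1 - pi1) * pdf(x, mu2))
         for x in data]
    # M-step: re-estimate the means and mixing weight from the responsibilities
    n1 = sum(r)
    mu1 = sum(ri * x for ri, x in zip(r, data)) / n1
    mu2 = sum((1 - ri) * x for ri, x in zip(r, data)) / (len(data) - n1)
    pi1 = n1 / len(data)

print(mu1, mu2, pi1)   # the means approach the true values 0 and 5
```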
Dear Welemhret Welay Baraki, there exist several ML-based architectures trained with such optimization techniques (CNN, R-CNN, Faster R-CNN, RNN). The choice of algorithm among them should depend on the application.