In kernel-based machine learning methods, the kernel parameters are usually tuned through procedures like cross-validation.
My question is: why don't we find them with an algorithm like gradient descent?
I am not sure of the answer, but is it due to the non-convexity of the objective in the (possibly high-dimensional) kernel parameter space? If so, why should it be non-convex, and under what circumstances can it be convex?
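To make the question concrete, here is a minimal sketch of what I mean (the data, lengthscale, and ridge penalty are all illustrative, not taken from any particular library's defaults): tune the RBF lengthscale of a kernel ridge regressor by gradient descent on a held-out validation loss, using a numerical gradient with simple backtracking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression problem (illustrative only).
X_train = rng.uniform(-3, 3, size=(40, 1))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.standard_normal(40)
X_val = rng.uniform(-3, 3, size=(20, 1))
y_val = np.sin(X_val[:, 0]) + 0.1 * rng.standard_normal(20)

def rbf(A, B, lengthscale):
    """RBF (Gaussian) kernel matrix between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * lengthscale ** 2))

def val_loss(log_ls, ridge=1e-3):
    """Validation MSE of kernel ridge regression at a given log-lengthscale."""
    ls = np.exp(log_ls)
    K = rbf(X_train, X_train, ls)
    alpha = np.linalg.solve(K + ridge * np.eye(len(X_train)), y_train)
    pred = rbf(X_val, X_train, ls) @ alpha
    return np.mean((pred - y_val) ** 2)

# Gradient descent on the log-lengthscale; the gradient is estimated
# by central finite differences, and a step is only accepted if it
# actually lowers the validation loss (otherwise the step size shrinks).
log_ls, lr = np.log(5.0), 0.5
loss = val_loss(log_ls)
for _ in range(60):
    g = (val_loss(log_ls + 1e-5) - val_loss(log_ls - 1e-5)) / 2e-5
    candidate = log_ls - lr * g
    new_loss = val_loss(candidate)
    if new_loss < loss:
        log_ls, loss = candidate, new_loss  # accept the step
    else:
        lr *= 0.5                           # backtrack

print("lengthscale:", np.exp(log_ls), "val MSE:", loss)
```

The loop itself runs without trouble, which is part of why I ask: nothing mechanically prevents gradient descent here. My worry is that the validation loss is, in general, non-convex as a function of the kernel parameters, so a run like this may only find a local minimum.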