Technically, the gamma parameter controls the width of the RBF kernel (Gaussian function), which is used as a similarity measure between two points: in the common parameterization K(x, y) = exp(−γ‖x − y‖²), gamma is inversely related to the variance (γ = 1/(2σ²)). Intuitively, a small gamma value defines a Gaussian function with a large variance; in this case, two points can be considered similar even if they are far from each other. On the other hand, a large gamma value defines a Gaussian function with a small variance, and in this case two points are considered similar only if they are close to each other.
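As a small numeric sketch of that intuition (the two points and the gamma values below are made up for illustration, using the common parameterization K(x, y) = exp(−γ‖x − y‖²)):

```python
import numpy as np

def rbf_similarity(x, y, gamma):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.exp(-gamma * np.sum((x - y) ** 2)))

a, b = [0.0], [2.0]  # two points at distance 2

# Small gamma -> wide Gaussian: distant points still look fairly similar
wide = rbf_similarity(a, b, gamma=0.1)     # exp(-0.4), roughly 0.67

# Large gamma -> narrow Gaussian: the same two points look dissimilar
narrow = rbf_similarity(a, b, gamma=10.0)  # exp(-40), essentially zero
```

The same pair of points goes from "quite similar" to "essentially unrelated" purely by changing gamma, which is why the parameter matters so much.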
Regarding the tuning of the parameters, I don't see any problem in your methodology. I'd use grid search to find the C, gamma, and epsilon values as well.
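For example, with scikit-learn's GridSearchCV over an SVR (the toy dataset and the parameter grid below are only illustrative starting points, not recommendations for any particular problem):

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Toy regression data standing in for the real dataset
X, y = make_regression(n_samples=100, n_features=5, noise=0.1, random_state=0)

# Illustrative grid; in practice, search wider logarithmic ranges first
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.001, 0.01, 0.1, 1],
    "epsilon": [0.01, 0.1, 1],
}
search = GridSearchCV(SVR(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # best C, gamma, epsilon found by cross-validation
```

A common refinement is a coarse logarithmic grid first, then a finer grid around the best cell.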
The gamma parameter defines how far the influence of a single training example reaches, with low values meaning ‘far’ and high values meaning ‘close’. The gamma parameter can be seen as the inverse of the radius of influence of samples selected by the model as support vectors. If gamma is too large, the radius of the area of influence of the support vectors only includes the support vector itself, and no amount of regularization with C will be able to prevent overfitting.
When gamma is very small, the model is too constrained and cannot capture the complexity or “shape” of the data. The region of influence of any selected support vector would include the whole training set. The resulting model will behave similarly to a linear model with a set of hyperplanes that separate the centers of high density of any pair of two classes (http://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html).
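Both failure modes are easy to see on toy data (the dataset and the three gamma values below are made up for illustration):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Too small: near-linear underfit; too large: memorizes the training set
for gamma in (0.01, 1.0, 1000.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma}: train={clf.score(X_tr, y_tr):.2f}, "
          f"test={clf.score(X_te, y_te):.2f}")
```

With gamma = 1000 the training accuracy is essentially perfect while test accuracy drops, which is exactly the "radius of influence shrinks to the support vector itself" overfitting described above.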
Section 3.2 of the attached report (please refer to the attached PDF file) describes the cross-validation and grid-search approach for choosing the parameters of the RBF kernel.
More info can be found at Professor Chih-Jen Lin's home page. There are many publications there, including the attached ones as well.
Allan's response is accurate. What I can add is that, conceptually, the narrower the RBF kernels get (larger gammas) the more "spiky" your hypersurface is going to get, i.e., you would have a hypersurface that is close to zero everywhere except that you would have spikes where the data points are (assuming that we are talking about SVM here). On the other hand, if your RBF kernels are too wide (small gammas), you would end up with a hypersurface that is almost flat. Both cases would lead to poor performance on the validation set and any data point that you would get in the future. Hence, the distribution of the data can guide you in choosing an initial range for the gamma parameter. Moreover, normalizing the variables might help you get more homogeneous distributions along different dimensions.
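On the normalization point: a short sketch of how unequal feature scales distort the distances inside the RBF kernel (the feature ranges and the gamma value below are made up for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# x0 carries the signal; x1 is irrelevant but spans a huge range,
# so it dominates ||x - y||^2 inside the RBF kernel
X = np.c_[rng.uniform(0, 1, 400), rng.uniform(0, 1000, 400)]
y = (X[:, 0] > 0.5).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw = SVC(kernel="rbf", gamma=1.0).fit(X_tr, y_tr)
scaled = make_pipeline(StandardScaler(),
                       SVC(kernel="rbf", gamma=1.0)).fit(X_tr, y_tr)
print("raw:", raw.score(X_te, y_te), "scaled:", scaled.score(X_te, y_te))
```

After standardization the same gamma works for every dimension, which is why scaling usually comes before any grid search over gamma.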
Dear Farzin, I've used radial basis functions in meshless methods. The EXP shape parameter controls the decay rate of the function, and I found that the smaller the shape parameter, the smaller the estimated error for curve fitting.
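A small numpy sketch of that observation, interpolating sin(x) with Gaussian RBFs φ(r) = exp(−(εr)²) (the node count, the test grid, and the two ε values are made up for illustration):

```python
import numpy as np

def rbf_interpolate(nodes, values, eps, x_eval):
    """Fit a Gaussian RBF interpolant at `nodes`, evaluate it at `x_eval`."""
    phi = lambda r: np.exp(-(eps * r) ** 2)
    A = phi(np.abs(nodes[:, None] - nodes[None, :]))  # interpolation matrix
    w = np.linalg.solve(A, values)                    # RBF weights
    return phi(np.abs(x_eval[:, None] - nodes[None, :])) @ w

nodes = np.linspace(0, 2 * np.pi, 12)
x_fine = np.linspace(0, 2 * np.pi, 500)

for eps in (1.0, 20.0):
    err = np.max(np.abs(rbf_interpolate(nodes, np.sin(nodes), eps, x_fine)
                        - np.sin(x_fine)))
    print(f"eps={eps}: max interpolation error = {err:.3e}")
```

The flatter basis (smaller ε) fits this smooth function far more accurately, consistent with the observation above, though very small ε eventually makes the interpolation matrix ill-conditioned, so there is a trade-off in practice.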