I know that in data mining, how data is preprocessed before it is fed into a classifier is very important. I wonder whether the choice of normalization also depends on the type of kernel chosen. Thanks.
Which kernels go well with which normalization depends, as usual, on the data at hand. However, if you use a kernel such as the Gaussian (RBF) kernel, normalization can be very important indeed. Since the RBF kernel is defined as k(x,y) = exp(-gamma * ||x-y||^2), or similar variants thereof, each dimension of the feature vector is treated equally, since every dimension contributes in the same way to the squared Euclidean distance ||x-y||^2 = sum_i (x_i - y_i)^2.
Thus, the degree of variation of a feature (in terms of the typical absolute difference |x_i - y_i|) can have a huge impact on the result. Just imagine the first feature being uninformative for the given problem, but highly variable. An informative feature with only a small amount of variation might easily be suppressed by the first.
In such cases, methods such as dimension-wise re-weighting or principal component projections might tremendously improve the classification result; the sketch below illustrates the effect with a simple standardization step.
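A minimal sketch of this failure mode, assuming scikit-learn and a made-up toy dataset: one feature carries the class signal at a small scale, another is pure noise at a huge scale. Under the RBF kernel, the noisy feature dominates the distance until each dimension is standardized.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)
informative = y + 0.3 * rng.standard_normal(n)  # class signal, small scale
noise = 1000.0 * rng.standard_normal(n)         # uninformative, huge scale
X = np.column_stack([noise, informative])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# RBF-SVM on raw features: the distance is dominated by the noise dimension
acc_raw = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)

# Same model after per-dimension centering and scaling
scaler = StandardScaler().fit(X_tr)
acc_std = SVC(kernel="rbf").fit(scaler.transform(X_tr), y_tr).score(
    scaler.transform(X_te), y_te)

print(acc_raw, acc_std)  # expect roughly chance level vs. close to 1.0
```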
Also note that such dimension-wise normalization is not that important for the linear kernel, as the learning algorithm implicitly scales the dimensions on its own through the learned weights.
A good practice, especially before applying kernels such as the RBF kernel, is data standardization, which includes data centering and scaling. This family of kernels assumes that the data is centered around zero. In general, you can achieve this by subtracting the mean of each feature. In addition, the features can be scaled to have variance 1.
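For illustration, here is a short sketch of the standardization step itself in plain NumPy (the array X is a made-up example); in practice the mean and standard deviation should be estimated on the training set only and then applied to the test set.

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

mu = X.mean(axis=0)       # per-feature mean, used for centering
sigma = X.std(axis=0)     # per-feature standard deviation, used for scaling
X_std = (X - mu) / sigma  # each feature now has mean 0 and variance 1

print(X_std.mean(axis=0))  # ~[0. 0.]
print(X_std.std(axis=0))   # [1. 1.]
```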
I am probably missing something, but I'd rather say that the RBF kernel does not benefit from the mean-centering step, since the mean is a constant offset to each feature. This offset is effectively cancelled out when computing differences, i.e. ||(x+m)-(y+m)||^2 = ||x-y||^2. Hence, no benefit, but no harm done either.
As pointed out before, however, scaling might be a sensible thing to do.
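A quick numeric check of the translation-invariance claim, using scikit-learn's rbf_kernel on made-up data: adding the same constant offset m to all points leaves the kernel matrix unchanged.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 3))
Y = rng.standard_normal((4, 3))
m = np.array([10.0, -3.0, 100.0])  # arbitrary constant offset per feature

K = rbf_kernel(X, Y, gamma=0.5)
K_shifted = rbf_kernel(X + m, Y + m, gamma=0.5)

print(np.allclose(K, K_shifted))  # True: ||(x+m)-(y+m)||^2 = ||x-y||^2
```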
Indeed, centering is not needed for distance-based kernels, which are translation invariant. Thanks to Michael Kemmler for the correction. I find centering meaningful for higher-degree polynomial kernels. However, I am not sure that centering around zero is always the best choice; I think it depends on the data distribution, but the mean should probably still be close to zero.
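By contrast, the polynomial kernel k(x,y) = (gamma * x.y + c)^d depends on inner products rather than differences, so it is not translation invariant and centering does change it. A small check with scikit-learn on made-up data:

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3)) + 50.0  # data far from the origin
X_centered = X - X.mean(axis=0)         # the same data centered around zero

K = polynomial_kernel(X, degree=3, coef0=1.0)
K_centered = polynomial_kernel(X_centered, degree=3, coef0=1.0)

print(np.allclose(K, K_centered))  # False: the offset changes the kernel values
```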