This is one of the many reasons why I love kernel models!
Kernel models are exactly the same as linear ones, except they first transform the data. Now, the math shows that we're transforming into an even bigger space, so if your inputs have 1,000 features (dimensions), the kernel space could have 100,000 dimensions or even infinitely many.
But sweep away that math, and all we are really doing is calculating the distance between each pair of data points. The kernel machine then uses those distances as input.
So if you have 10 data points, you have 45 distances between them. It doesn't matter whether your data has 5 dimensions or 1,000; you still have only 45 distances.
The trade-off, though, is that the number of distances increases dramatically as the number of points increases. I'm sure you know that the number of distances between n points is a triangle number: n*(n-1)/2.
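A minimal sketch of that point (using NumPy and scikit-learn's `rbf_kernel`, which this answer doesn't mention; the data and gamma value are made up for illustration): the kernel (Gram) matrix depends only on the number of points, not on the number of input features.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

# 10 points in a 1,000-dimensional input space
X = rng.normal(size=(10, 1000))

# The kernel (Gram) matrix is 10 x 10 regardless of the 1,000 features.
K = rbf_kernel(X, gamma=0.001)
print(K.shape)              # (10, 10)

# Only the upper triangle holds distinct pairwise values: n*(n-1)/2 = 45
n = X.shape[0]
print(n * (n - 1) // 2)     # 45
```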
An MLP tries to reliably estimate a huge number of parameters (a number that also depends on the number of inputs, i.e. features) in order to build a 'decision model'.
In contrast, SVMs with kernels do not operate on the source feature space explicitly. Instances from the training set appear only as arguments of the kernel function, so the dimensionality of the feature space always stays 'behind the scenes'. Moreover, the use of kernels often increases the dimensionality (virtually) up to an infinite-dimensional space (in the case of the RBF kernel).
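A rough sketch of the contrast (the dimensionality, hidden-layer size, and gamma below are hypothetical, chosen only to make the scaling visible): the RBF kernel touches only the two input vectors and never materializes its implicit feature space, while a one-hidden-layer MLP's weight count grows directly with the input dimensionality.

```python
import numpy as np

def rbf(x, z, gamma):
    # RBF kernel: corresponds to an (implicitly) infinite-dimensional
    # feature space, yet it only ever reads the two input vectors.
    return np.exp(-gamma * np.sum((x - z) ** 2))

d = 100_000                         # nominal input dimensionality
x, z = np.random.rand(d), np.random.rand(d)
print(rbf(x, z, gamma=1.0 / d))     # a single similarity value

# For comparison, a one-hidden-layer MLP with h hidden units needs roughly
# d*h + h weights in its first layer alone -- it scales with d directly.
h = 256
print(d * h + h)                    # 25,600,256 first-layer parameters
```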
The reason is that SVM optimization works on the norms of the vectors and not on the dimensions directly. Hence, if there are many zero feature values, the Euclidean length, i.e. the norm, is much smaller than the number of dimensions would suggest. For texts, this is clearly true, since most of the words in a vocabulary don't occur in any specific text. In other applications, e.g. genetic data, however, this is not true, because there is a non-zero value in most of the dimensions, and then the SVM also suffers from the high dimensionality.
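To make the sparsity point concrete, here is a small illustration (the vocabulary size, document length, and binary term weights are assumptions, not from the answer above): a high-dimensional but sparse "text" vector has a far smaller Euclidean norm than a dense vector of the same dimensionality.

```python
import numpy as np

vocab_size = 50_000            # dimensionality of the bag-of-words space
words_in_doc = 200             # distinct words actually present in one text

# A "text-like" vector: almost all entries are zero.
x_text = np.zeros(vocab_size)
x_text[:words_in_doc] = 1.0    # hypothetical binary term weights

# A "dense" vector (e.g., genetic data): non-zero almost everywhere.
x_dense = np.ones(vocab_size)

print(np.linalg.norm(x_text))   # ~14.1, driven by the 200 non-zeros
print(np.linalg.norm(x_dense))  # ~223.6, grows with the full dimensionality
```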
In addition to the answers above, my intuition says that, compared with a low-dimensional space, a high-dimensional space offers many more possible decision boundaries that separate the classes, so the amount of effort needed in optimization is much less (because there are multiple possible sets of parameter values), and this works as an advantage for the SVM.