SVM is designed for two-class classification problems. If the data is not linearly separable, a kernel function is used. I want to know whether there exists any method that will indicate whether the data is linearly separable or not.
Visualizing the data is useful in such cases. Another option is training a linear classifier and checking whether you can get zero training errors: if so, your dataset is linearly separable; otherwise, it's nonlinearly separable. A minimal version of this check is sketched below.
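As a rough sketch of that check (assuming scikit-learn; the data here are synthetic, and a very large C stands in for a hard margin):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# two well-separated synthetic blobs
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

# a large C penalises training errors heavily, approximating a hard margin
clf = LinearSVC(C=1e5, max_iter=100_000).fit(X, y)
print("zero training errors:", clf.score(X, y) == 1.0)
```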
There is no simple answer to this question :-) The modelling choices that you make depend on the application you're looking at, the amount of data, and a bunch of other things. I'll walk you through a few considerations.
Firstly, it sounds like you are trying to decide which kernel type to use based on the result of a test: that is, you want to say "if the data is linearly separable, use a linear kernel; otherwise, use an RBF". This line of reasoning is inadvisable, since such a test ignores how the boundary affects predictive performance on new data, and generalisation is really at the core of machine learning, so I definitely wouldn't take this route. If all you want is 100% accuracy on training data, just use kNN with k=1 (as the sketch below illustrates), but this is unlikely to generalise well to new data.
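Here's a minimal illustration of that point, assuming scikit-learn; the overlapping Gaussian classes are synthetic stand-ins for noisy real data:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# two heavily overlapping classes: no boundary can separate them cleanly
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# k=1 memorises the training set, so training accuracy is 1.0 by construction
knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("train accuracy:", knn.score(X_tr, y_tr))
print("test accuracy: ", knn.score(X_te, y_te))  # substantially lower
```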
Also, it is worth bearing in mind that you won't always want to select the boundary that separates the data perfectly. Consider a simple example in one dimension where data from the negative class lies between -2 and -1 on the real line, and data from the positive class lies between 1 and 2. Given this data, it would seem reasonable to choose a decision boundary at 0, right?
Now, suppose there is one outlier in the negative class at the point 0.9. The data is still separable, but every decision boundary that perfectly separates the two classes now lies between 0.9 and 1. In this situation a boundary at 0 is probably still very reasonable; however, by insisting on a boundary that separates the data perfectly, you are forced to pick one between 0.9 and 1, which feels much too far to the right.
So, often you will want to trade away some training accuracy for a larger classification margin; the sketch below shows this on the example above.
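A quick sketch of the 1-d example, assuming scikit-learn (in one dimension the decision boundary is at x = -b/w):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.concatenate([rng.uniform(-2, -1, 20), [0.9], rng.uniform(1, 2, 20)])
y = np.array([0] * 21 + [1] * 20)  # the negative-class outlier sits at 0.9
X = X.reshape(-1, 1)

for C in (1e6, 0.1):  # near-hard margin vs soft margin
    svm = SVC(kernel="linear", C=C).fit(X, y)
    boundary = -svm.intercept_[0] / svm.coef_[0, 0]
    print(f"C={C}: boundary at x = {boundary:.2f}")
# the near-hard margin lands between 0.9 and 1; the soft margin stays near 0
```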
By the way, kernel selection also depends on other properties of the dataset. If there are a great many features relative to the dataset size, you should prefer a linear kernel even if the data isn't linearly separable, since this reduces the risk of overfitting.
So, what I advise is that you read up on model selection and cross-validation; there are lots of important references at the following link: https://en.wikipedia.org/wiki/Cross-validation_(statistics)
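For instance, a minimal cross-validated comparison of kernels (assuming scikit-learn; the data are synthetic) looks like this:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # noisy linear concept

# compare kernels on held-out folds, not on training accuracy
for kernel in ("linear", "rbf"):
    scores = cross_val_score(SVC(kernel=kernel), X, y, cv=5)
    print(f"{kernel}: {scores.mean():.3f} +/- {scores.std():.3f}")
```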
A few comments to expand on a couple of points from Hasan's good answers.
1) You can visualise multidimensional data using something like a pair plot. This technique considers all pairs of features and creates a matrix of scatter plots, one per pair of axes. See the link below for an example, and the first sketch after this list. If the data are separable along one or two dimensions, you will see the separation in this visualisation. However, if the separating hyperplane spans more than two dimensions, the pair plot won't necessarily reveal that the data are separable. Additionally, pair plots don't scale well to large numbers of dimensions.
2) One issue with the perceptron is that it can only confirm that data are linearly separable: if it converges, the data are separable, but if the algorithm doesn't converge within your iteration budget, you can't conclude that the data are not separable (see the second sketch after this list).
3) In my experience of experimenting with clustering on different cluster shapes and sizes, you don't tend to get the desired separation when the shapes and sizes of the 'real' clusters are dissimilar. Additionally, you will need to put thought into which distance metric to use for the application, and that is nontrivial in general.
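For point 1), here is a minimal pair-plot sketch, assuming seaborn and pandas (the DataFrame and its "label" column are illustrative):

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (100, 3)), rng.normal(1, 1, (100, 3))])
df = pd.DataFrame(X, columns=["f1", "f2", "f3"])
df["label"] = [0] * 100 + [1] * 100

# one scatter plot per pair of features, coloured by class
sns.pairplot(df, hue="label")
plt.show()
```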
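And for point 2), a perceptron-based check, assuming scikit-learn; note the asymmetry in what the outcome can tell you:

```python
import numpy as np
from sklearn.linear_model import Perceptron

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

clf = Perceptron(max_iter=1000, tol=None).fit(X, y)
if clf.score(X, y) == 1.0:
    print("linearly separable: a separating hyperplane was found")
else:
    # NOT proof of non-separability -- it may simply need more iterations
    print("inconclusive within this iteration budget")
```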
The test I would recommend is relatively simple. Learn an SVM model with a linear kernel, setting the C parameter to infinity (or a very large number). Since the optimisation is a convex quadratic program, the solver finds the optimal solution in finite time. Once it converges, make predictions on the training data: your data is separable if you get 100% accuracy. A sketch follows.
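A sketch of this test, assuming scikit-learn (a very large C stands in for infinity):

```python
import numpy as np
from sklearn.svm import SVC

def looks_linearly_separable(X, y, big_C=1e10):
    """Fit a (near) hard-margin linear SVM and check training accuracy."""
    svm = SVC(kernel="linear", C=big_C).fit(X, y)
    return svm.score(X, y) == 1.0

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (50, 2)), rng.normal(2, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(looks_linearly_separable(X, y))  # True for these well-separated blobs
```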
But please note that this may not be a good solution if you want to make predictions on future data. See my first answer for more.