What is the difference between test set and training set for label matrix in svm?

Francesco Bianconi (popular answer)

In general we have:

Train set -> to build the model (required)

Validation set -> to optimise the model (optional)

Test set -> to test the model, i.e. to evaluate the model's accuracy (required)

The type of classifier doesn't matter.
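
As a minimal sketch of the three-way split listed above: the function name `split_labeled`, the 60/20/20 proportions and the toy data are illustrative assumptions, not part of the answer. Any classifier could be trained on the resulting lists.

```python
import random

def split_labeled(samples, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle labeled samples and split them into train/validation/test lists."""
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]                 # used to build the model
    val = shuffled[n_train:n_train + n_val]    # used to optimise the model
    test = shuffled[n_train + n_val:]          # used only to evaluate the model
    return train, val, test

# Toy labeled data (assumption): (feature_vector, class_label) pairs.
data = [((i, i % 3), i % 2) for i in range(10)]
train, val, test = split_labeled(data)
print(len(train), len(val), len(test))
```

Every sample ends up in exactly one of the three lists, so the test set is never seen while the model is built or tuned.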

Robert Fischer

A Practical Guide to Support Vector Classification

http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf

How can I use Libsvm in matlab for multi-class SVM?

https://www.researchgate.net/post/How_can_I_use_Libsvm_in_matlab_for_multi-class_SVM

I hope that helps.

James Walter Taylor

This question was confusing to me, in that I was not aware that "test set" and "training set" might take on a different meaning, depending on the classification technology. If this is the case, I'd certainly like to generate some discussion.

As I think of it:

"test set": A set of classification patterns including inputs and ideal outputs used *only* to measure the performance of a classification system; individual input patterns are never used to change or update the behavior of the classifier.

"training set": A set of classification patterns including all inputs and perhaps the associated ideal outputs, used individually to update the behavior of a classification system, and used collectively to assess the performance of the system.

There are several complications and quibbles buried here: the differences between observations and measurements, static classifiers versus those with memory (e.g. Markov or adaptive), supervised versus unsupervised, time domain versus spatial, and so on. The principle of comparing training set and test set performance remains one means to verify that the classifier is not overly specific to the training set.
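
A rough sketch of that training vs. test comparison, under stated assumptions: the classifier is a 1-nearest-neighbour rule chosen only because it memorizes its training data, and the noisy toy problem and all names are hypothetical.

```python
import random

def nearest_neighbor_predict(train, x):
    """Return the label of the training sample closest to x (1-NN)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda pair: sq_dist(pair[0], x))
    return label

def accuracy(train, samples):
    """Fraction of samples whose predicted label matches the ideal output."""
    hits = sum(nearest_neighbor_predict(train, x) == y for x, y in samples)
    return hits / len(samples)

rng = random.Random(0)

def noisy_sample():
    # Hypothetical toy problem: class is 1 when x0 + x1 > 1, with 20% label noise.
    x = (rng.random(), rng.random())
    y = int(x[0] + x[1] > 1)
    if rng.random() < 0.2:
        y = 1 - y
    return x, y

train_set = [noisy_sample() for _ in range(200)]
test_set = [noisy_sample() for _ in range(200)]

train_acc = accuracy(train_set, train_set)  # each training point is its own nearest neighbour
test_acc = accuracy(train_set, test_set)    # test patterns never influence the classifier
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

A large gap between the two numbers suggests the classifier is overly specific to the training set. A small gap alone does not prove good generalization.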

The conversation ought to continue on to the means of verifying that appropriate sampling has occurred in constructing each set, and that both sets are sufficiently dissimilar for this to be an appropriate check against overtraining. One can argue, as regulatory groups like the FDA do, that this is a necessary test, but I would argue that it is not sufficient. If one is classifying ECGs, for instance, from a set of measurements, it is difficult to imagine a test set large enough to cover the combinatorial range produced by heart rate variations, QRS widths, QRST morphologies and arrhythmias. I think other testing is possible and maybe necessary to assure appropriate generalization.
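
One narrow aspect of the sampling question can be sketched in code: checking that each class has roughly the same share in both sets, here by using a stratified split. The function names, the 25% test fraction and the imbalanced toy data are assumptions. The sketch says nothing about how well either set covers the range of possible inputs, which is the harder problem raised above.

```python
import random
from collections import Counter, defaultdict

def stratified_split(samples, test_frac=0.25, seed=0):
    """Split (features, label) pairs so each class keeps a similar share in both sets."""
    by_class = defaultdict(list)
    for features, label in samples:
        by_class[label].append((features, label))
    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(by_class):
        group = by_class[label]
        rng.shuffle(group)                       # random draw within each class
        n_test = round(len(group) * test_frac)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

def class_shares(samples):
    """Return the fraction of samples belonging to each class."""
    counts = Counter(label for _, label in samples)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# Hypothetical imbalanced data: 80 "normal" samples and 20 "arrhythmia" samples.
data = [((i,), "normal") for i in range(80)] + [((i,), "arrhythmia") for i in range(20)]
train, test = stratified_split(data)
print(class_shares(train))
print(class_shares(test))
```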

Robert Fischer

To the best of my knowledge, I have never heard or read about a different meaning of "training set" and "test set" for different classification approaches. That's why I assumed the question to be of a practical nature, so I pointed out some resources for getting a quick start in Matlab.

Everything you wrote is of course correct. Just for the sake of a bit more generality:

In general, we can regard a classifier as a function γ:X→C that tries to determine the appropriate class c∈C for a given sample x∈X. A sample is represented by a d-dimensional feature vector x=(x_1,x_2,x_3,…,x_d)∈X. Depending on the concrete problem, we have a set of k classes C={c_1,c_2,…,c_k}.

The formalization of a training sample is 〈t,c〉∈X×C, which means that each sample of the training set is annotated with its appropriate class. A training set in general is a set of annotated samples T_Training={〈t_1,c〉,〈t_2,c〉,…,〈t_n,c〉} | 〈t,c〉∈X×C. In contrast, the test set contains only samples without annotations: T_Test={t_1,t_2,…,t_n} | t∈X.

During the training stage, the system is trained using the annotated samples from the training set, which results in a set of rules, also called the model. Depending on the quality of the model, the system should now be able to determine the class for a non-annotated, unknown test sample from the test set.

--- Correction 29.07.2014 ---

After reading the answers of Francesco Bianconi and M. Ramakrishna Murty I realized I made a mistake in my previous post. Thanks for that ...

Of course, not only the 'training set' but also the 'test set' is annotated / labeled. It is better to distinguish between 'labeled' and 'non-labeled' data.

The labeled data is split into 'training set', 'validation set' and 'test set' for use during the training phase: T_Labeled = { 〈t_1,c〉,〈t_2,c〉,…,〈t_n,c〉 } | 〈t,c〉∈X×C

The non-labeled data represents all unknown samples encountered during the working phase: T_NonLabeled = { t_1,t_2,…,t_n } | t∈X
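
To make the labeled / non-labeled distinction concrete, here is a small sketch. It uses a nearest-centroid rule as a stand-in for the classifier γ (an assumption chosen for brevity, not an SVM), and all names and data values are hypothetical.

```python
from collections import defaultdict

def train_centroids(labeled):
    """Build a model (one mean vector per class) from annotated pairs <t, c>."""
    sums, counts = {}, defaultdict(int)
    for t, c in labeled:
        if c not in sums:
            sums[c] = [0.0] * len(t)
        sums[c] = [s + ti for s, ti in zip(sums[c], t)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def gamma(model, t):
    """Classifier gamma: X -> C, returning the class with the nearest centroid."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(model, key=lambda c: sq_dist(model[c], t))

# T_Labeled: annotated samples, available during the training phase.
t_labeled = [((0.0, 0.1), "a"), ((0.2, 0.0), "a"), ((1.0, 0.9), "b"), ((0.8, 1.0), "b")]
# T_NonLabeled: samples without annotations, seen during the working phase.
t_non_labeled = [(0.1, 0.1), (0.9, 0.8)]

model = train_centroids(t_labeled)
predictions = [gamma(model, t) for t in t_non_labeled]
print(predictions)
```

In a real workflow, `t_labeled` would first be split into training, validation and test sets. Only the non-labeled samples lack an ideal output to compare against.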

M. Ramakrishna Murty

In classification, we first construct a model and then test whether the model works correctly.

Simply put:

-> The data set used to construct the classification model is called the training data set.

-> The data set used to test the model is called the test data set.
