I am working to improve the classification results of my algorithm, which I have used in several different applications. For one of them, it achieves 100 percent accuracy, which seems very strange. Could you give any recommendations about this?
Accuracy assessment is a partial enumeration process. An accuracy of 1 would mean the classification is an exact replica of the ground truth, which is not practically possible. Increase the number of sample points and recalculate.
There is no rule of thumb for calculating accuracy: some researchers take 100 uniformly distributed points, others 254. What I would suggest is taking stratified sample points based on the classified area of each class, as in the sketch below.
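A minimal sketch of that idea in Python, assuming the classified map is available as a NumPy array of class labels (the function name, point budget, and per-class floor here are just placeholders): allocate validation points to each class in proportion to its mapped area, with a small floor so rare classes are not missed.

```python
import numpy as np

def stratified_sample_points(classified, n_points=254, min_per_class=10, seed=0):
    """Pick validation pixel locations proportional to each class's mapped area.

    `classified` is assumed to be a 2-D array of class labels (the classified map).
    """
    rng = np.random.default_rng(seed)
    flat = classified.ravel()
    classes, counts = np.unique(flat, return_counts=True)
    # Allocate points proportionally to class area, with a floor per class.
    alloc = np.maximum(min_per_class,
                       np.round(n_points * counts / counts.sum()).astype(int))
    samples = []
    for cls, n in zip(classes, alloc):
        idx = np.flatnonzero(flat == cls)
        samples.append(rng.choice(idx, size=min(n, idx.size), replace=False))
    rows, cols = np.unravel_index(np.concatenate(samples), classified.shape)
    return rows, cols  # compare these locations against the ground truth
```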
100% accuracy is very uncommon and seldom occurs in standard classification tasks.
Either your recognition problem is rather easy, your test and training data are too much alike compared to practical scenarios, or you are actually re-classifying your training data in the test step. In the latter case, 100% classification accuracy can easily result for classifiers with many "parameters" (i.e. high capacity), such as the nearest-neighbor classifier.
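To illustrate that last point, here is a small scikit-learn sketch (on a toy dataset, not your data) showing how a 1-nearest-neighbor classifier scores 100% when re-classifying its own training data, while a held-out split gives a more honest estimate.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
print("accuracy on the training data:", knn.score(X_tr, y_tr))  # 1.0: each point is its own neighbor
print("accuracy on held-out data:   ", knn.score(X_te, y_te))   # a more realistic estimate
```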
It is possible that your classifier is overfitting the training set. To avoid that, you should evaluate your classification process with 10-fold cross-validation.
However, relying only on classification accuracy when evaluating a learning method is not enough; you should also consider additional evaluation metrics such as the confusion matrix, ROC curves, etc.
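A rough sketch of both suggestions with scikit-learn, using a toy dataset and a stand-in classifier (neither is your actual setup): 10-fold cross-validated accuracy, plus a confusion matrix and ROC AUC computed from the out-of-fold predictions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import cross_val_score, cross_val_predict

X, y = load_breast_cancer(return_X_y=True)
clf = RandomForestClassifier(random_state=0)  # stand-in for your own classifier

# 10-fold cross-validated accuracy instead of a single, possibly optimistic, number.
scores = cross_val_score(clf, X, y, cv=10)
print("10-fold accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))

# Extra metrics from out-of-fold predictions: confusion matrix and ROC AUC.
proba = cross_val_predict(clf, X, y, cv=10, method="predict_proba")[:, 1]
print(confusion_matrix(y, (proba > 0.5).astype(int)))
print("ROC AUC: %.3f" % roc_auc_score(y, proba))
```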
It may be that your training and test data sets are very much alike. Generally, 70% of the data is used for training and the remaining 30% for testing your classification algorithm. I think you should recompute the results over several such splits and take the average of your set of observations. 10-fold cross-validation may also be a good choice for testing.
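As a sketch of the averaging idea (the dataset and classifier below are only placeholders for your own), you can repeat the random 70/30 split several times and report the mean accuracy rather than a single lucky split.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))  # placeholder

# Ten independent 70/30 splits; report the average instead of one split's score.
splits = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
scores = cross_val_score(clf, X, y, cv=splits)
print("mean accuracy over 10 random 70/30 splits: %.3f" % scores.mean())
```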
I recommend that you use stratified random sampling to split your data into training and testing sets, and if you have a large enough sample you could do n-fold cross-validation.
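For example, something along these lines in scikit-learn (the dataset and classifier are stand-ins): a class-stratified 70/30 split, followed by stratified n-fold cross-validation on the training part.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Stratified random split: class proportions are preserved in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

# With enough samples, stratified n-fold cross-validation on the training part.
clf = SVC()  # stand-in classifier
scores = cross_val_score(clf, X_tr, y_tr, cv=StratifiedKFold(n_splits=5))
print("stratified 5-fold accuracy: %.3f" % scores.mean())
print("held-out test accuracy:     %.3f" % clf.fit(X_tr, y_tr).score(X_te, y_te))
```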
You didn't provide any details about the algorithm or the data-set in question, hence this might not be applicable...
As Michael Kemmler pointed out, this is rather uncommon but can potentially happen in a number of scenarios. What I found often helpful is to try to visualize the data (both training and testing) as well as the decision boundary somehow. This should give you a good indication on how complicated the problem is, if the classifier is overfitting, if the testing data is too similar to the training data etc.
If the data is high-dimensional, something like t-SNE might help.
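For instance, a rough t-SNE sketch with scikit-learn and matplotlib (toy data shown here): embed training and test samples together and plot them with different markers to see whether the test points simply sit on top of the training points.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Embed train and test together so both land in the same 2-D space.
emb = TSNE(n_components=2, random_state=0).fit_transform(np.vstack([X_tr, X_te]))
n_tr = len(X_tr)
plt.scatter(emb[:n_tr, 0], emb[:n_tr, 1], c=y_tr, marker="o", alpha=0.5, label="train")
plt.scatter(emb[n_tr:, 0], emb[n_tr:, 1], c=y_te, marker="x", label="test")
plt.legend()
plt.title("t-SNE of training vs. test data")
plt.show()
```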
If you have sufficient data, try splitting it into train / validate / test sets (say 60% train, 30% validate, 10% test). After training on the training set, evaluate on the validation set. If the validation set still shows improvement, continue training with the training set. When there is no more improvement, test ONCE ONLY with the test set. This avoids problems of adapting to the test set through repeated testing. Others have pointed out that if you have trivial data you could get 100%. Try running your data through Weka's ZeroR and OneR algorithms: if they come out at 100% or very close, you're not dealing with an AI problem.
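The baseline check can be approximated outside Weka as well; here is a sketch using scikit-learn, where a majority-class DummyClassifier stands in for ZeroR and a depth-1 decision stump is a rough OneR analogue (the dataset is only a placeholder).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# ZeroR analogue: always predict the most frequent class.
zero_r = DummyClassifier(strategy="most_frequent")
# Rough OneR analogue: a decision stump on the single best attribute.
one_r = DecisionTreeClassifier(max_depth=1)

for name, clf in [("ZeroR-like", zero_r), ("OneR-like", one_r)]:
    acc = cross_val_score(clf, X, y, cv=10).mean()
    print("%s baseline accuracy: %.3f" % (name, acc))
# If these trivial baselines already score near 100%, the problem itself is trivial.
```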
Finally, on whether to use cross-validation: I use a rule of thumb which I think originates from Quinlan. If you have a moderately complex problem you probably need more than 1,000 cases in your data set; if you haven't got that, then definitely go for cross-validation. How many folds? 10 is the default answer, but you really need to split your data so you have enough in the training set for the classifier to learn the domain (something you have to work out for yourself and which depends on the type of classifier). For example, in a problem where I had 1,200 records I used 60-fold cross-validation.
Have you tried to visualize your data using PCA (assuming you have a dataset in a higher dimension than 2)? It might help you identify if there are clear separations between the data points and could help validate your models. Otherwise plotting a learning curve and checking for the bias-variance tradeoff might also help you understand your dataset better.
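A possible sketch of both ideas with scikit-learn and matplotlib (the toy dataset and classifier are chosen only for illustration): a 2-D PCA scatter plot followed by a learning curve comparing training and cross-validation scores.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# PCA projection: are the classes cleanly separated already?
Z = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
plt.figure()
plt.scatter(Z[:, 0], Z[:, 1], c=y)
plt.title("PCA projection")

# Learning curve: training vs. cross-validated score as the training set grows.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
sizes, train_scores, val_scores = learning_curve(clf, X, y, cv=5)
plt.figure()
plt.plot(sizes, train_scores.mean(axis=1), label="training score")
plt.plot(sizes, val_scores.mean(axis=1), label="cross-validation score")
plt.legend()
plt.title("Learning curve")
plt.show()
```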
Classification accuracy depends on the application as well as the database. It may vary across applications for the same classifier, and it also depends on the features of the data.
Hello Salih Tutun. I recently submitted a manuscript to a journal in which I discovered some issues during validation and proposed an approach termed spatial cross-validation, as opposed to ordinary cross-validation. Here are the details of the two:
1) Conventional cross-validation involves random selection of training and validation points. This means that the randomly selected points can occur anywhere in the image, even neighbouring the points selected for validation. Correlation will therefore be high between points in the training and validation sets, and higher accuracy will be reported; this approach has been described as giving overly "optimistic" accuracy. In other words, the algorithm already has prior knowledge of the patterns in the vicinity of the validation points due to high correlation (spatial dependency).
2) In contrast, spatial validation involves subdividing the image to be classified into grids/spatial blocks and then sampling the data from different blocks for training and testing the algorithm. This is motivated by the goal of image classification, which is to predict labels/classes/land cover beyond where there is training data (knowledge of classes). Consequently, we should use a validation approach that considers and evaluates that goal. I hope this helps.
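A rough sketch of the spatial blocking idea, assuming you have point coordinates alongside the features (everything below, including the grid size and the randomly generated data, is illustrative only): assign each sample to a grid block and use GroupKFold so that no block contributes to both training and validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

# Assumed inputs: features X, labels y, and point coordinates (x_coord, y_coord).
rng = np.random.default_rng(0)
n = 1000
x_coord, y_coord = rng.uniform(0, 100, n), rng.uniform(0, 100, n)
X = rng.normal(size=(n, 5))
y = rng.integers(0, 3, n)

# Assign each point to a spatial block (here a 20 x 20 unit grid cell).
block_size = 20
blocks = (x_coord // block_size).astype(int) * 1000 + (y_coord // block_size).astype(int)

# GroupKFold keeps whole blocks out of the training folds, so neighbouring
# (spatially correlated) points cannot appear on both sides of the split.
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=GroupKFold(n_splits=5), groups=blocks)
print("spatial block cross-validation accuracy: %.3f" % scores.mean())
```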