I am getting an accuracy of 88% using naive Bayes and decision tree, but when I do k-fold cross-validation, it drops to 66%. How can I train my system more effectively?
What do you mean when you say that you get 88% accuracy using naive Bayes and decision trees? Is that accuracy computed by applying the models to the same data you used to build them? If so, it is to be expected that the accuracy drops when you use cross-validation:
a) when you test a model on the same data you used to create it, you get overfitting (https://en.wikipedia.org/wiki/Overfitting)
b) when you use cross-validation, you expect less overfitting (https://tinyurl.com/yc43ae3l) and also a more realistic proxy for the accuracy
My advice is not to treat the 88% as the benchmark for the accuracy of your models, as that number reflects overfitting. K-fold cross-validation is not decreasing your accuracy; rather, it is giving you a better approximation of that accuracy, with less overfitting. In other words, the accuracy of your models is (approximately) 66%.
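To make the point concrete, here is a minimal sketch contrasting training-set accuracy with cross-validated accuracy. It assumes scikit-learn (not named in the thread); X and y stand for your own features and labels (hypothetical names), and the Iris data is used only to keep the snippet runnable.

```python
# Sketch: accuracy on the training data vs. cross-validated accuracy.
# Assumption: scikit-learn; Iris is a stand-in for your own X, y.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, y)

# Accuracy on the same data used for fitting: optimistic (overfitting).
train_acc = tree.score(X, y)

# 5-fold cross-validated accuracy: a more realistic estimate.
cv_acc = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=5).mean()

print(f"training accuracy:        {train_acc:.2f}")
print(f"cross-validated accuracy: {cv_acc:.2f}")
```

The gap between the two numbers is the overfitting the answer above is describing.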
If you want to improve the accuracy, focus on improving the accuracy as measured with k-fold cross-validation. There are several things you can try, for example (see the sketch after this list):
a) get more data/better data
b) try other classifiers - SVM, random forest, etc.
c) check which combinations of features work best
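A rough sketch of point (b), comparing several classifiers under the same k-fold protocol. Again this assumes scikit-learn; X and y are hypothetical names for your data, with Iris as a runnable placeholder.

```python
# Sketch: compare classifiers with the same 10-fold cross-validation protocol.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

models = {
    "naive bayes":   GaussianNB(),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "svm":           SVC(),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10)  # 10-fold CV accuracy per model
    print(f"{name:13s} {scores.mean():.3f} +/- {scores.std():.3f}")
```

Whichever model you pick, pick it based on these cross-validated scores, not on the optimistic single-split number.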
You can also use the hold-out (train/test split) method to check your accuracy.
The whole dataset is divided into two parts: a training set (which may include 80% of the data) and a test set containing the rest. After training your model on the training set, measure its accuracy on the test set and see how much accuracy you get.
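A minimal sketch of that hold-out method, assuming scikit-learn; X and y are hypothetical names for your features and labels (Iris used only so the snippet runs).

```python
# Sketch: hold-out (train/test split) evaluation, 80% train / 20% test.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 80% training / 20% testing, stratified so class proportions are preserved.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = GaussianNB().fit(X_train, y_train)
print("hold-out test accuracy:", model.score(X_test, y_test))
```

Note that this gives a single estimate that depends on which 20% happened to land in the test set, which is the issue several later answers point out.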
Hi @Miguel Patrício, I first split the data set into training and test sets, 80% and 20% respectively, and got 88% accuracy. But when I apply k-fold, my accuracy is reduced.
Rojalina Priyadarshini, I obtained the 88% by the same process you described.
Something more you can do is to rotate the split points. Suppose your data contains 100 instances: in the first iteration you select instances 1 to 80 for training and the rest for testing; in the second iteration you select instances 21 to 100 for training and the rest for testing, and so on. When calculating the accuracy, take the average over all the test sets. That should give you a more reliable accuracy percentage.
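A rough sketch of that rotating-split idea (essentially k-fold with contiguous blocks), assuming scikit-learn and NumPy; X and y are hypothetical names for your data, with Iris as a runnable stand-in.

```python
# Sketch: rotate a contiguous test block through the data and average the accuracies.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Shuffle once so the contiguous blocks are not ordered by class.
rng = np.random.RandomState(0)
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]

k = 5
fold_size = len(y) // k
accuracies = []
for i in range(k):
    # The i-th contiguous block is held out for testing, the rest is training data.
    test_idx = np.arange(i * fold_size, (i + 1) * fold_size)
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)

    model = GaussianNB().fit(X[train_idx], y[train_idx])
    accuracies.append(model.score(X[test_idx], y[test_idx]))

print("per-split accuracies:", np.round(accuracies, 3))
print("average accuracy:", round(float(np.mean(accuracies)), 3))
```

This is exactly what k-fold cross-validation automates for you.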
" i have first split the data set into train and test set, 80% and 20% respectively and got 88% accuracy. but when i apply k fold, my accuracy is reduced. "
When you use k-fold with k=5, your scenario (an 80% train / 20% test split) is repeated 5 times with 5 different test sets (each time a new 20% of the data). The resulting estimate is therefore more reliable than a single split, and it may be lower or higher than the single-split number.
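A short sketch of that point, showing that 5-fold cross-validation is just the 80/20 scenario repeated with a fresh 20% held out each time. Assumes scikit-learn; X and y are hypothetical names, Iris is the placeholder data.

```python
# Sketch: k=5 cross-validation as five repeated 80/20 splits.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for fold, (train_idx, test_idx) in enumerate(kf.split(X), start=1):
    model = DecisionTreeClassifier(random_state=0).fit(X[train_idx], y[train_idx])
    acc = model.score(X[test_idx], y[test_idx])
    scores.append(acc)
    # Each fold trains on 80% of the data and tests on a different 20%.
    print(f"fold {fold}: train={len(train_idx)}, test={len(test_idx)}, accuracy={acc:.3f}")

print(f"mean accuracy over the 5 folds: {sum(scores) / len(scores):.3f}")
```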
It is natural. Cross-validated estimates are almost always lower than estimates based on data the model has already seen, because the latter suffer from overfitting. But the size of the decrease here is quite big; if your sample size is large and the gap cannot be explained by stochastic effects, I would suggest that your classification method is overcomplicated.
There are a few reasons this could happen:
Your "manual" split is not random, and you happen to select more outliers that are hard to predict. How are you doing this split?
What is the k in your k-fold CV? I'm not sure what you mean by "validation set size": in k-fold CV you have a fold size, there is no separate validation set, and you run the cross-validation on your entire data. Are you sure you're running k-fold cross-validation correctly?
Usually, one picks k = 10 for k-fold cross-validation. If you run it correctly on your entire data, you should rely on its results rather than on the other results (a minimal sketch follows below).
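A minimal sketch of the usual k = 10 setup, run over the entire data set. Assumes scikit-learn; X and y are hypothetical names for your data, Iris is only a runnable placeholder.

```python
# Sketch: 10-fold (stratified) cross-validation over the whole data set.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# Stratification keeps the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)

print(f"10-fold CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```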
When you say you are getting 88%, is that 88% on your training data or your test data? The accuracy on your test data or cross-validated data is the reliable one, not the training accuracy.
I agree with Zahid and the others who have explained the k-fold cross-validation method. Note that k-fold cross-validation reduces overfitting; it does not completely eliminate it. So I would trust the results of your cross-validation over your manual split of the data.
" i have first split the data set into train and test set, 80% and 20% respectively and got 88% accuracy. but when i apply k fold, my accuracy is reduced."
This is actually not how it is supposed to work. In cross-validation, the splitting, training and testing are repeated k times and the final result is averaged; there is no separate "manual" part.
If your data set contains 100 data points, you randomly select 80 for training and 20 for testing (and repeat the process k times).
Your first (random) 80/20 split produced a very optimistic but incorrect accuracy estimate. The reduction in accuracy that you observe is exactly why k-fold cross-validation is used.
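To illustrate why a single random 80/20 split can be misleading, here is a small sketch that repeats the split many times and shows how much the estimate varies. Assumes scikit-learn; X and y are hypothetical names, Iris is the placeholder.

```python
# Sketch: repeated random 80/20 splits show the spread of single-split estimates.
from sklearn.datasets import load_iris
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 20 independent random 80/20 splits of the same data.
splitter = ShuffleSplit(n_splits=20, test_size=0.2, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=splitter)

print(f"single-split estimates range from {scores.min():.3f} to {scores.max():.3f}; "
      f"mean = {scores.mean():.3f}")
```

Any one of those splits could, by luck, look as optimistic as your 88%.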
Usually, if your sample is sufficiently large to represent the true underlying class distribution, then you don't need to cross-validate. However, for smaller datasets it is helpful for gauging how well your algorithm would perform in a real-world application.
There's nothing wrong with your accuracy, it did not drop. You just had a bad initial estimate.
If your data set is small, you can try 10-fold cross-validation. If your data set is large, you can use 5-fold cross-validation or a 70/30 or 80/20 split. However, your question is not clear about how many folds you are using and what your data size is.
This is a somewhat strange result. Theoretically, LOO (leave-one-out) gives the best accuracy estimate, k-fold a somewhat worse one, and hold-out (your 80:20 approach) the worst. But by chance (for your particular task, partition, etc.) it may turn out the other way round. I have also sometimes found that 3-fold cross-validation compares favourably with 5-fold.
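For completeness, a small sketch comparing the three estimates mentioned above (leave-one-out, k-fold, hold-out) on the same data. Assumes scikit-learn; X and y are hypothetical names, Iris is only a runnable stand-in, and the relative ordering of the numbers will of course vary by dataset.

```python
# Sketch: LOO vs. k-fold vs. hold-out accuracy estimates for the same classifier.
from sklearn.datasets import load_iris
from sklearn.model_selection import LeaveOneOut, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
clf = GaussianNB()

loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()   # leave-one-out
kfold_acc = cross_val_score(clf, X, y, cv=5).mean()             # 5-fold CV

# Single hold-out 80/20 split for comparison.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
holdout_acc = clf.fit(X_tr, y_tr).score(X_te, y_te)

print(f"LOO: {loo_acc:.3f}  5-fold: {kfold_acc:.3f}  hold-out: {holdout_acc:.3f}")
```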
Understand the purpose of cross-validation. If the CV accuracy is low, then either you need more data for training, you have to improve the model, or the model you chose is not good enough. One should not finalize a model based on training accuracy alone.