Should I consider a test data set to validate the stability of a model when I use the K-fold cross-validation technique?

I would generally say that you should try to use a separate test data set. External validation using a separate test set is considered to provide greater confidence in model generalisability, particularly if these data are not directly related to the development data, e.g. if the data were recorded by other observers.

The reason for this is that some heterogeneity is likely to exist between data sets because of different populations, different observers and/or different measurement equipment. Models that generalise well are expected to be robust against such heterogeneities. The best way to assess this is through external validation.

Another reason why a separate test data set may be useful is that results are more easily reported.

However, as you stated, you do lose out on potential information. This is an issue if your development data set does not by itself capture the effect of probable sources of variability, e.g. different observers. A cross-validation approach may help you include such information if you have access to two or more different data sets. If you only have one data set with a limited number of samples (which happens quite often in medicine and the life sciences), cross-validation may or may not actually assess generalisability. You can only establish this through external validation. Cross-validation may, however, indicate that there are issues in your model development approach, e.g. a strong bias that may be shown by comparing model performances between training and validation folds.
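To make the distinction concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the synthetic one-dimensional data, the toy midpoint classifier, the helper names (`make_data`, `fit`, `accuracy`, `k_fold`) and the `offset` that stands in for a different observer or measurement device. It runs K-fold cross-validation on the development data, reports training and validation fold accuracy side by side, and then scores the refitted model on a separate, shifted data set.

```python
import random

def make_data(n, offset=0.0, seed=0):
    """Synthetic 1-D data: class 0 ~ N(0, 1), class 1 ~ N(2, 1), shifted by offset."""
    rng = random.Random(seed)
    ys = [i % 2 for i in range(n)]
    xs = [rng.gauss(2.0 * y + offset, 1.0) for y in ys]
    return xs, ys

def fit(xs, ys):
    """Toy model: a cut-off halfway between the two class means.
    Assumes class 1 has the larger mean, which holds by construction here."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    return (sum(x0) / len(x0) + sum(x1) / len(x1)) / 2

def accuracy(cut, xs, ys):
    """Fraction of samples for which 'x > cut' agrees with the label."""
    return sum((x > cut) == (y == 1) for x, y in zip(xs, ys)) / len(xs)

def k_fold(xs, ys, k=5, seed=0):
    """Return a (training accuracy, validation accuracy) pair for each fold."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    results = []
    for f in range(k):
        val = idx[f::k]
        held = set(val)
        trn = [i for i in idx if i not in held]
        cut = fit([xs[i] for i in trn], [ys[i] for i in trn])
        results.append((
            accuracy(cut, [xs[i] for i in trn], [ys[i] for i in trn]),
            accuracy(cut, [xs[i] for i in val], [ys[i] for i in val]),
        ))
    return results

# Development data, and an "external" set with a simulated measurement shift.
dev_x, dev_y = make_data(2000, seed=1)
ext_x, ext_y = make_data(2000, offset=1.0, seed=2)

folds = k_fold(dev_x, dev_y, k=5)
cv_train = sum(t for t, _ in folds) / len(folds)
cv_val = sum(v for _, v in folds) / len(folds)

final_cut = fit(dev_x, dev_y)  # refit on all development data
ext_acc = accuracy(final_cut, ext_x, ext_y)

print(f"CV training accuracy:   {cv_train:.3f}")
print(f"CV validation accuracy: {cv_val:.3f}")
print(f"External accuracy:      {ext_acc:.3f}")
```

In this toy setting the training and validation fold accuracies agree closely, because every fold comes from the same source. The drop only appears on the shifted external set, which is the kind of heterogeneity cross-validation on a single data set cannot reveal.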

For more information see the TRIPOD guidelines (http://www.equator-network.org/reporting-guidelines/tripod-statement/). In particular, the Explanation and Elaboration document is quite informative. I also authored a paper on this subject if you are interested: "Why validation of prognostic models matters?"

Rangeet Pan

Completely agree with Alex. The answer is yes and no. Yes, you can validate the model using the test data, provided the test data represent all the types of features you expect and are properly balanced. On the other hand, if you are skeptical about the data quality, you can use a different test dataset, provided it matches your requirements.

C K Gomathy

Hi sir,

Yes, I agree.

Cross-validation is a model validation technique for assessing how the results of a statistical analysis (model) will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. That is, to use a limited sample in order to estimate how the model is ... As this difference decreases, the bias of the technique becomes smaller. In order to evaluate it, I calculate the stability and the backtesting, using part of my data that was not used to fit the model.

Kindly refer to this link:

https://towardsdatascience.com/cross-validation-in-machine-learning-72924a69872f

Best wishes

Amine Amyar

Cross-validation (CV) gives a more accurate measure of model quality, which is especially important if we are making a lot of modeling decisions. On small datasets, the extra computational burden of running cross-validation isn't a big deal. These are also the problems where model quality scores would be least reliable with a single train-test split.

For the same reasons, a simple train-test split is sufficient for larger datasets. It will run faster, and with enough data there's little need to re-use some of it for holdout. Alternatively, a portion of the training set can be reserved for this purpose and not used in the rest of the learning process. But if the amount of labeled data is limited, this can significantly degrade the performance of the learned model, and cross-validation may be the best option. Thus, cross-validation is widely accepted in the data mining and machine learning community, and serves as a standard procedure for model selection or modeling procedure selection [1].

In addition, the large-sample conditions indicate that it is inappropriate to decide whether a sample is large or not by the number of instances alone. For instance, when the classification accuracy of an algorithm on a data set is close to 50%, the testing results of a fold that contains only 20 instances are likely to satisfy the large-sample conditions. On the contrary, when an algorithm has close to 100% prediction accuracy on a data set, a fold containing more than 200 instances may fail to be a large sample [2].
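The two cases above can be checked with a few lines of Python. This is only a sketch: it assumes the common rule of thumb for the normal approximation to a binomial proportion, n·p ≥ 5 and n·(1 − p) ≥ 5. The exact conditions used in [2] are not quoted here, so the threshold of 5 and the function name `is_large_sample` are assumptions.

```python
def is_large_sample(n, accuracy, threshold=5):
    """Assumed rule of thumb: both n*p and n*(1-p) must reach the threshold,
    where n is the fold size and p the classification accuracy."""
    return n * accuracy >= threshold and n * (1 - accuracy) >= threshold

print(is_large_sample(20, 0.50))   # fold of 20 at 50% accuracy  -> True
print(is_large_sample(200, 0.98))  # fold of 200 at 98% accuracy -> False
print(is_large_sample(250, 0.99))  # fold of 250 at 99% accuracy -> False
```

Under this assumed rule, the expected number of misclassified instances in the fold (n·(1 − p)) is what fails when accuracy is close to 100%, regardless of how many instances the fold contains.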

There are two possible goals in cross-validation [1]:

1. To estimate performance of the learned model from available data using one algorithm. In other words, to gauge the generalizability of an algorithm.
2. To compare the performance of two or more different algorithms and find out the best algorithm for the available data, or alternatively to compare the performance of two or more variants of a parameterized model.
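The second goal can be sketched as follows, again with the standard library only. The data are synthetic, and the two "algorithms" (a majority-class baseline and a midpoint cut-off classifier) as well as the names `cv_accuracy`, `fit_majority` and `fit_midpoint` are hypothetical stand-ins chosen for illustration. The point of the sketch is that both algorithms are scored on the same folds, so the per-fold differences are paired.

```python
import random

def cv_accuracy(fit, xs, ys, folds):
    """Validation accuracy of one algorithm on each of the given folds."""
    scores = []
    for val in folds:
        held = set(val)
        trn = [i for i in range(len(xs)) if i not in held]
        predict = fit([xs[i] for i in trn], [ys[i] for i in trn])
        scores.append(sum(predict(xs[i]) == ys[i] for i in val) / len(val))
    return scores

def fit_majority(xs, ys):
    """Baseline: always predict the most frequent training label."""
    label = max(set(ys), key=ys.count)
    return lambda x: label

def fit_midpoint(xs, ys):
    """Toy classifier: cut-off halfway between the two class means.
    Assumes class 1 has the larger mean, which holds by construction here."""
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    cut = (sum(x0) / len(x0) + sum(x1) / len(x1)) / 2
    return lambda x: int(x > cut)

# Synthetic, balanced data: class 0 ~ N(0, 1), class 1 ~ N(2, 1).
rng = random.Random(0)
ys = [i % 2 for i in range(600)]
xs = [rng.gauss(2.0 * y, 1.0) for y in ys]

# One shared set of 5 folds, so that both algorithms see identical splits.
idx = list(range(len(xs)))
rng.shuffle(idx)
folds = [idx[f::5] for f in range(5)]

base = cv_accuracy(fit_majority, xs, ys, folds)
mid = cv_accuracy(fit_midpoint, xs, ys, folds)
diffs = [m - b for m, b in zip(mid, base)]

print("majority baseline, mean accuracy:", round(sum(base) / len(base), 3))
print("midpoint cut-off,  mean accuracy:", round(sum(mid) / len(mid), 3))
print("per-fold differences:", [round(d, 3) for d in diffs])
```

The same pattern applies to comparing variants of a parameterized model: keep the folds fixed and change only the `fit` function.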

References

[1] Refaeilzadeh, Payam, Lei Tang, and Huan Liu. "Cross-validation." Encyclopedia of database systems. Springer, Boston, MA, 2009. 532-538.

[2] Wong, Tzu-Tsung. "Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation." Pattern Recognition 48.9 (2015): 2839-2846.
