Thanks for the reference, Frank T. Edelmann. So, based on the figure mentioned on that page (Fig. 8.14 in Ivesic et al.), it seems that, when cross-validating the data, a higher number of model parameters does not necessarily mean better prediction / lower error. Right? But is there really no relationship between the number of parameters and the error when doing cross-validation (it may depend on the data and the model, I guess)? Any references? (I also could not find the original "Ivesic et al." reference; do you know where it is?)
There is always a tradeoff between the bias and the variance of the estimate during model fitting. That means that with more model parameters we may reach a low-bias estimate, but this may increase the variance of the estimate when testing on a new dataset. Cross-validation methods such as K-fold or leave-one-out are great choices for finding the optimal bias/variance point in model fitting (less model complexity with less estimation error). You can find a great tutorial on applying cross-validation techniques at the link below:
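To make that concrete, here is a minimal sketch (my own illustration, not from the tutorial; the data and the candidate polynomial degrees are hypothetical) of using K-fold CV to compare models of increasing complexity with scikit-learn:

```python
# Minimal sketch: K-fold CV to compare models of increasing complexity.
# Data and degrees are hypothetical, for illustration only.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)   # hypothetical noisy data
X = x.reshape(-1, 1)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
for degree in (1, 2, 3, 5, 9):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    # negative MSE averaged over the 5 held-out folds
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"degree {degree}: CV MSE = {-scores.mean():.3f} (+/- {scores.std():.3f})")
```

The degree with the lowest held-out MSE is the complexity the data actually support; adding parameters beyond that point typically increases the fold-to-fold variance rather than improving prediction.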
Yes, it does. You need to use out-of-sample MAE or MSE depending on how you optimized your predictions. Here's a framework for a forecast evaluation setup:
Chapter Forecast Evaluation Techniques for I4.0 Systems
The framework can be adjusted depending on your setup.
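As a small illustration of the out-of-sample idea (my own sketch with hypothetical data and a plain linear model, not the chapter's framework): hold out a test set and score it with MAE or MSE, depending on whether the predictions were optimized toward the median or the mean:

```python
# Sketch: out-of-sample MAE vs. MSE on a held-out test set (hypothetical data).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# MAE is minimized by the median of the predictive distribution, MSE by the mean
print("out-of-sample MAE:", mean_absolute_error(y_test, pred))
print("out-of-sample MSE:", mean_squared_error(y_test, pred))
```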
Thanks for your great explanations and references. I checked them out. However, I am still not sure I can draw the conclusion I have in mind. Let me be more specific.
Assume that there are two models, one with 3 parameters (degrees of freedom; DOFs) and the other with 2. If we show that the first model (the one with 3 parameters) fits the data better and predicts the out-of-sample data more accurately, say using the inner loop of the nested cross-validation procedure mentioned by Ramtin Zargari Marandi, isn't it possible that the higher number of parameters of the first model has contributed to its higher accuracy? I guess my question is: "how (if at all) does cross-validation avoid/block the role of the models' DOFs?". The worst-case scenario would be that there is no relationship between the DOFs and out-of-sample predictive power; but is that the case (it does not seem so to me, though I might be wrong)? I do not want to write more, but can if necessary. Thanks
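To spell out what the nested procedure buys you, here is a sketch under assumed data, with the 2-DOF and 3-DOF candidates represented by polynomial degrees 1 and 2: the inner loop selects the candidate, and the outer folds score that choice on data the selection step never touched, so the extra parameter only "wins" if it actually improves held-out error:

```python
# Sketch of nested CV (hypothetical data): inner loop selects model complexity,
# outer loop estimates the error of the whole selection procedure.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score

rng = np.random.default_rng(2)
x = rng.uniform(-2, 2, size=120)
y = 0.5 * x**2 - x + rng.normal(scale=0.4, size=x.size)
X = x.reshape(-1, 1)

pipe = make_pipeline(PolynomialFeatures(), LinearRegression())
# degree 1 -> 2 free parameters, degree 2 -> 3 free parameters
param_grid = {"polynomialfeatures__degree": [1, 2]}

inner = KFold(n_splits=5, shuffle=True, random_state=2)
outer = KFold(n_splits=5, shuffle=True, random_state=3)

search = GridSearchCV(pipe, param_grid, cv=inner, scoring="neg_mean_squared_error")
outer_scores = cross_val_score(search, X, y, cv=outer, scoring="neg_mean_squared_error")
print("nested-CV MSE: %.3f +/- %.3f" % (-outer_scores.mean(), outer_scores.std()))
```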
OK, great! So, Abed Khorasani, Frank T. Edelmann and Ramtin Zargari Marandi, based on what you mentioned, can we conclude that there is no predictable relationship between DOFs and prediction bias and variance?
Therefore, if somebody shows better performance (which is usually reported as error/bias) for the 3-DOF model vs. the 2-DOF model and they have done cross-validation, should we accept that claim? Or should they report both bias and variance (over validation folds), or undertake some other specific procedure?
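For what it is worth, here is a sketch (hypothetical data, where the true process is linear) of reporting both the mean and the spread of the fold-wise errors instead of a single error number when comparing the two models:

```python
# Sketch: report mean and SD of fold-wise errors for a 2-DOF vs. 3-DOF model.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold

rng = np.random.default_rng(4)
x = rng.uniform(-2, 2, size=100)
y = 1.0 - 0.8 * x + rng.normal(scale=0.3, size=x.size)   # truly a 2-DOF (linear) process
X = x.reshape(-1, 1)

cv = KFold(n_splits=10, shuffle=True, random_state=4)
for dof, degree in [(2, 1), (3, 2)]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    print(f"{dof}-DOF model: mean fold MSE {mse.mean():.3f}, SD over folds {mse.std():.3f}")
```

If the 3-DOF model's advantage in mean error is small compared with the fold-to-fold SD, the claim of superiority is weak.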
"model" is a generic term so you need to be more specific about the kind/type of model you want analyze. E.g. in the case of regression the only parameter is the number of independent variables. In geostatistics the "model" is a variogram or covariance function. In practice the software will only include a small number of model types (e.g. spherical, Gaussian,Exponential, ). For each of those the number of parameters is fixed
Cross validation might have different meanings depending on the application.
That is a good point, Donald Myers. By "model" I was thinking of mathematical equations ranging from simple low-order polynomials to higher-order fractions that may incorporate linear and non-linear terms/functions. Such rather simple and interpretable models are quite common in neuroscience, where the goal is to know what role each element of the model plays and what it represents. For some examples, see: Article The Normalization Model of Attention. I come across such models, which usually differ both in the number of parameters and in the model structure. Despite these differences, researchers often claim the superiority of one model over another, saying that they have done cross-validation (CV). But does CV prevent the complexity of a model from contributing to its better fit?
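As a rough illustration of that last question (my own sketch, hypothetical data): the in-sample error keeps falling as parameters are added, while the cross-validated error only improves as long as the extra terms capture structure that generalizes, which is exactly the penalty CV applies to complexity:

```python
# Sketch (hypothetical data): training error vs. cross-validated error as
# the number of model parameters grows.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(5)
x = rng.uniform(-2, 2, size=80)
y = np.tanh(2 * x) + rng.normal(scale=0.2, size=x.size)
X = x.reshape(-1, 1)

cv = KFold(n_splits=5, shuffle=True, random_state=5)
for degree in range(1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    fitted = model.fit(X, y)
    train_mse = mean_squared_error(y, fitted.predict(X))
    cv_mse = -cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error").mean()
    print(f"degree {degree}: train MSE {train_mse:.3f}, CV MSE {cv_mse:.3f}")
```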