I am using bootstrapping to increase the size of my data. I have read that bootstrapping is prone to overfitting, so I wanted to know whether there is any way overfitting could be avoided or minimized while using bootstrapping.
I am concerned when you say "using bootstrapping for increasing the size of my data." I cannot, at the moment, think of a proper application of a bootstrap procedure that could accurately be described that way.
Can you describe the procedure you are using in more detail?
As you know, bootstrapping is a statistical technique for resampling existing data. I am using the bootstrp() function from the Statistics Toolbox to do this. I currently have very few samples of a particular data type, and those samples are insufficient for solving my current problem (i.e., classification). So I am trying to use bootstrapping to increase the size of my data from the original data, but I have read that models trained on bootstrap-resampled data tend to overfit, especially in classification tasks. I wanted to know whether there is any way I can either avoid overfitting or keep it to a minimum. Currently I am passing @mean to bootstrp() for the resampling.
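To be concrete, here is a minimal sketch of the kind of call I am making (the data x and nboot are placeholders, not my actual values):

```matlab
% Minimal sketch of bootstrp() from the Statistics Toolbox.
% With @mean as the bootstrap function, the output is nboot replicates
% of the sample mean -- a distribution of a statistic, not a larger
% version of the original data set.
rng(1);                                 % for reproducibility
x = randn(20, 1);                       % placeholder sample
nboot = 1000;
bootMeans = bootstrp(nboot, @mean, x);  % 1000-by-1 vector of resampled means
se = std(bootMeans);                    % bootstrap estimate of the standard error
```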
I share Shane's concern over your (repeated) statement about bootstrapping increasing the size of your sample. That is not what bootstrapping does. It draws repeated samples from your existing data. I suggest you read the attached article for a better understanding of the concept.
Ariel
[Attached article: "Evaluating Disease Management Program Effectiveness: An Intr..."]
There are no methods available to magically increase your sample size; as Ariel says, that is simply not what bootstrap methods do. I am still unsure exactly what you're trying to do. From your statements, my best guess is that you're trying to fit a model to synthetic data generated from a parametric bootstrap. Such a procedure could certainly lead to overfitting the data. The best way to avoid overfitting in that case is to stop abusing the bootstrap in that manner.
I can at this point only recommend some reading material. The books by James et al. and Hastie et al. offer an excellent introduction to statistical learning/machine learning. They are also freely available online and contain examples of the proper use of bootstrap methods. However, their main focus is not bootstrap methods per se, and the relevant sections are only a (good) introduction. The third book, by Chernick, provides a good and thorough treatment of the subject; if you anticipate using bootstrap methods extensively, it would be worth reading. The book is, however, quite expensive.
James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.
Chernick, M. R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers (2nd ed.). John Wiley & Sons.
Atish, please try to understand Shane's valuable advice about what the bootstrap is and is not. NOTHING can give you more information than is in the data (except gathering more raw data). The bootstrap is useful for measuring the intrinsic reliability of the statistical measures gleaned from that data.
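As a concrete illustration of that point, a minimal sketch (with placeholder data) of the standard, legitimate use of the bootstrap: quantifying the reliability of a statistic estimated from a fixed sample.

```matlab
% Sketch: using the bootstrap to assess reliability, not to add data.
% bootci() is in the Statistics Toolbox; x is a placeholder sample.
rng(1);
x = randn(30, 1);
ci = bootci(1000, @mean, x);   % 95% bootstrap confidence interval for the mean
```

The interval tells you how much the estimate would vary under resampling; it does not make the sample any bigger.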
Atish, I must say I completely agree with Drs McMahon, Linden and Bhavsar here. I appreciate your frustration that none of us seems to answer your question.
I am not sure if you are keen on a lecture in epistemology, but there is no mathematical method in existence that can reliably defend itself against overfitting. Math does take abuse silently, and a mathematician has to step in eventually (despite what AI courses and science fiction books may have led you to believe).
I am not sure if this will answer your question, but to avoid overfitting, I would suggest building your model on the theoretical background of your phenomenon (even if you make the theory up yourself), rather than pulling it from your data. Then you can use bootstrapping to improve the ability of your data to validate or falsify your model -- and through it, your theory.
Build enough interesting theories, so that you can select a few that agree with the data the best, and then, out of these best few, choose the simplest one. Hope this helps.
I agree with the previous comment. The bootstrap is just one approach to producing and stabilizing the standard error. The resampling is repeated so that the estimate of the standard error can actually converge.
Thanks Shane, Ariel, Suketu, Oleksiy and Wan for your prompt replies. Shane, you were right, I was trying to build a model. In my case, I have unbalanced data, and from what I read, the bootstrap helps create synthetic data, which I thought would be quite useful for balancing my data. Balanced data is very useful for training classifiers like SVM. Thank you all for your help.
It is possible that what you need is an imputation method rather than bootstrapping. It is possible to impute balanced data sets from unbalanced ones (under certain conditions). Alternatively, it may be sensible to look for an alternative to SVM that doesn't require balanced classes (I don't know enough to advise on that).
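If the goal is simply a balanced training set for the SVM, one common approach is plain random oversampling of the minority class (sampling with replacement, close to what was described above). A minimal sketch, where X, y, and minorityLabel are placeholder assumptions for illustration:

```matlab
% Minimal sketch of random oversampling for a two-class problem.
% Note: duplicated rows add no new information, so the overfitting
% concern raised earlier in this thread still applies.
rng(1);
X = randn(100, 5);                        % placeholder features (n-by-p)
y = [zeros(90, 1); ones(10, 1)];          % imbalanced placeholder labels
minorityLabel = 1;
idxMin  = find(y == minorityLabel);                  % minority-class rows
nNeeded = sum(y ~= minorityLabel) - numel(idxMin);   % rows to add
pick    = idxMin(randi(numel(idxMin), nNeeded, 1));  % sample w/ replacement
Xbal    = [X; X(pick, :)];
ybal    = [y; y(pick)];
```

An alternative worth checking is whether your SVM implementation supports class weights or misclassification costs, which sidesteps the imbalance without duplicating rows at all.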