Bootstrapping is where you sample a value from a population of data and then replace that value before drawing another value. If your data have rare extreme values, bootstrapping will undervalue these observations. If these extreme values are partly the result of mistakes, that might be good. If they are a true part of the underlying distribution, it will be bad, because the resampling will underestimate the variability in the underlying population.
Randomization is where you draw values from a population of data without replacement. If one uses all of the data, then the rare original observations will be as common in each randomization as they were in the original data.
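To make the contrast concrete, here is a minimal sketch in Python (the data values are made up purely for illustration):

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

data = [2, 4, 3, 6, 8, 4, 27]  # hypothetical sample with one rare extreme value

# Bootstrapping: sample with replacement. The extreme value 27 may show up
# zero, one, or several times in any given resample.
bootstrap_sample = [random.choice(data) for _ in range(len(data))]

# Randomization: draw without replacement. A full draw is just a permutation,
# so 27 appears exactly once -- as common as it was in the original data.
randomized_sample = random.sample(data, len(data))
```

Every bootstrapped value must already be in `data`, while the randomized sample is always a rearrangement of it.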
I partially agree with Timothy A Ebert. The given definition of bootstrapping isn't complete. In fact, we distinguish two types of bootstrap:
1 - The non-parametric bootstrap: when the data distribution is not known, you perform sampling with replacement, as Timothy A Ebert said, and you will only obtain values that figure in the original data.
2 - The parametric bootstrap: when you estimate the model or the data distribution, you simulate samples from the estimated model, so you can obtain values that do not figure in the original data.
In the second case, the extreme values behave as they do in the original data, provided the density (or the model) is correctly estimated.
I agree with Ouzzani, except that I would not use "bootstrap" for #2 (though a quick internet search does turn up "parametric bootstrap"). I use the definitions set forth by Bryan Manly in his book "Randomization and Monte Carlo Methods in Biology".
Using the definition of parametric bootstrap set forth at http://www.stat.umn.edu/geyer/5601/examp/parm.html, I do not see how this would differ from simulation (Monte Carlo simulation). I use the observed data to identify an underlying distribution, take that estimate as "true", and set up a random number generator to generate "similar" data sets. The concept of "replacement" does not apply. In both cases (bootstrap and simulation) the idea is that you are not changing the distribution of the resampled data. However, the bootstrap has a problem with a rare observation: it is a unique value, with no other observations near it. I might have values like 2, 4, 3, 6, 8, 4, 27. In such a data set there is no possibility of getting a value of 26. In contrast, with simulation all values are possible: I could get 21.743 or I could get 57.3.
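That gap can be shown directly. In the sketch below, a normal fit is assumed purely for illustration (nothing in the data says it is the right model); the bootstrap can never produce 26, while the simulation can produce any real value:

```python
import random
import statistics

random.seed(1)

data = [2, 4, 3, 6, 8, 4, 27]

# Non-parametric bootstrap: resampled values must come from the data itself,
# so 26 can never occur, no matter how many resamples we draw.
boot = [random.choice(data) for _ in range(10_000)]
bootstrap_can_give_26 = 26 in boot  # always False

# Monte Carlo simulation from a fitted model (normal assumed for illustration):
# the fitted distribution, not the observed values, decides what can occur,
# so values like 21.743 or 57.3 are now possible.
mu = statistics.mean(data)
sigma = statistics.stdev(data)
sim = [random.gauss(mu, sigma) for _ in range(10_000)]
```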
My apologies if my interpretation of the parametric bootstrap is in error. However, as described, I do not see the "parametric bootstrap" method as being a bootstrap. The method is perfectly reasonable; the name is not.
To my knowledge, a rare observation is an observation with a tiny occurrence probability, and the estimated density (which you use to generate new samples) should take this smallness into account.
What I understood from your definition of a rare observation is that it is impossible to reproduce a similar one. For example, even if you meet a person who weighs 350 kg, there is no chance to meet another one who weighs 340 kg, since both are very rare cases.
Also, if you generated your data in an independent way, then you have allowed the occurrence of values that have not been observed.
Example: When I measured the weight of 10 persons I found the following results:
79 , 58 , 74 , 71 , 63 , 67 , 71 , 98 , 82 and 62
If I apply a non-parametric bootstrap, it is as if I am saying "there is no chance to meet someone who weighs 60, 70, 80 or 90".
This is why, in my opinion, the parametric bootstrap is preferable.
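The weight example can be sketched as follows, assuming (purely for illustration) a normal model for the weights:

```python
import random
import statistics

random.seed(2)

weights = [79, 58, 74, 71, 63, 67, 71, 98, 82, 62]

# Parametric bootstrap (sketch): estimate the model's parameters from the
# data, then simulate new samples from the fitted model.
mu = statistics.mean(weights)      # 72.5
sigma = statistics.stdev(weights)
simulated = [random.gauss(mu, sigma) for _ in range(len(weights))]
# 'simulated' can contain weights like 60, 70, 80 or 90 even though none of
# those values appears in the observed data.

# A non-parametric bootstrap resample, by contrast, can only repeat the
# ten observed values.
resample = [random.choice(weights) for _ in range(len(weights))]
```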
For the name, you can refer to Efron in his paper:
Efron, Bradley. Bayesian Inference and the Parametric Bootstrap. The Annals of Applied Statistics, 2012, vol. 6, no. 4, p. 1971.
As I said, the problem is not with the method; it is with calling it a "parametric bootstrap." To me it does not seem to have any relation to the non-parametric bootstrap except that it is a computer-intensive approach to data analysis.
Rare observations: A rare observation is one that is many standard deviations distant from the mean. One event is rarer than another if it lies more standard deviations away from the mean.
In a data set, there will be rare events. Most of the time we are not able to go back and get more data. If the data are plotted with value on the x-axis and frequency on the y-axis, then there are gaps between observations as you move along the x-axis. The size of those gaps increases with increasing distance from the mean. If you estimate the probability of observing a specific value, the reliability of that estimate declines with increasing distance from the mean. What the simulation approach assumes is that you gathered enough data to accurately estimate the probability of the rare events.
The bootstrap method does not assume that a value of 96 is impossible, but it also does not factor in the possibility that such a value exists. It is more that it treats the value of 96 as missing data, because there were no observations of 96 in your sample of ten values.
My objection is only with the name. I dislike the name "parametric bootstrap" because it includes the word "bootstrap" but has nothing at all to do with bootstrap in the sense of the "non-parametric" bootstrap. The only thing they have in common is the use of a computer. The approach to data analysis is great; the name is the problem.
So if I have a parametric and a non-parametric bootstrap method, then it would seem like I could have a parametric and a non-parametric randomization test using the same logic that gave rise to the parametric bootstrap method. What is the difference between a parametric bootstrap and a parametric randomization test?
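For comparison, a (non-parametric) randomization test shuffles the pooled observations without replacement; the two-group data below are hypothetical, made up only to show the mechanics:

```python
import random
import statistics

random.seed(3)

# Hypothetical two-group data, purely for illustration.
group_a = [79, 58, 74, 71, 63]
group_b = [67, 71, 98, 82, 62]
observed_diff = statistics.mean(group_a) - statistics.mean(group_b)  # -7.0

pooled = group_a + group_b
n_a = len(group_a)

# Randomization test: repeatedly shuffle the pooled values (drawing without
# replacement) and recompute the between-group difference under the null.
n_perm = 5000
extreme = 0
for _ in range(n_perm):
    perm = random.sample(pooled, len(pooled))
    diff = statistics.mean(perm[:n_a]) - statistics.mean(perm[n_a:])
    if abs(diff) >= abs(observed_diff):
        extreme += 1

p_value = extreme / n_perm
# A "parametric" randomization test would instead draw the shuffled data
# from a fitted model -- which is what the parametric bootstrap already
# does, hence the naming question.
```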
It is true that there is an ambiguity in the name, since there is no difference between the parametric bootstrap and the parametric randomization, so I completely agree with you on all points.