Suppose nonparametric bootstrapping is used to estimate the sensitivity of a fitted model's parameter values to the data. Does this technique work reasonably well for small data sets?
If you are working with nonparametric statistics, you do not need to worry too much about your sample size. In general, you can get reasonably good results with a sample size over 30. I hope that helps you.
Thank you, Luis and Rasiah! I also expected that increasing the sample size is generally better. I wonder: below what sample size do the results of nonparametric bootstrapping tend to become too unreliable to be useful?
First, you need to make a distinction between "nonparametric", "distribution-free" and "exact" methods. A nonparametric method estimates and tests hypotheses about medians, percentiles or other quantities that are not parameters of a distribution. A "distribution-free" test does not assume that your data, or any statistics computed from your data, come from any known distribution. An exact test is a test whose inferences (i.e. p-values or confidence intervals) are equally accurate at all possible sample sizes. A bootstrapping test is always an exact test and it is always distribution-free, but it may or may not be nonparametric. For example, you could construct a bootstrapping test to compare the means of two samples, which would still be parametric in that the mean is a parameter of the normal distribution and other distributions (the t distribution, Poisson, etc.).
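For instance, here is a minimal sketch in Python of such a bootstrap comparison of two sample means; the sample values, resample count, and random seed are illustrative assumptions, not anything from this thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hypothetical samples (illustrative values only).
x = np.array([5.1, 4.8, 6.2, 5.5, 5.0, 4.9, 5.7])
y = np.array([6.0, 6.3, 5.8, 6.5, 6.1, 5.9])

observed_diff = x.mean() - y.mean()

# Resample under the null hypothesis: pool both samples, draw new
# samples with replacement, and recompute the difference in means.
pooled = np.concatenate([x, y])
n_boot = 10_000
diffs = np.empty(n_boot)
for i in range(n_boot):
    bx = rng.choice(pooled, size=len(x), replace=True)
    by = rng.choice(pooled, size=len(y), replace=True)
    diffs[i] = bx.mean() - by.mean()

# Two-sided p-value: fraction of resampled differences at least as
# extreme as the observed one.
p_value = np.mean(np.abs(diffs) >= np.abs(observed_diff))
print(f"observed difference: {observed_diff:.3f}, p = {p_value:.4f}")
```

The test statistic here is the mean, a distribution parameter, which is why this bootstrap test is distribution-free but not nonparametric; swapping the mean for the median in the same loop would make it nonparametric.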
So, increasing the sample size will not affect the legitimacy of your p-values or your confidence intervals, because you are using an exact test. However, a larger sample size will give you more information about the true value(s) of the unknown population parameters or percentiles, presuming that your samples are all unbiased.
Thank you, Jeff, this is a very useful description! To clarify, the procedure I am using is bootstrapping of a data set with N observations by randomly selecting N of them with replacement (so some observations can be repeated and some not used) multiple times. Each time, a model is fitted to the resampled data set, and the parameter values across fits are compared. I am wondering whether the sample size N affects the usefulness of this procedure.
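A minimal sketch of that resampling-and-refitting procedure, assuming for illustration a simple least-squares line fit via numpy (the data, model, seed, and resample count are all stand-ins for whatever you are actually fitting):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data set of N observations (illustrative only).
N = 25
x = np.linspace(0.0, 10.0, N)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=N)

n_boot = 2_000
params = np.empty((n_boot, 2))
for b in range(n_boot):
    # Draw N observations with replacement: some repeat, some drop out.
    idx = rng.integers(0, N, size=N)
    # Refit the model to the resampled data; np.polyfit stands in for
    # whatever model you fit, returning (slope, intercept) for deg=1.
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    params[b] = (intercept, slope)

# Percentile confidence intervals summarize how much the fitted
# parameter values vary across resamples.
lo, hi = np.percentile(params, [2.5, 97.5], axis=0)
print(f"intercept 95% CI: [{lo[0]:.2f}, {hi[0]:.2f}]")
print(f"slope     95% CI: [{lo[1]:.2f}, {hi[1]:.2f}]")
```

With small N, the spread of `params` across resamples tends to be wide, which is exactly the sensitivity this procedure is meant to reveal.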
As I said before, because bootstrapping is an "exact" statistical test, the p-values or confidence intervals from your procedure will be equally valid for all sample sizes, N. Whether or not the test is useful is a completely different issue.
Are you using the correct bootstrapping test? Can your bootstrapping model answer the questions that you want to address with your experiment? Is your sample representative and unbiased? These are issues you need to address to make sure your test is useful.
Assuming that you have set up the correct test and your sample is both representative and unbiased, you are basically asking about the relationship between power and sample size. Larger sample sizes provide more information about your population and more statistical power to find significant p-values. When your sample size is large, even small effects can be statistically significant; when it is very small, only the largest effects will be. If you need to detect small, subtle differences between groups, then an experiment with a small sample size might not be useful.
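To make the power/sample-size relationship concrete, here is a small simulation sketch; the effect size, alpha level, normal distributions, and the use of a plain two-sample t-test as a stand-in for the inner bootstrap test are all assumptions chosen to keep the example short:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

effect = 0.5    # assumed true difference in means, in SD units
alpha = 0.05
n_sims = 2_000  # simulated experiments per sample size

for n in (10, 30, 100, 300):
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, size=n)
        b = rng.normal(effect, 1.0, size=n)
        # Any valid two-sample test could sit here, including a
        # bootstrap test; a t-test keeps the simulation fast.
        _, p = stats.ttest_ind(a, b)
        rejections += p < alpha
    print(f"N = {n:>3}: estimated power = {rejections / n_sims:.2f}")
```

Run as-is, the estimated power climbs from roughly 0.2 at N = 10 toward essentially 1.0 at N = 300, illustrating why a small-N experiment can only detect large effects.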