Yes: you determine it by doing a sample-size calculation based on the statistical power you want.
The power calculation you would need would be based on demonstrating equivalence, i.e. that the confidence intervals of each group overlap to a predefined degree. Non-inferiority and superiority tests only look at one side of the confidence interval at a time; equivalence uses both sides, so it tends to be much harder to demonstrate and requires larger sample sizes for a given effect size.
When your sample size is small, the power of your t-test is small too, and the results of an underpowered test are not reliable. In addition, it is usually hard to verify normality with a small sample, which is why non-parametric tests are generally recommended in that setting.
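In R, switching to a non-parametric test is a one-line change. A minimal sketch (the two small samples here are simulated, not real data):

```r
# Two small simulated samples, in the same style as the examples below
set.seed(1)
x = rnorm(5, mean = 0, sd = 10)
y = rnorm(5, mean = 0, sd = 10)
# Wilcoxon rank-sum test: the non-parametric counterpart of t.test(x, y)
wilcox.test(x, y)
```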
The bad answer is two. With only one replicate, the denominator (n - 1) used for calculating the standard deviation goes to zero, and division by zero is problematic.
A slightly better answer is to suggest tools like G*Power for estimating a good sample size. See: http://gpower.hhu.de/
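R itself can also do a basic sample-size calculation via power.t.test in the built-in stats package. A sketch, using the effect size and standard deviation that appear in the simulations below (a mean difference of 7 with sd = 10) and conventional choices of 80% power and a 5% significance level:

```r
# Per-group sample size for a two-sample t-test that can detect a
# mean difference of 7 when sd = 10, with 80% power at alpha = 0.05
power.t.test(delta = 7, sd = 10, power = 0.80, sig.level = 0.05,
             type = "two.sample")
```

The reported n is the number per group, and, as noted at the end of this answer, it is best treated as a lower bound.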
A better answer: start by installing R (https://cran.r-project.org/). I would also suggest RStudio (https://www.rstudio.com/), but that is your choice.
Start with this program:
replicates=10
data1=rnorm(replicates, mean=0, sd=10)
hist(data1, nclass=10)
In this program I have asked for 10 replicates, but please try it with 3, 4, and other values. The histogram has ten classes; you should also change this to other values. Finally, you can change the mean and standard deviation to better fit your existing data. If no data exist, then simply guess.
Question 1: If you run this program twice, do you get the same result?
Question 2: If you do not get the same result, what must you change to get the same result?
Question 2a: Does your answer change depending on whether "same" is quantitative or qualitative? You can add a line to the program to get R to calculate the mean.
replicates=10
data1=rnorm(replicates, mean=0, sd=10)
hist(data1, nclass=10)
mean(data1)
Question 3: Does your answer change depending on the value of the mean? If you use rnorm(replicates, mean=0, sd=10) versus rnorm(replicates, mean=10, sd=10) or something similar, do the answers to questions 1 and 2 change?
You asked about two independent groups. This is easy, and the program would look like this:
replicates=10
data1=rnorm(replicates, mean=0, sd=10)
data2=rnorm(replicates, mean=0, sd=10)
t.test(data1, data2)
The last line performs a two-sample t-test (Welch's test, by R's default). The population is infinite and the underlying distribution is Gaussian, so this should work. As usual, you should run this several times and use several different values for the number of replicates, the mean, and the standard deviation. You can make this slightly easier by rewriting the program a bit.
replicates=10
mean1=0
stdev1=10
mean2=mean1
stdev2=stdev1
data1=rnorm(replicates, mean=mean1, sd=stdev1)
data2=rnorm(replicates, mean=mean2, sd=stdev2)
t.test(data1, data2)
The next problem is that you can only look at a few runs of the program. What would give you awesome cosmic power would be to see the outcome of thousands of runs of the program, or even thousands of thousands. This can be done in an itty-bitty living space as follows:
replicates=10
mean1=0
stdev1=10
mean2=mean1
stdev2=stdev1
size=50
datamatrix1=matrix(1:size, ncol=1)
for (i in 1:size)
{
data1 = rnorm(replicates, mean=mean1, sd=stdev1)
data2 = rnorm(replicates, mean=mean2, sd=stdev2)
temp1 = t.test(data1, data2)
datamatrix1[i,1] = temp1$p.value
}
hist(datamatrix1[,1], nclass=200)
Please start with size=50. The histogram should look like a set of irregularly spaced bars with x values from 0 to 1. If that is roughly what you got, try increasing size to 500. There should now be few gaps. Then try increasing mean2 from zero to seven (7). The histogram should now have large values on the left side of the graph that rapidly decline when reading the graph left to right.
If this seems to be working, try increasing size to 10000. The irregularities in the histogram should smooth out, since there are now many more simulated p-values. However, the program now takes a few moments to run.
Given two samples, the p-values from any test follow a distribution. If the null hypothesis is true, the distribution of p-values is uniform on [0, 1]. As the difference in means increases, the distribution of p-values shifts toward zero, so a significant value is encountered more often.
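You can check the uniformity claim directly with a Kolmogorov-Smirnov test against the uniform distribution. A compact sketch using replicate() in place of the explicit loop above (the variable names here are mine):

```r
# Simulate many p-values under the null (identical means) and test
# whether they are consistent with a uniform distribution on [0, 1]
set.seed(1)
replicates = 10
size = 2000
pvals = replicate(size,
                  t.test(rnorm(replicates, mean = 0, sd = 10),
                         rnorm(replicates, mean = 0, sd = 10))$p.value)
ks.test(pvals, "punif")   # a large p-value here is consistent with uniformity
mean(pvals < 0.05)        # should be close to 0.05 when the null is true
```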
R will provide you with many other tests, both parametric and nonparametric. Play with the program.
Determining sample size is more art than science. If you use published values to calculate sample size, then you must assume that the population in the original study is the same as your population. If you guess at values because there is insufficient published information, then the outcome is only as good as your guess. Finally, sample size calculators, when used appropriately, provide a lower bound on the appropriate sample size.
I think of sample size as risk management. Increasing the sample size decreases the risk that I arrive at an incorrect conclusion, and it decreases the risk that some other scientist who runs my experiment will arrive at a different conclusion.
If you are considering a formal power analysis: aside from G*Power, there is also the R package pwr.
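pwr works with standardized effect sizes; a mean difference of 7 with sd = 10 corresponds to Cohen's d = 0.7. A sketch (it assumes the pwr package is installed from CRAN):

```r
# install.packages("pwr")  # uncomment on first use
library(pwr)
# Per-group n for a two-sample t-test: d = 0.7, 80% power, alpha = 0.05
pwr.t.test(d = 0.7, power = 0.80, sig.level = 0.05, type = "two.sample")
```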
The problem with power analyses is that you need to estimate the expected effect size and the variability of the data in advance. Without past information to go on, there may be little beyond guessing and intuition for estimating these quantities.