Yes: you determine it by doing a sample-size calculation based on the statistical power you want.
The power calculation you would need would be based on demonstrating equivalence, i.e. that the confidence intervals of each group overlap to a predefined degree. Non-inferiority and superiority tests only look at one side of the confidence interval at a time; equivalence uses both sides, so it tends to be much harder to demonstrate and requires larger sample sizes for a given effect size.
When your sample size is small, the power of your t-test is small too, and the results of an underpowered test are not reliable. In addition, it is usually hard to verify normality with a small sample, which is why non-parametric tests are generally recommended in that setting.
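In R, switching to a non-parametric test is a one-line change. A minimal sketch (the two small samples here are simulated, not real data):

```r
# Two small simulated samples, in the same style as the examples below
set.seed(1)
x = rnorm(5, mean = 0, sd = 10)
y = rnorm(5, mean = 0, sd = 10)
# Wilcoxon rank-sum test: the non-parametric counterpart of t.test(x, y)
wilcox.test(x, y)
```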
The bad answer is two. With only one replicate, the denominator (n - 1) used for calculating the standard deviation goes to zero, and division by zero is problematic.
A slightly better answer is to suggest tools like G*Power for estimating a good sample size. See: http://gpower.hhu.de/
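R itself can also do a basic sample-size calculation via power.t.test in the built-in stats package. A sketch, using the effect size and standard deviation that appear in the simulations below (a mean difference of 7 with sd = 10) and conventional choices of 80% power and a 5% significance level:

```r
# Per-group sample size for a two-sample t-test that can detect a
# mean difference of 7 when sd = 10, with 80% power at alpha = 0.05
power.t.test(delta = 7, sd = 10, power = 0.80, sig.level = 0.05,
             type = "two.sample")
```

The reported n is the number per group, and, as noted at the end of this answer, it is best treated as a lower bound.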
A better answer: start by installing R (https://cran.r-project.org/). I would also suggest RStudio (https://www.rstudio.com/), but that is your choice.
Start with this program:
replicates=10
data1=rnorm(replicates, mean=0, sd=10)
hist(data1, nclass=10)
In this program I have asked for 10 replicates, but please try it with 3, 4, and other values. The histogram has ten classes; you should also change this to other values. Finally, you can change the mean and standard deviation to better fit your existing data. If no data exist, then simply guess.
Question 1: If you run this program twice, do you get the same result?
Question 2: If you do not get the same result, what must you change to get the same result?
Question 2a: Does your answer change depending on whether "same" is quantitative or qualitative? You can add a line to the program to get R to calculate the mean.
replicates=10
data1=rnorm(replicates, mean=0, sd=10)
hist(data1, nclass=10)
mean(data1)
Question 3: Does your answer change depending on the value of the mean? If you use rnorm(replicates, mean=0, sd=10) versus rnorm(replicates, mean=10, sd=10) or something similar, do the answers to questions 1 and 2 change?
You asked about two independent groups. This is easy, and the program would look like this:
replicates=10
data1=rnorm(replicates, mean=0, sd=10)
data2=rnorm(replicates, mean=0, sd=10)
t.test(data1, data2)
The last line performs a two-sample t-test (Welch's test, by R's default). The population is infinite and the underlying distribution is Gaussian, so this should work. As usual, you should run this several times and use several different values for the number of replicates, the mean, and the standard deviation. You can make this slightly easier by rewriting the program a bit.
replicates=10
mean1=0
stdev1=10
mean2=mean1
stdev2=stdev1
data1=rnorm(replicates, mean=mean1, sd=stdev1)
data2=rnorm(replicates, mean=mean2, sd=stdev2)
t.test(data1, data2)
The next problem is that you can only look at a few runs of the program. What would give you awesome cosmic power would be to see the outcome of thousands of runs of the program, or even thousands of thousands. This can be done in an itty-bitty living space as follows:
replicates=10
mean1=0
stdev1=10
mean2=mean1
stdev2=stdev1
size=50
datamatrix1=matrix(1:size, ncol=1)
for (i in 1:size)
{
data1 = rnorm(replicates, mean=mean1, sd=stdev1)
data2 = rnorm(replicates, mean=mean2, sd=stdev2)
temp1 = t.test(data1, data2)
datamatrix1[i,1] = temp1$p.value
}
hist(datamatrix1[,1], nclass=200)
Please start with size=50. The histogram should look like a set of irregularly spaced bars with x values from 0 to 1. If that is roughly what you got, try increasing size to 500. There should now be few gaps. Then try increasing mean2 from zero to seven (7). The histogram should now have large values on the left side of the graph that rapidly decline when reading the graph left to right.
If this seems to be working, try increasing size to 10000. The irregularities in the histogram should smooth out, since there are now many more simulated p-values. However, the program now takes a few moments to run.
Given two samples, the p-values from any test follow a distribution. If the null hypothesis is true, the distribution of p-values is uniform on [0, 1]. As the difference in means increases, the distribution of p-values shifts toward zero, so a significant value is encountered more often.
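You can check the uniformity claim directly with a Kolmogorov-Smirnov test against the uniform distribution. A compact sketch using replicate() in place of the explicit loop above (the variable names here are mine):

```r
# Simulate many p-values under the null (identical means) and test
# whether they are consistent with a uniform distribution on [0, 1]
set.seed(1)
replicates = 10
size = 2000
pvals = replicate(size,
                  t.test(rnorm(replicates, mean = 0, sd = 10),
                         rnorm(replicates, mean = 0, sd = 10))$p.value)
ks.test(pvals, "punif")   # a large p-value here is consistent with uniformity
mean(pvals < 0.05)        # should be close to 0.05 when the null is true
```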
R will provide you with many other tests, both parametric and nonparametric. Play with the program.
Determining sample size is more art than science. If you use published values to calculate sample size, then you must assume that the population in the original study is the same as your population. If you guess at values because there is insufficient published information, then the outcome is only as good as your guess. Finally, sample size calculators, when used appropriately, provide a lower bound on the appropriate sample size.
I think of sample size as risk management. Increasing the sample size decreases the risk that I arrive at an incorrect conclusion, and it decreases the risk that some other scientist who runs my experiment will arrive at a different conclusion.
If you are considering a formal power analysis: aside from G*Power, there is also the R package pwr.
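pwr works with standardized effect sizes; a mean difference of 7 with sd = 10 corresponds to Cohen's d = 0.7. A sketch (it assumes the pwr package is installed from CRAN):

```r
# install.packages("pwr")  # uncomment on first use
library(pwr)
# Per-group n for a two-sample t-test: d = 0.7, 80% power, alpha = 0.05
pwr.t.test(d = 0.7, power = 0.80, sig.level = 0.05, type = "two.sample")
```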
The problem with power analyses is that you need to estimate the expected effect size and the variability of the data in advance. Without past information to go on, there may be little beyond guessing and intuition for estimating these quantities.