We received a statistical reviewer's comments on our manuscript, and one of the comments goes as follows: '... Note that common tests of normality are not powered to detect departures from normality when n is small (e.g. n …'
I don't have an easy solution for you. I'm just pointing out that the reviewer's suggestion to rely on what is known about the variables from other sources is right in line with this 2009 BMJ Stats Note by Bland & Altman:
This reviewer's response was more open-minded than the one we got recently. Our reviewer simply said: "You must use non-parametric tests". Showing that such data is actually very close to normally distributed, and that all other groups in this field have used t-tests for this for ages, did not count.
There was a point where I was delighted to hear that statistical reviewers more often have a look at manuscripts. I was sure that this would improve analyses and conclusions, force authors to think harder about their models and the properties of their variables, and to test more sensible hypotheses. But now I see that a considerable share of these reviewers seem to be rather uneducated and/or uninterested first-year students charged by their professors with writing stupid reports. This seems to be very prominent in the more clinical journals. Sorry.
It is true that normality tests won't be of any use in small samples, as they lack power (and may not be robust). It's also true that they are awful in large samples, as they detect departures from normality that aren't sufficient to impact inference. So generally visual inspection is superior (it's certainly recommended by a number of experts).
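A quick simulation makes both failure modes concrete. This is only a sketch under my own assumptions (the distributions, sample sizes, seed, and availability of numpy/scipy are all my choices): Shapiro-Wilk often fails to flag a frankly skewed exponential population at n = 10, yet readily rejects a barely contaminated normal one at n = 2000.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reps = 1000

# Small n, genuinely skewed population: how often is normality rejected?
small = np.mean([
    stats.shapiro(rng.exponential(size=10)).pvalue < 0.05
    for _ in range(reps)
])

# Large n, near-normal population (1% mild contamination): rejection is routine,
# even though the departure is far too small to matter for a t-test.
large = np.mean([
    stats.shapiro(np.concatenate([rng.normal(size=1980),
                                  rng.normal(0.0, 3.0, size=20)])).pvalue < 0.05
    for _ in range(reps)
])

print(f"Rejection rate, n=10, exponential data:         {small:.0%}")
print(f"Rejection rate, n=2000, 1% contaminated normal: {large:.0%}")
```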
Here, arguing that there are no marked departures from normality etc. based on graphical inspection is a reasonable approach. Of course there is a degree of subjectivity. However, if there are no severe violations of the assumptions, then the parametric and non-parametric tests give similar results. So I'd probably argue that the stats are OK based on graphical inspection and add a footnote that switching to non-parametric tests doesn't change the pattern (after having run the non-parametric tests to check; a quick sketch of that check is below).
This is a good case for the utility of open data and code, as the reviewer or any reader could easily check.
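For what it's worth, the suggested footnote check is cheap to run. A minimal sketch, assuming scipy is available; the group names and values here are made up purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=10.0, scale=2.0, size=8)   # hypothetical group A
group_b = rng.normal(loc=12.5, scale=2.0, size=8)   # hypothetical group B

# Run the parametric and the non-parametric test on the same two groups.
t_stat, t_p = stats.ttest_ind(group_a, group_b)
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Student t-test:      p = {t_p:.3f}")
print(f"Mann-Whitney U test: p = {u_p:.3f}")
# If both tests point the same way, the footnote writes itself.
```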
This probably does not help you much, but I thought that I would have a look at the original Student (Gosset) paper of 1908, as the test was specifically designed for (very) small samples:
"if our sample be small, we have two sources of uncertainty: (1) owing to the “error of random sampling” the mean of our series
of experiments deviates more or less widely from the mean of the population,
and (2) the sample is not sufficiently large to determine what is the law of
distribution of individuals. It is usual, however, to assume a normal distribution,
because, in a very large number of cases, this gives an approximation so close
that a small sample will give no real information as to the manner in which
the population deviates from normality: since some law of distribution must
he assumed it is better to work with a curve whose area and ordinates are
tabled, and whose properties are well known. This assumption is accordingly
made in the present paper, so that its conclusions are not strictly applicable to
populations known not to be normally distributed; yet it appears probable that
the deviation from normality must be very extreme to load to serious error. " My emphasis
" Section X. Conclusions
1. A curve has been found representing the frequency distribution of stan-
dard deviations of samples drawn from a normal population.
2. A curve has been found representing the frequency distribution of the
means of the such samples, when these values are measured from the mean of
the population in terms of the standard deviation of the sample.
3. It has been shown that the curve represents the facts fairly well even
when the distribution of the population is not strictly normal." Again my emphasis.
There are several examples with a sample size below 10 in the paper.
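Conclusion 3 is also easy to probe by simulation today. The following is my own rough sketch, not anything from the paper (it assumes numpy/scipy): draw both groups of size 8 from the same skewed exponential population, so that the null hypothesis of equal means holds, and check whether the two-sample t-test's type I error stays near the nominal 5%.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
reps, n, alpha = 10000, 8, 0.05

# Both groups come from the same exponential distribution, so H0 is true;
# any excess rejections measure the damage done by the non-normality.
type1 = np.mean([
    stats.ttest_ind(rng.exponential(size=n), rng.exponential(size=n)).pvalue < alpha
    for _ in range(reps)
])
print(f"Empirical type I error at n={n} per group: {type1:.3f} (nominal {alpha})")
```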
When I used to teach this stuff (to 1st-year geography students), I would demonstrate the Fisher randomization/permutation test for very small samples, as the students could do it by hand and thereby see the underlying logic of the test. I would show that you could permute the data of the two groups under the null hypothesis of no difference, see how extreme a result you could get 'by chance', and then compare the observed value to this distribution; no normality assumptions were needed in coming to some sort of judgement.
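For anyone who wants to replay that classroom exercise without the hand-cranking, here is a minimal sketch with toy numbers of my own (standard library plus numpy only):

```python
import itertools
import numpy as np

group_a = np.array([3.1, 4.0, 2.8, 3.7])   # hypothetical small groups
group_b = np.array([4.9, 5.2, 4.4, 5.0])

pooled = np.concatenate([group_a, group_b])
observed = group_a.mean() - group_b.mean()

# With 8 pooled values we can enumerate all C(8, 4) = 70 relabellings
# exactly, i.e. every way of assigning 4 of them to "group A" under H0.
extreme = 0
total = 0
for idx in itertools.combinations(range(len(pooled)), len(group_a)):
    mask = np.zeros(len(pooled), dtype=bool)
    mask[list(idx)] = True
    diff = pooled[mask].mean() - pooled[~mask].mean()
    if abs(diff) >= abs(observed):
        extreme += 1
    total += 1

# The p-value is simply the share of relabellings at least as extreme
# as the observed difference in means.
print(f"Exact two-sided permutation p-value: {extreme / total:.3f}")
```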
You may assume any probabilistic distribution for your data, no matter what the sample size is. First, ask yourself whether your sample is randomly drawn from a well-defined sampling population. If not, whatever your model is, the analysis outcomes will not be generalizable on sampling-distribution grounds. Secondly, ask yourself what your scientific research question is and how it connects to your data. Then, hopefully, you will have a better idea of which distribution it is plausible to assume your sample data were generated from. Finally, statistical testing based on a single set of sample data is never a substitute for the time-consuming processes of scientific investigation. Namely, it is statistical thinking that makes good science; statistical inference, unfortunately, can only play a limited role in scientific inference. You may see these two articles for more information:
The Limited Role of Formal Statistical Inference in Scientific Inference
Statistical Inference Enables Bad Science; Statistical Thinking Enables Good Science