Hi. First of all, are the subjects of your study humans, or are they animals?
Although it is unclear to me exactly which cognitive tests you are using to assess those functions, or the details of the study design mentioned in your question, here are the basics of the statistical analysis:
1. Since your sample size is small (n = 7), performing a statistical power analysis with an appropriate effect size would be necessary. If possible, increasing the sample size would be beneficial.
2. Based on your current sample size (n = 7), you should perform normality tests for each group to verify whether a normal distribution can be assumed within each group. The most commonly used test for this case is the Shapiro-Wilk test, but other normality tests can also be used depending on your sample type. If normality can be assumed, you may use parametric tests; if not, you must use nonparametric tests. Remember, to use parametric tests, normality must hold for all groups being compared (see the R sketch after this list).
3. To compare the values between baseline and post-intervention conditions for each test, use paired t-tests (parametric) or Wilcoxon signed-rank tests (nonparametric) for the intervention group.
4. To compare the intervention and control groups, use two-sample t-tests (parametric) or Mann-Whitney U tests (nonparametric).
5. If the data points of your control group are held constant over time (8 weeks), steps #3 and #4 should be enough for data interpretation. However, if your control group's data can also change over time and therefore also requires pre- and post-testing, you need to perform a two-way repeated-measures ANOVA (parametric) or Friedman test (nonparametric) to compare the intervention group (pre), intervention group (post), control (pre), and control (post), followed by post-hoc tests.
6. If you use multiple t-tests for any of your experiments, adjust the alpha level or p-values for multiple comparisons, most commonly with the Bonferroni correction.
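To make steps 2-4 and 6 concrete, here is a minimal base-R sketch. The vectors `pre`, `post`, and `ctrl` are hypothetical placeholder values, not data from any actual study:

```r
# Hypothetical measurements: 7 intervention animals (pre/post) and 7 controls
pre  <- c(12.1, 10.4, 11.8, 9.9, 13.2, 10.7, 11.1)
post <- c(13.0, 11.1, 12.5, 10.2, 14.1, 11.9, 11.6)
ctrl <- c(11.8, 10.9, 12.2, 10.5, 12.8, 11.0, 11.4)

# Step 2: Shapiro-Wilk normality test per group
shapiro.test(pre)
shapiro.test(post)
shapiro.test(ctrl)

# Step 3: baseline vs. post-intervention within the intervention group
t.test(post, pre, paired = TRUE)       # paired t-test (parametric)
wilcox.test(post, pre, paired = TRUE)  # Wilcoxon signed-rank (nonparametric)

# Step 4: intervention vs. control (here on the post values)
t.test(post, ctrl)                     # Welch two-sample t-test (parametric)
wilcox.test(post, ctrl)                # Mann-Whitney U test (nonparametric)

# Step 6: Bonferroni adjustment of a set of p-values
p.adjust(c(0.012, 0.034, 0.048), method = "bonferroni")
```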
These are the very basics that I can think of. I hope this helps.
Sung-Jun Lee I would disagree with several of your recommendations:
ad 1. I would agree, a sample size of 7 per group is very low, and one should expect to find either nothing or, rather, everything. It is known that effect sizes are overestimated in small samples; on the other hand, you massively lack power to detect anything (even with well-behaved data). Such small samples are therefore not recommended, since a few single data points can completely alter your results, which makes them unreliable (just simulate truly normal data with known and controllable population parameters and you will see how different the results are from run to run, just by chance). For power analyses it is recommended to test against the Smallest Effect Size Of Interest (SESOI). You will find lots of articles about it (e.g., from Daniel Lakens).
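As a minimal sketch of such a power analysis, assuming the `pwr` package and a hypothetical SESOI of d = 0.5:

```r
library(pwr)  # assumption: the pwr package is installed

# Required n per group to detect a SESOI of d = 0.5 with 80% power
pwr.t.test(d = 0.5, sig.level = 0.05, power = 0.80, type = "two.sample")
# -> roughly 64 per group

# Power actually available with n = 7 per group for the same effect
pwr.t.test(n = 7, d = 0.5, sig.level = 0.05, type = "two.sample")
# -> only about 12%
```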
ad 2. Normality tests are not recommended to assess normality (of the residuals), since they a) are also subject to power and won't find any deviations at small N (where deviations are more problematic than at large N, where lots of analyses are quite robust), and b) a non-significant result is NOT evidence that the H0 of no deviation is true, since the p-value is a conditional probability (conditioned on H0 already being true). The small simulation below illustrates point a).
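A sketch of that, using deliberately skewed (exponential) data:

```r
set.seed(42)

# How often does the Shapiro-Wilk test reject normality for clearly skewed data?
reject_rate <- function(n, reps = 5000) {
  mean(replicate(reps, shapiro.test(rexp(n))$p.value < 0.05))
}

reject_rate(7)    # at small N the skew is usually missed
reject_rate(100)  # at large N the same skew is almost always detected
```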
Further, nonparametric tests are not recommended as a substitute for parametric tests, especially in small samples, since a) they test different hypotheses and b) it is about the distributions, which are harder to assess in small samples. Nonparametric tests are not a magic trick that suddenly makes your analyses work on small samples.
ad 3. and 4. This is not very helpful, since you are most likely interested in the conditional/differential effect, i.e., the interaction. For example, even if both groups show a significant change from t1 to t2, you still do not know whether the experimental group changed differently from the control group. In the case of random allocation it would be possible to simply compare the groups at t2; otherwise, use a split-plot two-factorial ANOVA (Time*Group) or a multiple regression with t2 as outcome, t1 as covariate, and Group as predictor, where the latter has been shown to have greater power (see Average Treatment Effect [ATE] in the literature; Solomon Kurz did a series on this topic, for example https://solomonkurz.netlify.app/blog/2023-04-12-boost-your-power-with-baseline-covariates/).
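A minimal sketch of the regression approach, with a purely hypothetical wide-format data frame (`t1` = baseline, `t2` = post, `group`):

```r
set.seed(1)
dat <- data.frame(
  t1    = rnorm(14),
  t2    = rnorm(14),
  group = factor(rep(c("control", "treatment"), each = 7))
)

# Baseline-as-covariate model (ANCOVA-style)
fit <- lm(t2 ~ t1 + group, data = dat)
summary(fit)  # the 'group' coefficient is the baseline-adjusted group effect

# For comparison: the change-score model (akin to the Time*Group interaction)
summary(lm(I(t2 - t1) ~ group, data = dat))
```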
ad 5. The ANOVA seems reasonable, but Friedman does not necessarily test what you think you are testing. See above.
ad 6. I would agree, but Bonferroni might be too conservative, depending on the number of comparisons, especially with several correlated dependent variables AND a very (too) low sample size in the first place.
Instead of relying only on inferential statistics, I would plot the data (including all data points), the trends (slopes), etc., to see what is going on. Inferential statistics are not necessarily the answer to your questions, but a guide to separating signal from noise.
P.S.: Just for illustration, run the R code below several times with n = 7 and after that with n = 100. You will see how drastically the results change from run to run when the sample size is small. Here we know that in the population there is no effect from t1 to t2 for group "0", but a change of 0.5 (a Cohen's d of 0.5, since all variables have an SD of 1) for group "1". t1 and t2 are correlated at 0.7 within each group.
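(The code itself did not survive in this thread; the following is a sketch reconstructing it from the description above, using `MASS::mvrnorm()`. The original may have differed in detail.)

```r
library(MASS)  # for mvrnorm()

n <- 7  # run this script several times, then set n <- 100 and run again

# SD = 1 for all variables, r(t1, t2) = 0.7 within each group
sigma <- matrix(c(1.0, 0.7,
                  0.7, 1.0), nrow = 2)

g0 <- mvrnorm(n, mu = c(0, 0.0), Sigma = sigma)  # group "0": no true change
g1 <- mvrnorm(n, mu = c(0, 0.5), Sigma = sigma)  # group "1": true change of 0.5

# Paired tests within each group: with n = 7 the results vary wildly across runs
t.test(g0[, 2], g0[, 1], paired = TRUE)  # population effect is exactly zero
t.test(g1[, 2], g1[, 1], paired = TRUE)  # population effect is 0.5
```

No seed is set on purpose: rerunning the script shows how much the estimates and p-values fluctuate purely by chance at small n.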
Rainer Duesing I appreciate your detailed explanation.
Actually, my answers above reflect the statistical methods widely used in our lab, which focuses on basic neuroscience involving animal experiments, rather than my personal opinion. Especially when it comes to studying complex transgenic animals, it is quite challenging to maximize the sample size of a particular group because of the cost and time required to generate the progeny. Indeed, a sample size of less than 10 is still considered too small to observe an effect, even in our field, as you mentioned. Thus, in this case, we would have to repeat the experiments with a larger sample size. However, I suppose the situation differs in clinical studies, where a much larger sample size is crucial for reasonable conclusions.
So, based on your clarification, I assume that in such cases where the sample size is very small (n = 7), any type of statistical test would technically be inappropriate. Am I correct?
Sung-Jun Lee I would not say that ANY type is ALWAYS inappropriate. But this depends heavily on the surrounding conditions, e.g., how rigorously you are able to control covariates. There may be situations where it is possible to control nearly all other effects, so that you can see the "true" effect. On the other hand, if the effect were that clear, you would not need inferential statistics. My points/personal opinion:
1) It is problematic to use small samples, even if you expect large effects and a power analysis suggested they suffice. This is because small samples are very sensitive to single deviating values, and I would not expect to get an absolutely well-behaved sample with real-world data.
2) There are no magical statistical tricks that will get extra information out of small samples. Nonparametric tests have their own assumptions and are not a 1:1 analogue of parametric tests. A Mann-Whitney U test simply does not test the same hypotheses as a t-test. Therefore, it is problematic to suggest it as an alternative when "something is wrong" (e.g., with the distributional assumptions). If it were a true substitute, we would use it in the first place, wouldn't we?
3) We should not rely only on inferential statistics and their parameters. Plot the data and understand what is going on. If you have a sample size of n1 = n2 = 7 but the plots of the data look reasonable and well behaved, there is nothing wrong with an inferential test, although you should not expect high power.
4) The point about visual inspection holds especially for checking assumptions. The problem with tests like KS or SW for testing normality is explained above. Additionally, they won't tell you where the problem is. A histogram and a QQ plot will show you whether the basic distributional assumption is reasonable and perhaps only single outliers deviate (use robust tests, for example, in that case), or whether the distribution is totally off and you should use a completely different model (e.g., a generalized linear model such as Poisson regression); see the short sketch below.
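For example, with hypothetical values containing one deliberate outlier:

```r
x <- c(12.1, 10.4, 11.8, 9.9, 13.2, 10.7, 25.0)  # hypothetical, one outlier

hist(x, breaks = 10, main = "Histogram", xlab = "value")
qqnorm(x)
qqline(x)  # points far off the line flag outliers or distributional problems
```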
P.S.: In the case of small samples, Bayesian approaches may be viable, IF you have quite strong and reasonable prior information that you can incorporate into the model. Uninformed or vague priors wouldn't help here either.
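As a sketch of what that could look like, assuming the `brms` package and a purely hypothetical informative prior on the group effect:

```r
library(brms)  # assumption: brms and a working Stan installation

# Hypothetical prior: earlier work suggests a group effect near 0.5 (SD 0.2)
fit <- brm(
  t2 ~ t1 + group,
  data   = dat,  # hypothetical data frame with t1, t2, group, as sketched above
  family = gaussian(),
  prior  = prior(normal(0.5, 0.2), class = "b", coef = "grouptreatment")
)
summary(fit)
```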
Rainer Duesing thanks for the detailed information. I had already considered what you explained, and for this situation Sung-Jun Lee's suggestion was more applicable and practical; steps 1-4 were enough and were similar to the steps I had already applied.
Mylene Klaus If this is intended only as a pilot study, then it seems fine. However, using the results of this experiment directly for publication may not be acceptable because of the small sample size.