I wonder about this because for such small data sets it is not always clear whether the data are normally distributed, so perhaps results (p values) from both parametric and nonparametric tests could be more useful than either test alone?
Apply the confrontation index (based on U; you can find it in the attached file). The point in any case is not normality but the disproportionate effect that a single outlier can exert in small data sets. The CI is explained in the attached paper (see the methods at the end); it solves exactly this problem and gives you both a probabilistic statement (based on U) and a descriptive index of the distance between the two (small) groups.
Dear Igor, it doesn't make any sense to use parametric tests with such a small sample size. You should use the Mann-Whitney test to compare distributions or medians. If you are using SPSS, go to the independent-samples option in the nonparametric menu instead of the legacy dialogs. I attached a nice document with some examples that you can easily follow. Hope it helps. Cheers
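For anyone working outside SPSS, here is a minimal sketch of the same test in Python (SciPy >= 1.7 assumed for the exact method; the two groups are made-up illustration data, not Igor's actual values):

```python
# Mann-Whitney U test on two small, hypothetical groups
from scipy.stats import mannwhitneyu

control = [3.1, 2.8, 3.4, 2.9, 3.6, 3.0]   # hypothetical, n = 6
treated = [3.9, 4.2, 3.7, 4.5, 4.0, 3.8]   # hypothetical, n = 6

# method="exact" avoids the normal approximation, which is unreliable at n = 6
u_stat, p_value = mannwhitneyu(control, treated,
                               alternative="two-sided", method="exact")
print(f"U = {u_stat}, exact two-sided p = {p_value:.4f}")
```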
Yes, there is no need to report the t-test results, given the small sample size; you will be questioned by reviewers anyway if you do. Just reporting the non-parametric test you think is appropriate to your sample would be good enough.
Alessandro, thank you very much! The confrontation index you describe indeed sounds very useful! If it is used to sequentially compare multiple "treated" groups to a single "control" group, what should be used to solve the problem of multiple comparisons (e.g. some form of Bonferroni correction)?
I would focus on estimating the effect size and reporting its 95% CI using a bootstrap approach. A permutation test is also fine if a simple yes/no conclusion is sufficient.
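A minimal sketch of both ideas, assuming NumPy and made-up illustration data, with the difference in means standing in for whatever effect-size measure is of interest:

```python
import numpy as np

rng = np.random.default_rng(42)
control = np.array([3.1, 2.8, 3.4, 2.9, 3.6, 3.0])  # hypothetical, n = 6
treated = np.array([3.9, 4.2, 3.7, 4.5, 4.0, 3.8])

# Bootstrap 95% CI for the difference in means (percentile method)
boot = [rng.choice(treated, treated.size, replace=True).mean()
        - rng.choice(control, control.size, replace=True).mean()
        for _ in range(10_000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"diff = {treated.mean() - control.mean():.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")

# Permutation test: shuffle group labels and recompute the difference each time
observed = treated.mean() - control.mean()
pooled = np.concatenate([control, treated])
extreme = 0
for _ in range(10_000):
    rng.shuffle(pooled)
    diff = pooled[control.size:].mean() - pooled[:control.size].mean()
    extreme += abs(diff) >= abs(observed)
print(f"two-sided permutation p = {extreme / 10_000:.4f}")
```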
Dear Igor, if you have the intention of presenting a measure of effect size, r can be calculated by dividing Z by the square root of N (r = Z / √N). Check out this document. Cheers
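A hedged illustration of the formula (the U value and group sizes below are hypothetical; Z is recovered from U via the usual large-sample approximation, which is itself shaky at n = 6):

```python
import math

n1, n2 = 6, 6
U = 4.0                                             # hypothetical U statistic
mu_U = n1 * n2 / 2                                  # mean of U under H0
sigma_U = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # SD of U under H0, no tie correction
z = (U - mu_U) / sigma_U
r = abs(z) / math.sqrt(n1 + n2)                     # N = total number of observations
print(f"Z = {z:.2f}, r = {r:.2f}")
```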
I do agree with some answers; however, since the sample size is small, less than 8-10, the choice goes without question to a non-parametric test such as the Mann-Whitney U test. This test is robust and commonly used under these conditions.
Instead of asking how to use a tool that is otherwise not normally applicable, it would greatly help if you could describe the specific situation in which you are restricted to only small sets of data.
Would you please elaborate on what situation requires the application of inferential statistics when you only have access to small sets of data?
Thanks and I am looking forward to hearing from you on this.
Shree, no problem! Consider the following example: there is a pilot study of a certain treatment on animals (e.g. mice), with small numbers of animals in each group (e.g. 6): control (untreated) and a few treatment intensities and/or durations. Suppose the analysis should compare certain parameters of the animals (e.g. body weight, concentrations of certain metabolites in urine, etc) between treated and control groups and between treated groups with different intensities.
Thanks, Igor. This is common in the drug industry, as testing in animals is usually the first stage. So the reason for testing small numbers is primarily cost and/or time. Let us disregard time, as we are focused on the efficacy of the drug, so to speak, even at the expense of time. Now we are left with the number of animals tested. We know that there is no scarcity of animals to be tested here (no offense to the animals, though), and I am also sensitive to their sacrifice for the benefit of human beings.
While we look for the minimum sample size, we face the following parameters (say, for the 2-sample t-test in your scenario):
A - Alpha, the Type I error (risk to the producer); B - Beta, the Type II error (risk to the customer); C - the difference to detect (the difference between the signals you are looking for); and the standard deviation (the variation in your process). Together these determine a sample size n that would be the minimum needed to demonstrate the drug's efficacy, I suppose.
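A sketch of this calculation in Python, assuming statsmodels and purely illustrative inputs:

```python
# Minimum n per group for a two-sample t-test, from alpha, power,
# difference to detect, and SD (all values below are hypothetical)
from statsmodels.stats.power import TTestIndPower

difference = 5.0                 # smallest effect worth detecting
sd = 4.0                         # process standard deviation
effect_size = difference / sd    # Cohen's d

n_per_group = TTestIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.80, alternative="two-sided")
print(f"minimum n per group ~ {n_per_group:.1f}")
```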
I would also like to know the rate at which data become available and the calendar time in which you wish to execute the project or obtain measures of the results you are looking for. If the deadline for the project is much shorter than the time it takes to measure the relevant sample size, we need to do a few other things, which we can discuss if necessary.
Am I right in this process? Kindly correct me before I go to the next step. Thanks for being open on this; I look forward to your response.
Shree, thanks for your interest! For simplicity, assume that the study was already done by someone with a certain small number of animals per group, and the data have been collected. The question is what analysis approach could give the most useful preliminary description of differences between groups?
Two main considerations for all data -- whether the sample size is small or large!:
1. Always look at the distribution of the parameter graphically, as this will give you an idea of whether it follows a roughly normal distribution, which can happen often even with small sample sizes. If it looks roughly normal, or your hypothesis is that it is/should be normally distributed (even if it looks somewhat skewed), use parametric tests, as this will give you more power! You can even transform the data to make it more normally distributed so that parametric tests apply. However, if the data are bimodal or extremely skewed, use non-parametric tests, as these will give you the most power for finding a difference.
2. Good theoretical practice: the choice of test should be based on whether you think/hypothesize that the parameter you are interested in is distributed normally (use a parametric test) or non-normally (use a non-parametric test).
So for any data, my approach -- especially because I work with small samples -- is:
1. Clean up the data.
2. Graph the data and decide whether a parametric or non-parametric approach fits (a minimal sketch of this step follows below).
3. Do descriptive statistics (parametric and non-parametric) -- yes, I do explore the data nowadays, as I have become more Bayesian in my thought process for analyzing data.
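The sketch of step 2, assuming matplotlib/SciPy and hypothetical data:

```python
# Eyeball normality with a histogram and a normal QQ plot
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.array([3.1, 2.8, 3.4, 2.9, 3.6, 3.0])  # hypothetical group, n = 6

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(x, bins="auto")
ax1.set_title("Histogram")
stats.probplot(x, dist="norm", plot=ax2)  # QQ plot against a normal distribution
ax2.set_title("Normal QQ plot")
plt.tight_layout()
plt.show()
```

Keep in mind (as discussed further down the thread) that with n = 6 such plots can only support a very subjective judgment.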
If we know a priori that the measurand is normally distributed (as in the case of repeated measurements), it is possible to use a parametric test. If the distribution of the measurand is unknown but the data sets are believed to belong to the same population, then a parametric test is also possible (with some caution). Otherwise the Mann-Whitney test is preferable. So everything depends on the task under study.
Igor, I had left a note to ask you some questions; however, your "Thank You" reply tells me you would rather close this thread. Please let me know either way. Hope to connect and speak with you. How do we do that? Have a pleasant week now.
Why would you want to test the normality of a small sample? Why not get a larger sample? I am yet to receive a response from Igor on this, based on my first response to his feedback that they are testing animals for a specific drug's efficacy.
Thank you again, everyone! Based on your responses I am inclined to go with nonparametric tests (e.g. U test) for small samples, and calculate the confrontation index suggested by Alessandro as an additional measure of the extent of data overlap. Does this make sense?
Shree: thanks for your interest, and of course you are right that designing an experiment in an optimal way is important, but my question was about a specific situation: how to handle small samples.
Yes, your approach is OK. You may not even need the confrontation index, just the Mann-Whitney U test. The truth is that you always need a p-value, since it is a way to show whether your results are likely due to chance or are real. And, for sure, it does not make sense to use both parametric and non-parametric tests; you just have to choose the most appropriate one and use it. In this case, with six samples, it is crystal clear that you have to use non-parametric tests.
I am sorry to differ; the case for non-parametric testing is not as clear to me as you present it. You just threw the entire concept of minimum sample size out of the window by justifying non-parametric testing with 6 samples. Igor is working with animals on some kind of drug effectiveness or efficacy study, where the risks could be high if they plan to convert this into a pharmaceutical product for use in human beings. Any prototype testing at this stage is going to be susceptible to sample size, as greater amounts of money, effort, and time are needed when they take it to the field. At that point, if they had just used a sample size of 6, I am sure someone's head is going to roll (not literally) or someone is going to be shamed for this decision.
It is Igor's call anyway on how he would like to conclude this.
The report depends on your result from the normality test, i.e. on whether your data set follows a normal distribution (bell-shaped) or not. In statistical methodology, the researcher should use a parametric test if the data set is normally distributed and a non-parametric test if the data set is skewed.
But for established research published in high-impact local and international journals, authors often refer to previously published papers to determine whether to use a parametric or non-parametric test.
The number of samples is very important for the accuracy of a statistical analysis: more variation within the same group of data means a less accurate analysis. For very small numbers of samples (i.e. < 5), most researchers generally use simply presented data such as percentages, chi-square, etc.
This is my opinion based on my limited experience in statistical analysis. Please consult a statistician for more suggestions and information.
In small samples, or with fewer than 10 subjects in each group, you must use non-parametric tests. The reason for using a nonparametric analysis is to avoid a so-called "false-positive result". As for which non-parametric test you should use, this will depend on the variables in your research, i.e. whether they are categorical or not.
I am sorry but I don't catch your point. Of course, it would be great to have a bigger sample size, but Igor's question was about what test would be more appropriate with that sample size.
Thank you, Miguel, Ahmad, Livia, and Jochen! Generally the suggestions to use nonparametric tests with small samples make sense to me. The confrontation index also seems to be a good idea for showing the degree of separation between small data sets.
Thanks again, Jochen! I appreciate your input. If I understand correctly, then, with small samples the problem with parametric tests is that it is hard to tell (impossible to reliably test) whether or not the error distribution is normal, while the problem with nonparametric tests is that it is hard to tell whether or not the data distributions are the same. I was not sure which is "worse", so that is why I asked whether it makes sense to use both tests. But it seems to me now that with small samples showing a "location shift" may be more intuitively useful than comparing the means by a parametric test. Does this make sense to you?
Igor, in addition to what Jochen says, samples this small can arise in certain types of experiments, and it is therefore necessary to check certain assumptions (normality, independence). It also depends on the type of variable.
As a recommendation, also calculate the power of the test at that sample size; that helps clarify the impact of the results.
If the samples are independent, I'd suggest the nonparametric solution, mainly because the samples are too small -- namely, as mentioned above, the Wilcoxon-Mann-Whitney test to compare central tendencies or sample distributions.
But usually, e.g. in economics experiments, we deal with about 12 to 25 subjects per sample, so I also maintain a small doubt about the statistical power.
This question really supports the use of the book "Common errors in statistics" (I leave the search to the interested user). Actually, when the sample size is less than 6-7, only the parametric test is reasonable, because permutation tests have very low power. Use of the Mann-Whitney should also carry the sign "proceed with caution": the exact WMW test has low power, and the asymptotic one does not hold (asymptotics at n = 6 are nonsense). So, if normality does hold for the small sample, then the parametric test is very appropriate. If not, use both, as you already do. Conflicting results need some elaboration (check more carefully, understand your data). To conclude: many people in this thread supported, without any doubt(!), the use of non-parametric tests. Although I am a huge proponent of non-parametric tests, very small samples (n < 6-7) are the exception.
Thank you, Jochen, Carlos, Konstantinos, Christian, and Christos, for your stimulating comments! This is an interesting discussion! It looks like there are differences of opinion: for small samples some favor nonparametric tests (due to problems with determining normality on small samples), others favor parametric tests. Perhaps using both is not unreasonable after all? Or just using a subjective normality check (e.g. QQ plot) at the beginning to decide which way to go? Would be grateful for your input!
Thank you, Jochen! I think I get your point about selection bias. But what do you mean about using different data? Do you suggest using a QQ plot on just a subset of data first, making a decision about normality, and then applying this decision to the rest of the data? Also, I wonder what would be a reasonable strategy if in the small sample there are a couple of outliers (e.g. subjectively detected). Would this justify a nonparametric testing approach or, alternatively, transforming the data (e.g. log) and trying a parametric one?
Methods for checking normality are regrettably of very little use with just 6 samples. As Jochen pointed out, it would be impossible to determine what is an outlier or whether the sample follows a normal distribution.
Thank you again, everyone! So as I take it, there is a lot of subjectivity in choosing parametric or nonparametric tests for small samples, and a lot depends on the type of data and the researcher's goals.
A very interesting discussion! First of all, in these situations I call on the clinicians (and the bosses) to get information about the measures that have been taken, their biological meaning, what the aim was, etc. I show them a graphical description of the data and explain to them what a sample is, what uncertainty is, clinical significance as opposed to statistical significance and the p-value, and finally what type I and type II errors are; at this point I call for a larger sample size, which is the most appropriate answer to the problem. If this is not possible, I give measures of central tendency and of dispersion, with the difference between means, and I assume that the data are parametric and go on under this explicit assumption.
I think that double calculation of the p-value, by parametric and non-parametric tests, is not a solution and adds some confusion, but I confess that I cannot resist testing the normal distribution hypothesis (with a Kolmogorov-Smirnov test, recognizing that its result depends on sample size!).
When describing the data, a log-transformation can help; lastly, I cannot understand why Chebyshev's inequality (quoted in the textbook by Pagano) is not used and is not calculated by statistical software.
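For readers unfamiliar with it, Chebyshev's inequality is the distribution-free bound

P(|X − μ| ≥ kσ) ≤ 1/k², for any k > 0,

valid for any distribution with finite mean μ and standard deviation σ; for example, at most 1/9 ≈ 11% of the probability mass can lie more than 3 SDs from the mean, with no normality assumption at all.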
I point out to my colleagues (I have worked as an internist) that a few cases overlap with another type of scientific publication, namely case reports, which carry their own weight of evidence; so it is easier to think that evidence lies on a continuum and that a few cases carry limited evidence, whatever statistics, and especially inference, can do!
In this estimation it may be that the tests only control the error *rates* in the long run, given constant conditions; I do not know whether more than that is possible for individual errors.
I agree with your H0 and alternative hypothesis. For precision of the estimated effect, it is necessary to observe more data.
Thanks a lot, Carlos, Vincenzo, and Jochen, for your input!
Vincenzo: am I right to understand that you are in favor of normality testing even with small sample sizes? What would be your plan then if the test suggests a non-normal distribution? Also, why do you suggest the K-S test specifically (e.g. why not Shapiro-Wilk)?
First I will try to answer Jochen. As you raise the issue of the precision of the effect, my answer is that I cannot exclude an effect with confidence, but the pendulum of evidence begins to prefer a "no", with a number of limitations and doubts. I can try to see, for example, how many patients I would have to enroll to test the opposite hypothesis (that some effect exists): if I need thousands of observations to assess the point, say with a power of 0.8, why bother? (I recognize now that this can be misleading.) This is somewhat different from the point about assumptions (your second paragraph), because a formal null hypothesis is in any case "just round the corner"; and in the first of your remarks you rightly recognized that the calculations follow the same logic while the interpretations differ, you say (I say they are the same, but seen from different points of view). The main assumption is that, after Igor's first 6 observations, the next 30 would clarify that Igor is working with a normal distribution; the biologist should be aware of this assumption and has to add observations to proceed in the direction of better-evidenced conclusions.
Second question: probably my answer is yes (remember that I was a clinician). The biological/clinical scientist is questioning me, as a patient or a colleague questions me about the health problem of a single patient, and I have to give my answer after explaining its limits. I can call for further investigations, I can "wait and see", but I am beginning my process of "learning", as you rightly say, also in the sense that I learn something that I will apply to the next patient, not to this one.
Last point (second remark): "If H0 is rejected, HA is taken to be the truth, and there is no (known) uncertainty associated with it". What about the cut-off level that splits the world into only two categories, reject/do not reject?
You have a drug that did not prevent death from digestive hemorrhage in a well-done clinical trial; the p-value was 0.05, with a non-significant advantage in the treated patients versus the untreated patients (intention-to-treat analysis, power 0.9, significance set at p < 0.05).
Thank you, Jochen, I appreciate your framework of significance levels, which I have never approached so deeply: I now have a lot to think about; I hope that Igor now has a clearer perspective on how to go on with his/our problem.
Just a little story: a clinician tells an epidemiologist about a surprising finding and wants some mathematical confirmation; the epidemiologist answers: interesting, come back when you have 99 more cases.
Thank you again, Jochen and Vincenzo! This is a very stimulating discussion. As a very simplistic "message", what I take away from this is the following: In situations where sample sizes are limited, one can look at the data say by QQ plot to assess normality subjectively. Unless this shows "obvious" inconsistency with the normal distribution, proceed with parametric testing. Does this make sense?
Thanks, Jochen! By saying to "re-think the model" do you mean trying known non-normal distributions (e.g. exponential) which would better represent the data?
This discussion confirms that it is necessary to include a larger quantity of data in our models and to organize it as panel data in order to obtain significant validation. For me, it is convenient to use the DEA approach.
Jochen, thanks for the clarification! So if the data (e.g. seen on QQ plot) or model fit are inconsistent with normality (e.g. residuals are non-normal), you recommend altering the model function and/or assumed data distribution, before going to nonparametric testing?
I see Jochen, thanks! Actually when I am saying "nonparametric testing" I mean also methods like developing a model and fitting it to the data, but assessing sensitivity to parameter values by generating multiple synthetic datasets by nonparametric bootstrapping and fitting the model to each of them. I guess the parametric alternative to the latter would be to generate synthetic data sets using measured means and standard deviations, and assuming the normal distribution. Does this sound reasonable to you?
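A sketch contrasting the two resampling schemes just described, with hypothetical data (NumPy assumed; any statistic of interest could replace the group mean used here):

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.array([3.1, 2.8, 3.4, 2.9, 3.6, 3.0])  # hypothetical observed sample

# Nonparametric bootstrap: resample the observed values with replacement
synthetic_np = rng.choice(data, size=(10_000, data.size), replace=True)

# Parametric alternative: draw from a normal with the sample mean and SD
synthetic_p = rng.normal(data.mean(), data.std(ddof=1), size=(10_000, data.size))

# Compare, e.g., the spread of the resampled means under the two schemes
print(synthetic_np.mean(axis=1).std(), synthetic_p.mean(axis=1).std())
```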
Thanks, Jochen! To me these types of methods (bootstrapping and synthetic data sets) are attractive because they are in some sense conceptually simple, and I typically implement them by writing my own Fortran code. In terms of computational intensiveness, you are right of course, but for small data sets doing, say, 10000 iterations is no problem at all; it is quick.
I just registered in Research Gate and found this interesting discussion.
I work in a different field than you: marine ecology and aquaculture where experiments usually involve less than 5 replicates (sometimes 2 or 3). Two comments:
1. On t or U tests with small sample sizes. As some of you stated, both tests imply assumptions about the distributions, which cannot be adequately tested with small sample sizes because of power and other issues. So, if a formal H0 test is required, either would be fine (or not). Anyway, significance tests attempt to make inferences about population parameters or differences based on samples or experiments, but valid inferences of this type usually cannot rely on a single trial (whatever the sample size or p-value), but rather on the repeatability of results in different conditions, labs, times, etc. For example, if a clinical trial with few patients suggests an important advantage of a new drug over the conventional one ("significant" or not), yes, repeat the trial with more replicates, but most importantly, try to repeat it with other kinds of patients (e.g. age groups, etc.) and convince other colleagues to do it in other labs and parts of the world, to try to expand the scope of your findings and the potential benefits of this drug (certainly the publication of your results in a well-respected journal would help here). This would be more helpful in reaching a general conclusion about the potential benefits of the new drug than a single "perfect" experiment with lots of replicates and the best "state of the art" statistical tests.
2. On the "significance levels" issue. This is the part of the discussion I enjoyed more and I am glad to see this argument is spreading. Yes, I agree with you in that the "conventional" alpha value of 0.05 has been and still is damaging science "significantly". We don´t live in a black and white world. For a more thorough discussion on this issue please read the attached paper written by one of my Statistics teachers (certainly the most influential for me). I hope it goes through.
I would like to add to the Cornell question that the significance for such a sample is not clear. Comparing small samples raises the problems of heteroscedasticity and autoregression. In fact, this is the problem most often found in some works.
Patrice: most of my questions here are general in nature, mainly how to handle data which has not yet been produced. However, since you are interested, I attach a sample data set. Here there were 2 cell types treated with 3 different intensities, with 6 samples per intensity. Significant differences can be found using both parametric and nonparametric tests, and a simple model can be constructed and fitted. I would be curious and grateful to hear what you (and anybody else interested in this) think!
Dear Igor, with that sample I think that negative values can cause some trouble for testing; it depends on the model used, but it is better to use positive values. The construction of panel data seems fine to me if they come from the same source.
The questions related to this data are the following:
1. Do the responses of the 2 cell types differ at a particular treatment intensity? What would be the right test for this (i.e. to compare 2 groups of 6 samples)?
2. Can the responses of the 2 cell types for all intensities be described by a simple model?
3. Do the model parameters differ for the 2 cell types?
I would be grateful for your thoughts, if you are interested.
No additional information is gained when p-values from two alternative (parametric vs. nonparametric) tests are used. When there are only six experimental units, you cannot make a strong declaration about what kind of distribution you have, if interval data are presented. Therefore, you can use either of the alternative tests. If the t-test is to be used, a natural-logarithm transformation to normalize the distribution of the data set should be done prior to analysis; the U-test can be used without any preparation before analysis. But in any case, do not use dichotomous statistical inferences (significant vs. insignificant). The statistical inference should be presented as "it seems to be positive", "it seems to be negative", or "judgment is suspended". The statistical inference cannot be based on a fixed alpha level (see Hurlbert and Lombardi, 2009, Final collapse of the Neyman-Pearson decision theoretic framework and the rise of the neoFisherian, Annales Zoologici Fennici 46: 311-349).
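A sketch of the two routes just described, with hypothetical positive, right-skewed data (the log transform requires strictly positive values; SciPy assumed):

```python
import numpy as np
from scipy import stats

control = np.array([1.2, 0.9, 1.5, 1.1, 2.8, 1.3])  # hypothetical, right-skewed
treated = np.array([2.1, 3.5, 2.8, 4.9, 3.1, 2.6])

# t-test after natural-log transformation: compares means on the log scale,
# i.e. geometric means on the original scale
t_stat, p_t = stats.ttest_ind(np.log(control), np.log(treated))

# U-test on the raw data, no preparation needed
u_stat, p_u = stats.mannwhitneyu(control, treated, alternative="two-sided")

print(f"log-scale t-test: t = {t_stat:.2f}, p = {p_t:.4f}")
print(f"U test: U = {u_stat}, p = {p_u:.4f}")
```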
Ricardo: yes, you are certainly right. But suppose you want to compare the responses to one particular treatment intensity - then there are 12 units, 6 in each group. What do you think is a good approach in this situation?
I would report confidence intervals for the difference (or ratio) of means or for other summaries you are interested in. The intervals convey the strength of evidence you have for these comparisons.
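For the difference of means, a minimal sketch of a t-based (Welch) 95% CI, with hypothetical data:

```python
import numpy as np
from scipy import stats

control = np.array([3.1, 2.8, 3.4, 2.9, 3.6, 3.0])  # hypothetical, n = 6
treated = np.array([3.9, 4.2, 3.7, 4.5, 4.0, 3.8])

diff = treated.mean() - control.mean()
v1 = treated.var(ddof=1) / treated.size
v2 = control.var(ddof=1) / control.size
se = np.sqrt(v1 + v2)
# Welch-Satterthwaite degrees of freedom
df = (v1 + v2) ** 2 / (v1 ** 2 / (treated.size - 1) + v2 ** 2 / (control.size - 1))
t_crit = stats.t.ppf(0.975, df)
print(f"diff = {diff:.2f}, 95% CI [{diff - t_crit * se:.2f}, {diff + t_crit * se:.2f}]")
```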
Igor, your data is fine. I started to work on it today, so in a few days I will have results. However, I need to make different assumptions to deal with the samples. So far I have observed that the first two samples have completely different means and somewhat different distribution structures. So please, give me some time. emilio
Hello Igor, your data set is small, but that does not mean you cannot use a parametric test. What you should know before using a t-test or ANOVA is whether your data follow a normal distribution. A good statistical software package that can help you clarify many doubts is GraphPad Prism®. Find out more about it at: http://www.graphpad.com/
Igor, we have only 6 data values per sample. Let's assume the following strong premises to make a rough test for the two samples, knowing that it cannot be precise but that it gives an approximate picture of their structures. Order the data in descending order of the variable. The premises are:
1) Each data point has the same frequency, 1/N = 1/6; 2) each measured value corresponds to the mean of its quantile. Of course these premises are unlikely to hold exactly, but they give a preliminary idea. Then:
a) The mean U is the average of the six known values of each sample.
b) If you divide each quantile mean by U, you obtain its value in dimensionless mean units (K_i), and if you multiply it by the frequency 1/6, you obtain the fraction of the total distributed mass for each quantile: Y_i = K_i * (1/N), where N = 6. Then you accumulate the quantiles to build the six points of the Lorenz curve (Xac_i, L_i).
c) By estimating F_i = ln(L_i) / ln(Xac_i) you obtain the points (Xac_i, F_i) as proxy structural values for the two samples; graph them and analyse them. The lower the curve, the higher the dispersion.
d) Join the conclusions from point c) with the conclusions from point a), adding your own experience from the measurement process.
When the number of quantiles N is higher, the results improve, because you can compute the mean of each quantile with better precision, and the frequencies get somewhat closer to the real values.
The researcher's experience and judgment about the minimum and maximum values of the distributed variable are very important, because it always holds that F(1) = K(1) in dimensionless mean units, and those points permit fixing the extreme values in some cases.
In the samples given by Igor there are negative values. This can be solved by adding the absolute value of the largest negative value of the two samples, making the minimum value equal to zero. It is the same as converting negative temperatures to absolute temperatures by adding 273 degrees. At the end you may transform the values back to the original units if you need to.
Try it with Excel. The graphs obtained for the two samples are shown in the attached short file.
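A rough Python translation of these steps, under my reading of them (the sample values and variable names are mine, purely for illustration):

```python
import numpy as np

x = np.array([4.9, 3.5, 3.1, 2.8, 2.6, 2.1])  # hypothetical sample, descending order
n = x.size

U = x.mean()                     # step a): the mean U
K = x / U                        # step b): dimensionless mean units K_i
Y = K / n                        # fraction of total mass per quantile, Y_i
L = np.cumsum(Y)                 # Lorenz ordinates L_i (the last one equals 1)
Xac = np.arange(1, n + 1) / n    # cumulative population shares Xac_i

# Step c): F_i = ln(L_i)/ln(Xac_i); the last point is skipped because ln(1) = 0
F = np.log(L[:-1]) / np.log(Xac[:-1])
for xa, f in zip(Xac[:-1], F):
    print(f"Xac = {xa:.3f}, F = {f:.3f}")
```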
I would report and plot the 12 observations. Why don't you list them in your question, and also tell us what scientific problem you are studying (i.e., what did you hope to learn from these 12 numbers)?
I am sure all of the data will give a certain indication and should be used, but it is necessary that the precision of these data be high in order to achieve great accuracy.
Patrice, I would appreciate your opinion after presenting my results one week ago. I recall your words: "Emilio. The data are so sparse that a) the underlying distribution cannot really be inferred with confidence and b) how could analysis of so little data take a few days?" Best 2014 wishes, emilio.
The data set is so small that precision issues would need to be addressed for the sake of accuracy, and the generalization issues also need to be addressed before finalizing; but there is no harm if you want to analyze it.
Muhammad, I welcome your comment. It is clear that the more data points you have, the more accuracy may be expected. Even so, the confidence problem remains. The point I want to make is that even in this case of only six data points, they carry information that may be analyzed as a first step in the preliminary exploration of a research project in its early stages.
I would also do a power analysis to determine the power you have to detect a difference, since the data sets are so small. If you are doing a t-test comparing the means of the two groups, for example, a non-significant p-value may reflect the small sample size (low power) rather than a genuinely small difference between the two groups' means.
I think that so few observations cannot be significant, so I recommend that you add more observations; otherwise you will be in the situation the other colleagues have been describing.