There is a "Fisher exact test" for a general r x c contingency table. You compare the observed value of the usual chi-square statistic not with an asymptotic chi-square distribution but with the exact permutation distribution of the chi-square statistic. That means you allocate your N observations to the r·c cells in all possible ways such that the row and column margins stay fixed and equal to those actually observed. Usually, however, there are far too many permutations to enumerate explicitly, so instead one takes a large random sample of permutations. This can be done, for instance, with the R package "coin". See http://www.statmethods.net/stats/resampling.html, in particular the section
Independence in Contingency Tables
# Independence in 2-way Contingency Table based on
# 9999 Monte-Carlo resamplings. A and B are factors.
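For readers without R, here is a minimal stdlib-only Python sketch of the same Monte Carlo permutation idea. It is not coin's API; `chi_square` and `permutation_chi_square_test` are hypothetical names. The table is expanded into one (row, column) label pair per observation, and the column labels are shuffled, which leaves both margins unchanged. The p-value uses the common (hits + 1) / (B + 1) convention, which is an assumption here.

```python
import random


def chi_square(table):
    """Pearson's X^2 statistic for an r x c table given as a list of lists."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:  # skip empty rows/columns
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def permutation_chi_square_test(table, n_resamples=9999, seed=None):
    """Monte Carlo p-value of X^2 under permutations that keep both margins fixed."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    # One (row, column) label pair per observation, in matching order.
    pairs = [(i, j) for i in range(n_rows) for j in range(n_cols)
             for _ in range(table[i][j])]
    row_labels = [i for i, _ in pairs]
    col_labels = [j for _, j in pairs]
    observed = chi_square(table)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(col_labels)  # row and column margins are unchanged
        resampled = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            resampled[i][j] += 1
        # Small relative tolerance so ties with the observed value count as hits.
        if chi_square(resampled) >= observed * (1 - 1e-9):
            hits += 1
    return observed, (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    stat, p = permutation_chi_square_test([[3, 0, 1], [0, 4, 1], [1, 0, 3]], seed=1)
    print(f"X^2 = {stat:.3f}, Monte Carlo p = {p:.4f}")
```

With a perfectly balanced table such as [[2, 2], [2, 2]] the observed statistic is 0 and the p-value is 1; with [[10, 0], [0, 10]] it is very small.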
I am trying to find a permutation test that works on a general n x m table. The data set is small, so some cells have counts too small for the chi-square approximation to be valid. If the table were 2x2 I would use Fisher's exact test (fisher.test), but I don't think that works for this general table. Is there a general function for this test?
Here are two answers:
Søren Højsgaard wrote:
Using r2dtable() you can simulate general n x m tables with given margins. Based on these you can calculate a Monte Carlo p-value for a conditional test of independence.
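R's r2dtable() uses Patefield's algorithm. A simpler way to draw tables from the same conditional null distribution (margins fixed) is to shuffle the column labels of all N observations and deal them out row by row. The sketch below uses that substitution; `random_table` and `random_tables` are hypothetical names, not an existing package API.

```python
import random


def random_table(row_sums, col_sums, rng):
    """Draw one table with the given margins from the permutation null distribution."""
    if sum(row_sums) != sum(col_sums):
        raise ValueError("row and column totals must be equal")
    # One column label per observation, shuffled, then dealt out row by row.
    col_labels = [j for j, c in enumerate(col_sums) for _ in range(c)]
    rng.shuffle(col_labels)
    table = [[0] * len(col_sums) for _ in row_sums]
    start = 0
    for i, r in enumerate(row_sums):
        for j in col_labels[start:start + r]:
            table[i][j] += 1
        start += r
    return table


def random_tables(n, row_sums, col_sums, seed=None):
    """Hypothetical analogue of r2dtable(n, r, c): a list of n random tables."""
    rng = random.Random(seed)
    return [random_table(row_sums, col_sums, rng) for _ in range(n)]


if __name__ == "__main__":
    for t in random_tables(3, [3, 2, 5], [4, 6], seed=42):
        print(t)
```

A Monte Carlo p-value is then the share of simulated tables whose test statistic is at least as extreme as the observed one.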
Peter Dalgaard wrote:
fisher.test(..., simulate.p.value=TRUE) might be more direct. It also works for chisq.test().
And, contrary to popular belief, fisher.test() does work for tables larger than 2x2, although you may run into space/time limitations.
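To get a feel for where the space/time limitations come from, the sketch below counts all tables with a given pair of margins by filling one row at a time. `count_tables` is a hypothetical helper. R's fisher.test uses the network algorithm of Mehta and Patel rather than visiting tables one by one, but the size of this set is what a naive exact computation would have to work through.

```python
from functools import lru_cache


def _row_fillings(total, capacities):
    """Yield the remaining column capacities after placing `total` counts in one row."""
    if not capacities:
        if total == 0:
            yield ()
        return
    first, rest = capacities[0], capacities[1:]
    for k in range(min(total, first) + 1):
        for tail in _row_fillings(total - k, rest):
            yield (first - k,) + tail


def count_tables(row_sums, col_sums):
    """Number of r x c tables of non-negative integers with the given margins."""
    if sum(row_sums) != sum(col_sums):
        return 0
    row_sums = tuple(row_sums)

    @lru_cache(maxsize=None)
    def fill(row_index, remaining):
        if row_index == len(row_sums):
            return 1 if all(c == 0 for c in remaining) else 0
        return sum(fill(row_index + 1, rest)
                   for rest in _row_fillings(row_sums[row_index], remaining))

    return fill(0, tuple(col_sums))


if __name__ == "__main__":
    # Growth for 3 x 3 tables whose row and column totals all equal n.
    for n in (3, 6, 12, 24):
        print(n, count_tables([n, n, n], [n, n, n]))
```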
Optimal data analysis is ideal for small samples and provides alternatives to chi-square. See the APA book with software: Yarnold and Soltysik (2005). Optimal Data Analysis. American Psychological Association. Dr. Paul Yarnold also has a journal and blog at http://odajournal.com/.
A 3x3 table often tests a rather vague hypothesis: "there is some kind of relationship between…". Is either of your variables ordered? If so, the analysis you propose does not take this into account. And if both variables are unordered categories, do you have a more precise hypothesis?
For example, if one variable is smoking, coded as Never, Ex and Current, you can think of it as two binary variables: whether the person has ever smoked and whether they smoke now. Likewise, marital status can often be rewritten as two binary variables: whether the person has ever married, and whether they are still married.
For variables with more than two categories, it is useful to look for a simpler underlying structure of binary variables that lets you specify and test more precise hypotheses.
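The smoking recoding can be written as a small lookup. The dictionary keys and output names below are illustrative assumptions. Each binary variable can then be cross-tabulated against the other variable and tested on its own, which gives two focused hypotheses instead of one omnibus test.

```python
# Illustrative coding of the three-level smoking variable from the post.
SMOKING_CODES = {
    "Never": (0, 0),    # never smoked, does not smoke now
    "Ex": (1, 0),       # has smoked, does not smoke now
    "Current": (1, 1),  # has smoked, smokes now
}


def smoking_indicators(status):
    """Split one three-level smoking status into two binary variables."""
    ever_smoked, smokes_now = SMOKING_CODES[status]
    return {"ever_smoked": ever_smoked, "smokes_now": smokes_now}


if __name__ == "__main__":
    for s in ("Never", "Ex", "Current"):
        print(s, smoking_indicators(s))
```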
Use exact or Monte Carlo methods, as shown in other posts. These methods avoid the problems with the assumptions of the chi-squared test of independence and give accurate estimates of significance.
If you use SPSS with the Exact module installed, you can easily compute the answer (or use StatXact).
You may well be fine using the simpler chi-squared test in most cases!
Note that even though some cells have expected values < 5, your result using the asymptotic p-value from the chi-squared test is still likely to be OK. It depends on how many cells have expected values below 2 or so.
The exact test is always preferable, but it matters most when the marginal counts are small.
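The expected counts under independence are E_ij = r_i c_j / N, computed from the row totals r_i, the column totals c_j and the grand total N. The sketch below computes them and counts the cells below a chosen threshold; the helper names and the 3x3 table are made up for illustration.

```python
def expected_counts(table):
    """Expected cell counts E_ij = r_i * c_j / N under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def count_small_expected(table, threshold):
    """Number of cells whose expected count is below `threshold`."""
    return sum(e < threshold for row in expected_counts(table) for e in row)


if __name__ == "__main__":
    table = [[3, 1, 0], [2, 4, 6], [1, 2, 9]]  # made-up 3 x 3 table, N = 28
    print(count_small_expected(table, 5), count_small_expected(table, 2))
```

For this made-up table, 7 of the 9 expected counts are below 5, but only 2 are below 2.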
@Ronan: good point. If the amount of data is rather small there is little point in performing an omnibus test for independence. It has a tiny amount of power against every conceivable alternative. Probably some alternatives are more plausible/interesting/important than others. So indeed one should think of replacing the chi-squared statistic with something focussed on particular kinds of alternatives. This still leaves you free to use the permutation approach to determine significance.
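One way to follow this advice when both variables are ordered is a linear-by-linear statistic T = sum over cells of u_i v_j n_ij, with significance still obtained by permuting labels with the margins fixed. This is only a sketch of one such focused statistic: the default integer scores 0, 1, 2, ... are an assumption, and `linear_by_linear_test` is a hypothetical name. The two-sided comparison uses |T - E[T]|, where E[T] = (sum u_i r_i)(sum v_j c_j) / N is the permutation mean.

```python
import random


def linear_by_linear_test(table, row_scores=None, col_scores=None,
                          n_resamples=9999, seed=None):
    """Two-sided permutation test of T = sum_ij u_i v_j n_ij with margins fixed."""
    n_rows, n_cols = len(table), len(table[0])
    u = row_scores if row_scores is not None else list(range(n_rows))
    v = col_scores if col_scores is not None else list(range(n_cols))
    pairs = [(i, j) for i in range(n_rows) for j in range(n_cols)
             for _ in range(table[i][j])]
    row_labels = [i for i, _ in pairs]
    col_labels = [j for _, j in pairs]
    n = len(pairs)
    # Permutation mean of T: (sum_i u_i r_i) * (sum_j v_j c_j) / N.
    mean_t = sum(u[i] for i in row_labels) * sum(v[j] for j in col_labels) / n

    def deviation(cols):
        return abs(sum(u[i] * v[j] for i, j in zip(row_labels, cols)) - mean_t)

    observed = deviation(col_labels)  # computed before any shuffling
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(col_labels)  # margins stay fixed
        if deviation(col_labels) >= observed - 1e-9:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    trend = [[5, 1, 0], [1, 4, 1], [0, 1, 5]]  # made-up table with a monotone trend
    print(linear_by_linear_test(trend, seed=7))
```

Because T concentrates on one direction of departure, it can have much more power against a monotone trend than the omnibus chi-squared statistic, and no power at all against alternatives it was not designed for.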
Fisher's exact test is not "exact" in the sense of a permutation test or of enumeration; in a sense, the name is a misnomer. For comparing the proportions of success in two groups, there are two unknown parameters, namely the two success probabilities. These can be re-parametrized into their difference and one success probability, which is a nuisance parameter. Conditioning on the estimate of this nuisance parameter (which essentially fixes the margins) results in Fisher's exact test, based on the hypergeometric distribution. For r x c tables, see Mehta and Patel (JASA, 1983) and www.cytel.com.
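For the 2x2 case, with row totals r1, r2, first column total c1 and N = r1 + r2, the conditional distribution of the top-left count a is hypergeometric: P(a) = C(r1, a) C(r2, c1 - a) / C(N, c1). The sketch below sums the probabilities of all tables that are no more probable than the observed one. This is the two-sided convention used by R's fisher.test, including a small relative tolerance. `fisher_2x2` is a hypothetical name.

```python
from math import comb


def fisher_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(r1, x) * comb(r2, c1 - x) / comb(n, c1)

    p_obs = prob(a)
    support = range(max(0, c1 - r2), min(r1, c1) + 1)
    # Sum over tables no more probable than the observed one.
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    print(fisher_2x2(3, 1, 1, 3))
```

For Fisher's tea-tasting table [[3, 1], [1, 3]] the hypergeometric probabilities are 1, 16, 36, 16 and 1 (each over 70), so the two-sided p-value is 34/70, about 0.486.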
So Fisher's exact test is an exact test in the same sense that a permutation test is exact. Moreover it's the permutation test based on all permutations leaving the margins fixed. Under the null hypothesis and conditional on the margins, the distribution of the data is uniform over all feasible permutations. Let's call it an exact conditional permutation test.
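The link between permutations and the hypergeometric distribution can be checked by brute force on a tiny made-up sample of N = 5 observations. The sketch enumerates all 5! orderings of the column labels, tabulates each ordering, and compares the relative frequency of each resulting table with the multivariate hypergeometric probability prod r_i! prod c_j! / (N! prod n_ij!). The helper names are hypothetical.

```python
from collections import Counter
from itertools import permutations
from math import factorial, isclose, prod


def tabulate(row_labels, col_labels, n_rows, n_cols):
    table = [[0] * n_cols for _ in range(n_rows)]
    for i, j in zip(row_labels, col_labels):
        table[i][j] += 1
    return tuple(tuple(r) for r in table)  # hashable, for counting


def permutation_distribution(row_labels, col_labels):
    """Relative frequency of each table over all N! orderings of the column labels."""
    n_rows, n_cols = max(row_labels) + 1, max(col_labels) + 1
    counts = Counter(tabulate(row_labels, perm, n_rows, n_cols)
                     for perm in permutations(col_labels))
    total = sum(counts.values())
    return {t: k / total for t, k in counts.items()}


def hypergeometric_prob(table):
    """prod r_i! * prod c_j! / (N! * prod n_ij!) for a table with fixed margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(r) for r in rows) * prod(factorial(c) for c in cols)
    den = factorial(sum(rows)) * prod(factorial(x) for r in table for x in r)
    return num / den


if __name__ == "__main__":
    dist = permutation_distribution([0, 0, 0, 1, 1], [0, 0, 1, 1, 1])
    for table, freq in sorted(dist.items()):
        print(table, freq, hypergeometric_prob(table), isclose(freq, hypergeometric_prob(table)))
```

Here the three feasible tables occur in 12, 72 and 36 of the 120 orderings, matching probabilities 0.1, 0.6 and 0.3.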
@Brian Altonen: you are saying that by multiplying the observed cell counts by 2, 3, 4, ... the chi-square p-value will decrease and then plateau. However, multiplying all cell counts by N multiplies the statistic by N and takes you further and further into the tail of a fixed chi-square distribution as N increases without limit, if you are referring your statistic to its large-sample approximate null distribution. So the phenomenon which you observe must be a small-sample phenomenon due to the discreteness and conservatism of the Fisher exact test. And whether the p-values you see when you have artificially inflated the sample size by some arbitrary factor N = 2, 3 or 4 have any statistical meaning at all is not clear to me.
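The scaling argument is easy to check numerically. The sketch multiplies every cell of a made-up 2x2 table by k and computes Pearson's X² together with its asymptotic p-value on 1 degree of freedom, using P(chi²_1 > x) = erfc(sqrt(x / 2)). For [[3, 1], [1, 3]] every expected count is 2k and every deviation is k, so X² = 2k grows linearly and the asymptotic p-value keeps falling.

```python
from math import erfc, sqrt


def chi_square(table):
    """Pearson's X^2 statistic for an r x c table given as a list of lists."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return sum((table[i][j] - r * c / n) ** 2 / (r * c / n)
               for i, r in enumerate(rows) for j, c in enumerate(cols)
               if r * c > 0)


def chi2_1df_sf(x):
    """P(chi-square with 1 df > x), using P(Z^2 > x) = erfc(sqrt(x / 2))."""
    return erfc(sqrt(x / 2))


base = [[3, 1], [1, 3]]  # made-up 2 x 2 table

if __name__ == "__main__":
    for k in (1, 2, 3, 4, 10):
        scaled = [[k * x for x in row] for row in base]
        stat = chi_square(scaled)
        print(k, stat, chi2_1df_sf(stat))
```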
Everyone is familiar with the usual rule of thumb: "the asymptotic chi-square approximation is adequate if the expected number of observations per cell is at least 5". I recall work by Albert Verbeek which showed that one can be much more liberal: as long as the number of cells with a small expected count is small compared with the rest, or something like that, things are not so bad. Unfortunately it seems this work was never published. I need to ask his former collaborators what came of it.
Thanks so much for all your detailed explanations. Actually, I used the 3x3 table to test the between-group difference at baseline in an RCT (3 groups, 3 categories) using SPSS. Prof. Gill mentioned that Fisher's exact test works for tables larger than 2x2, but SPSS just can't do that. Is there an SPSS package for doing it, or can I just go with the chi-square test? Thanks
@Rocio Hassan: thanks for the link, interesting site, interesting methods. I saw the rule of thumb I was looking for: "at least 80% of the cells have an expected frequency of 5 or greater, and that no cell has an expected frequency smaller than 1.0".
The usual rule of thumb "all expected frequencies at least 5" is unnecessarily strict.
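The rule quoted above can be encoded as a check on the expected counts. The thresholds are parameters, so the stricter "all expected counts at least 5" version is obtained with min_share=1.0. `meets_cochran_rule` is a hypothetical name.

```python
def expected_counts(table):
    """Expected cell counts E_ij = r_i * c_j / N under independence."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    return [[r * c / n for c in cols] for r in rows]


def meets_cochran_rule(table, threshold=5.0, min_share=0.8, floor=1.0):
    """At least `min_share` of expected counts >= threshold, and none below `floor`."""
    expected = [e for row in expected_counts(table) for e in row]
    share = sum(e >= threshold for e in expected) / len(expected)
    return share >= min_share and min(expected) >= floor


if __name__ == "__main__":
    table = [[5, 5, 5, 5, 2], [5, 5, 5, 5, 2]]  # made-up: expected counts 5, 5, 5, 5, 2 per row
    print(meets_cochran_rule(table), meets_cochran_rule(table, min_share=1.0))
```

For the made-up table, 8 of the 10 expected counts equal 5 and the other two equal 2, so it passes the 80% version of the rule but fails the strict "all at least 5" version.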
Sorry, I am clear on some of these points, but can I use the chi-square test to look for an association between generic drugs (3 different generics) and promotional activities (10 different promotions)? More than 70% of the cells have counts less than 5. How can I solve this, and how should I interpret the result?
Hi Wing, you might also find this paper useful: "Using Lancaster's Mid-P correction to the Fisher's exact test for adverse impact analyses", Dan A. Biddle, J. Appl. Psychol. 2011, Vol. 96, No. 5, 956-965.
The Fisher-Freeman-Halton test, an extension of Fisher's exact test, can be applied to contingency tables that are not 2x2. This link provides a way to do it in SPSS.
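For small tables, the Fisher-Freeman-Halton p-value can be computed by brute force. The steps are: enumerate every table with the observed margins, compute its multivariate hypergeometric probability, and add up the probabilities of the tables that are no more probable than the observed one. This is a sketch for tiny tables only, and the helper names are hypothetical. R's fisher.test uses the much more efficient network algorithm of Mehta and Patel.

```python
from math import factorial, prod


def _rows_with_total(total, capacities):
    """Yield (row, remaining capacities) for every way to place `total` in one row."""
    if not capacities:
        if total == 0:
            yield [], []
        return
    for k in range(min(total, capacities[0]) + 1):
        for row, rest in _rows_with_total(total - k, capacities[1:]):
            yield [k] + row, [capacities[0] - k] + rest


def tables_with_margins(row_sums, col_sums):
    """Yield every table of non-negative integers with the given margins."""
    if not row_sums:
        if all(c == 0 for c in col_sums):
            yield []
        return
    for row, rest in _rows_with_total(row_sums[0], col_sums):
        for tail in tables_with_margins(row_sums[1:], rest):
            yield [row] + tail


def table_prob(table):
    """Multivariate hypergeometric probability of a table given its margins."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    num = prod(factorial(r) for r in rows) * prod(factorial(c) for c in cols)
    den = factorial(sum(rows)) * prod(factorial(x) for r in table for x in r)
    return num / den


def freeman_halton_exact(table):
    """Sum of P(t) over tables t with the same margins and P(t) <= P(observed)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    p_obs = table_prob(table)
    probs = (table_prob(t) for t in tables_with_margins(rows, cols))
    return sum(p for p in probs if p <= p_obs * (1 + 1e-7))


if __name__ == "__main__":
    print(freeman_halton_exact([[3, 1, 0], [1, 3, 1], [0, 1, 4]]))
```

For a 2x2 table this reduces to the usual two-sided Fisher exact test; for [[3, 1], [1, 3]] it gives 34/70.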
This is exactly the question I have. I've been searching the internet extensively, and what I've found is somewhat contradictory. Some sources suggest that the original Fisher's exact test (the one SPSS calculates through Crosstabs) can be extended to r x c tables, and that Fisher himself did not include this originally because the computation becomes difficult to near impossible by hand, whereas computers can do it.
Others suggest some kind of extension, like the ones mentioned above. I cannot decide which one is correct, or at least better to adhere to. Some extensions, like Freeman-Halton, are certainly unfamiliar to many researchers and readers of medical journals, at least.
Your question is very interesting, and also a mistaken one. If your (2 x 3) table has expected counts of 5 or more in all cells, then you have to test the association between the variables using the chi-square test only.
If your (2 x 3) table has an expected count less than 5 in any cell, then you have to test the association between the variables using Fisher's exact test.
There is no formula for finding the sample size needed for a 2 x 3 Fisher's exact test.
I have a small dataset (N=400 samples). I want to apply Fisher's exact test because my contingency table has a lot of 0s, but I have read that Fisher's exact test is only for N≤90, so I went back to the chi-square test. However, I also read that the chi-square test should be performed only if at least 80% of the cells have an expected frequency of 5 or greater and no cell has an expected frequency smaller than 1.0, which is not my case.
Dear jaume, I believe the best course of action is to collapse your table into fewer rows and columns. In many medical scenarios at least, collapsing the table is sensible and logical.
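Collapsing a table amounts to adding up the rows (or columns) that belong together. The sketch below uses a hypothetical `collapse_rows` helper and a made-up table. Columns are collapsed by transposing, collapsing rows, and transposing back. Which categories to merge is a substantive choice that should be fixed before looking at the test result.

```python
def collapse_rows(table, groups):
    """Merge rows: groups[i] is the new row index for original row i."""
    n_cols = len(table[0])
    merged = [[0] * n_cols for _ in range(max(groups) + 1)]
    for i, row in enumerate(table):
        for j, x in enumerate(row):
            merged[groups[i]][j] += x
    return merged


def collapse_cols(table, groups):
    """Merge columns by transposing, merging rows, and transposing back."""
    transposed = [list(c) for c in zip(*table)]
    return [list(r) for r in zip(*collapse_rows(transposed, groups))]


if __name__ == "__main__":
    t = [[1, 0, 2], [0, 3, 1], [4, 1, 0]]  # made-up 3 x 3 table with sparse cells
    print(collapse_rows(t, [0, 0, 1]))  # merge the first two rows
    print(collapse_cols(t, [0, 1, 1]))  # merge the last two columns
```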
Please have a look at my recent paper on this issue. It provides a numerical answer to the question posed here. Article: The Tale of Cochran's Rule: My Contingency Table has so Many Expected Values Smaller than 5, What Am I to Do? https://amstat.tandfonline.com/doi/abs/10.1080/00031305.2017.1286260
Abstract: In an informal way, some dilemmas in connection with hypothesis testing in contingency tables are discussed. The body of the paper concerns the numerical evaluation of Cochran's Rule about the minimum expected value in r×c contingency tables with fixed margins when testing independence with Pearson's X² statistic using the chi-squared distribution.
Both Pearson's chi-square test and Fisher's exact test are non-parametric tests for association between categorical variables. However, the general statistical rule of thumb for the chi-square test in a 2x2 contingency table is at least 5 expected observations per cell.
But if the expected count in any cell of a 2x2 contingency table is less than 5, then you should use Fisher's exact test.
As can be seen from your attached image, it clearly states that 2 cells have an expected count less than 5. In that case you should use Fisher's exact test.
Hello everyone, I am having a similar issue. Could I get example code in Python, or a package, to compute Fisher's exact test for a contingency table larger than 2x2? My table is attached.
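Here is a stdlib-only Python sketch of a Monte Carlo Fisher-Freeman-Halton test, similar in spirit to R's fisher.test(..., simulate.p.value=TRUE). It simulates tables with the observed margins by shuffling labels and counts how often a simulated table is no more probable than the observed one. Because the margins are fixed, P(table) is proportional to 1 / prod n_ij!, so the comparison reduces to the sum of log(n_ij!) over the cells. The function name, the example table (a stand-in, since the attachment is not available) and the tolerance are assumptions, not a packaged implementation.

```python
import random
from math import lgamma


def monte_carlo_fisher(table, n_resamples=9999, seed=None):
    """Monte Carlo p-value for the Fisher-Freeman-Halton test of an r x c table."""
    rng = random.Random(seed)
    n_rows, n_cols = len(table), len(table[0])
    pairs = [(i, j) for i in range(n_rows) for j in range(n_cols)
             for _ in range(table[i][j])]
    row_labels = [i for i, _ in pairs]
    col_labels = [j for _, j in pairs]

    def log_factorial_sum(t):
        # With margins fixed, a larger sum of log(n_ij!) means a less probable table.
        return sum(lgamma(x + 1) for row in t for x in row)

    observed = log_factorial_sum(table)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(col_labels)  # margins stay fixed
        sim = [[0] * n_cols for _ in range(n_rows)]
        for i, j in zip(row_labels, col_labels):
            sim[i][j] += 1
        if log_factorial_sum(sim) >= observed - 1e-7:
            hits += 1
    return (hits + 1) / (n_resamples + 1)


if __name__ == "__main__":
    my_table = [[4, 0, 2], [1, 5, 0], [0, 1, 6]]  # hypothetical stand-in table
    print(monte_carlo_fisher(my_table, seed=2024))
```

For the 2x2 table [[3, 1], [1, 3]] the estimate should be close to the exact two-sided value 34/70.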