In a chi-square test, we compare the observed frequencies (measured directly from the data) with the expected frequencies. In the chi-square test for independence, an expected frequency is calculated for each cell of the contingency table as (row total × column total) / grand total. A common assumption of the test is that no cell should have an expected frequency of less than 5.
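To make the expected-frequency rule concrete, here is a minimal Python sketch, assuming the contingency table is given as a list of rows of observed counts. The function names `expected_frequencies` and `small_expected_cells` and the counts in the example are hypothetical, chosen only for illustration; the threshold of 5 is the rule of thumb stated above.

```python
def expected_frequencies(table: list[list[int]]) -> list[list[float]]:
    """Expected count per cell under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def small_expected_cells(expected: list[list[float]], minimum: float = 5.0) -> list[tuple[int, int]]:
    """(row, column) indices of cells whose expected count falls below `minimum`."""
    return [
        (i, j)
        for i, row in enumerate(expected)
        for j, e in enumerate(row)
        if e < minimum
    ]


# Hypothetical 2 x 2 table of observed counts.
observed = [[8, 2], [4, 6]]
expected = expected_frequencies(observed)
print(expected)                        # [[6.0, 4.0], [6.0, 4.0]]
print(small_expected_cells(expected))  # [(0, 1), (1, 1)]
```

In this made-up table, two cells have an expected count of 4, so the assumption that every expected frequency is at least 5 would not be met.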
The chi-squared test (χ²) is a statistical test applied to sets of categorical data to evaluate how likely it is that any observed difference between the sets arose by chance. It tests a null hypothesis stating that the frequency distribution of certain events observed in a sample is consistent with a particular theoretical distribution. As noted above, the test assumes that no cell has an expected frequency of less than 5. The test proceeds in the following steps:
1. Calculate the chi-squared test statistic, χ² = Σ (O − E)² / E, summed over all cells, where O is the observed and E the expected frequency of each cell. It resembles a normalized sum of squared deviations between observed and theoretical frequencies.
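2. Determine the degrees of freedom, df, of that statistic. For a goodness-of-fit test this is the number of categories minus 1, minus the number of parameters of the fitted distribution that were estimated from the data; for a test of independence on an r × c table it is (r − 1)(c − 1).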
3. Select a significance level (alpha level), such as α = 0.05, which corresponds to a confidence level of 1 − α.
4. Compare χ² to the critical value from the chi-squared distribution with df degrees of freedom at the selected significance level. The test is one-sided (upper tail only), since it only asks whether the test statistic is greater than the critical value. In many cases, the chi-squared distribution gives a good approximation of the distribution of χ² under the null hypothesis.
5. Reject or fail to reject the null hypothesis that the observed frequency distribution is the same as the theoretical distribution, based on whether the test statistic exceeds the critical value. If χ² exceeds the critical value, the null hypothesis that there is no difference between the distributions can be rejected at the selected significance level, and the alternative hypothesis that there is a difference between the distributions can be accepted. If it does not, the data do not provide enough evidence to reject the null hypothesis.
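The five steps above can be sketched end to end for a test of independence. This is a minimal Python sketch using only the standard library and assuming integer degrees of freedom; the helper names `chi2_sf`, `chi2_critical`, and `chi_square_independence` and the example counts are hypothetical. The upper-tail probability uses the closed-form series for the chi-squared distribution with integer df, and the critical value is found by bisection instead of a printed table. It does not apply Yates' continuity correction, which some libraries (for example SciPy's `chi2_contingency`) apply by default when df = 1, so their results can differ for 2 × 2 tables.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-h) * sum_{i=0}^{df/2 - 1} h**i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= h / i
            total += term
        return math.exp(-h) * total
    # Odd df: erfc(sqrt(h)) + exp(-h) * sum_{i=1}^{(df-1)/2} h**(i - 1/2) / Gamma(i + 1/2)
    term = math.sqrt(h) / math.gamma(1.5)
    series = 0.0
    for i in range(1, (df - 1) // 2 + 1):
        series += term
        term *= h / (i + 0.5)
    return math.erfc(math.sqrt(h)) + math.exp(-h) * series


def chi2_critical(alpha: float, df: int) -> float:
    """Critical value c with P(X >= c) = alpha, by bisection (the upper tail is decreasing in c)."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def chi_square_independence(table: list[list[int]], alpha: float = 0.05):
    """Steps 1-5 for an r x c table: statistic, df, critical value, and decision."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Step 1: sum of (O - E)^2 / E over all cells.
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # Step 2: degrees of freedom for an r x c table.
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Steps 3-4: upper-tail critical value at the chosen significance level.
    critical = chi2_critical(alpha, df)
    # Step 5: reject H0 when the statistic exceeds the critical value.
    return stat, df, critical, stat > critical


# Hypothetical 2 x 2 table of observed counts.
stat, df, critical, reject = chi_square_independence([[30, 10], [20, 40]])
print(f"chi2 = {stat:.2f}, df = {df}, critical = {critical:.3f}, reject H0: {reject}")
# chi2 = 16.67, df = 1, critical = 3.841, reject H0: True
```

For these made-up counts the expected frequencies are 20, 20, 30 and 30 (all at least 5), so χ² = 5 + 5 + 3.33 + 3.33 ≈ 16.67, which exceeds the critical value of about 3.841 for df = 1 at α = 0.05.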