I carried out a chi square test and the Pearson's chi-square score is greater than the critical value, which means I can reject the null hypothesis. But how do I determine where this significance lies? Please help.
The chi² statistic is the sum of (observed-expected)²/expected for each cell. You can plot a heatmap of these values. This gives you a good impression which cells contribute most to the chi² value.
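To make the per-cell idea concrete, here is a minimal sketch in plain Python. The 2x4 table of counts is made up for illustration (it is not the data discussed in this thread), and the variable names are my own. It computes each cell's (observed-expected)²/expected and checks that the cells sum to the chi² statistic; the resulting grid is what you would feed to a heatmap.

```python
# Hypothetical 2x4 table of counts (rows: outcome yes/no; columns: groups).
# These numbers are invented for illustration; they are NOT the thread's data.
observed = [
    [30, 10, 8, 32],
    [20, 30, 12, 18],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Per-cell contribution to the chi-square statistic.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]

# The statistic is the sum of all cell contributions.
chi_square = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(x, 2) for x in row])
print("chi-square:", round(chi_square, 2))
```

The cells with the largest contributions are the ones to look at first, although the contributions alone do not show the direction of the deviation.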
A slightly modified approach to the one Jochen Wilhelm describes is to use the adjusted standardized residuals (ASR) from the analysis. These start from the standardized residual, (observed - expected)/sqrt(expected), and additionally adjust for the row and column totals. The formula is easy to find online †, and software packages often report them. **
The advantage here is that the results are on a scale similar to that of z-scores, so they are relatively easy to interpret. That is, an ASR > 1.96 or < -1.96 suggests that the cell is contributing to the effect, and an ASR > 2.58 or < -2.58 suggests that the cell is contributing more strongly. By analogy with z-scores, these two cut-offs correspond to alpha levels of 0.05 and 0.01, respectively.
† e.g. https://scholarworks.umass.edu/cgi/viewcontent.cgi?article=1269&context=pare
** Some software may report the unadjusted standardized residuals, (observed - expected)/sqrt(expected). It's not always obvious from software documentation if the reported values are the adjusted or unadjusted standardized residuals. EDIT: From the discussion below, SPSS reports the unadjusted standardized residuals by default, but you can request the adjusted.
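As a rough sketch of the difference between the two kinds of residual, the snippet below computes both for a made-up 2x4 table (again, not the data from this thread). The adjusted version uses the commonly given formula (observed - expected)/sqrt(expected × (1 - row total/n) × (1 - column total/n)); the function name `residuals` is hypothetical. If your software's output matches one of the two results, you know which one it is reporting.

```python
import math

# Hypothetical counts for illustration only (rows: yes/no; columns: groups).
observed = [
    [30, 10, 8, 32],
    [20, 30, 12, 18],
]


def residuals(table):
    """Return (unadjusted, adjusted) standardized residuals for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    unadjusted, adjusted = [], []
    for i, row in enumerate(table):
        unadj_row, adj_row = [], []
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            # Unadjusted: (observed - expected) / sqrt(expected)
            unadj_row.append((o - e) / math.sqrt(e))
            # Adjusted: also divide by sqrt of (1 - row share) * (1 - column share)
            adj_row.append(
                (o - e)
                / math.sqrt(e * (1 - row_totals[i] / n) * (1 - col_totals[j] / n))
            )
        unadjusted.append(unadj_row)
        adjusted.append(adj_row)
    return unadjusted, adjusted


unadjusted, adjusted = residuals(observed)
for name, grid in (("unadjusted", unadjusted), ("adjusted", adjusted)):
    print(name)
    for row in grid:
        print([round(x, 2) for x in row])
```

The adjusted residuals are always at least as large in absolute value as the unadjusted ones, and in a table with only two rows the two adjusted residuals in each column are equal in size and opposite in sign.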
Sal Mangiafico how do I interpret this? The Pearson chi-square test shows significance (p = 0.018), but the standardized residual value for blood group b is less than 1.96/2. What does this mean?
You don't need to divide that z-like value by 2. For alpha = 0.05, z for alpha/2 = 1.96.
EDIT: If you want to use the unadjusted standardized residuals: In this case, there are a couple of options. You could go with a cut-off analogous to an alpha of 0.10 for this approach. z for alpha/2 = 0.10/2 is 1.65. Only yes-b would meet this criterion. ... What I would probably do though: You might use no specific criterion, and just note that it's b and o that have relatively large standardized residuals (say, > 1, or ≥ 1.3). (A z of 1.3 corresponds to a two-sided alpha of just under 0.2.) It's really these four cells that are driving the significant difference in counts from the expected. ... Also notice the sign of the residuals: The count for b-yes is less than expected, while the count for o-yes is more than expected.
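The cut-offs mentioned in this thread can be checked with Python's standard library. This is only a sketch: the helper names `two_sided_critical_z` and `two_sided_p` are my own, and it treats the residuals as approximately standard normal, which is the analogy being used here rather than an exact result.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def two_sided_critical_z(alpha):
    """z value that leaves alpha/2 in each tail of the standard normal."""
    return std_normal.inv_cdf(1 - alpha / 2)


def two_sided_p(z):
    """Probability of a standard normal value at least as extreme as z, in either tail."""
    return 2 * (1 - std_normal.cdf(abs(z)))


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha}: |z| > {two_sided_critical_z(alpha):.3f}")

print(f"two-sided p for z = 1.3: {two_sided_p(1.3):.3f}")
```

Because the residuals can fall on either side of zero, alpha is split between the two tails, which is why the 0.05 cut-off is 1.96 rather than the one-sided 1.645.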
Okay, so the values in the output posted by Mika Mika are the unadjusted standardized residuals. If you calculate the adjusted standardized residuals (as per the paper linked in my previous response), the values will be larger in magnitude. Specifically, the absolute values of those for b will be > 2.58 and the absolute values of those for o will be > 1.96. EDIT: Below, Bruce Weaver has SPSS code to request the adjusted standardized residuals.
The details about how SPSS computes the various residuals can be found in the Algorithms manual. Go to the page linked below and use Ctrl-F to find the PDF of that manual.
https://www.ibm.com/support/pages/node/874712
The relevant portion of the documentation is shown on the second attached png file.
Thanks, Bruce Weaver . Those are the results I am getting with R and SAS, which apparently use the same calculation. The free PSPP software I was using was giving totally different results. I've used that software like twice, and it always gives me guff. EDIT: I did submit a bug report to them. I've never used SPSS much, but I do support the effort to produce a free product which mimics the basic analyses in SPSS.
Sal Mangiafico why did you divide alpha by 2? The residuals are standardized. Should I check the "adjusted" option in the residuals section? Bruce Weaver
Hi, Mika Mika ... Yes, check Adjusted Standardized, and see if that returns the same values that Bruce Weaver included (e.g. the ASRs for b are -2.8 and 2.8). ... You divide alpha by two because you are conducting something analogous to a two-sided hypothesis test. That is, before you collected the data, you didn't know that b-yes would be lower than expected and that b-no would be higher than expected. So, by analogy with a hypothesis test, for alpha = 0.05, you are comparing to a 0.025 probability of getting a value that extreme on the high end and a 0.025 probability of getting a value that extreme on the low end. (Both under the assumption that the null hypothesis is true.) This figure may help: https://www.jahjournal.org/viewimage.asp?img=JApplHematol_2014_5_1_27_131823_u1.jpg