Hi,
I am currently looking into local (in)dependence statistics for IRT models and I am confused about how several of them handle degrees of freedom. I am dealing with binary data (items with K=2 categories), so my questions pertain to this case, in which responses to two items (i and j) are used to identify misfit.
1. I believe the most used statistic is the unadjusted Chi square value by Chen & Thissen (1997). Am I correct that this statistic should be referenced against a Chi square distribution with 1 df? This would make the critical value 3.84 (at alpha = .05). However, many papers claim that this reference distribution does not hold empirically. What, then, is the right critical value? Or should this measure be avoided?
2. The standardized LD statistic is calculated by applying formula 6.22 (see attached, from Maydeu-Olivares, 2015 in Reise & Revicki), and should be normally distributed (?). But, if the objections listed above hold, then they should also apply to this measure, because it is merely a transformation of the unadjusted Chi square statistic, right? Is there a heuristic critical value we can use in practice for this statistic?
3. Interestingly, Drasgow et al. (1995) appear to apply the same procedure as the above measures (based on a 2x2 table of observed and expected frequencies). However, they advocate df = Ki x Kj - 1, which in the binary case would result in df = 2 x 2 - 1 = 3. Should this then be the number of df against which the unadjusted Chi square should be evaluated?
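To make the comparison concrete, here is a minimal Python sketch of how I understand the computation. The observed and expected counts are made up for illustration, all function names are my own, and I am assuming that formula 6.22 has the form (X2 - df) / sqrt(2 df), which may not match the chapter exactly. It only shows how much the decision depends on whether df = 1 or df = 3 is used. It says nothing about which reference distribution is actually correct.

```python
from math import erf, exp, pi, sqrt


def pearson_x2(observed, expected):
    """Pearson X2 summed over the four cells of a 2x2 table for one item pair."""
    return sum(
        (o - e) ** 2 / e
        for row_o, row_e in zip(observed, expected)
        for o, e in zip(row_o, row_e)
    )


def standardized_ld(x2, df):
    """Assumed form of the standardized statistic: (X2 - df) / sqrt(2 df)."""
    return (x2 - df) / sqrt(2 * df)


def chi2_cdf(x, df):
    """Chi square CDF using closed forms; only df = 1 and df = 3 are handled."""
    if df == 1:
        return erf(sqrt(x / 2))
    if df == 3:
        return erf(sqrt(x / 2)) - sqrt(2 * x / pi) * exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 3")


def chi2_critical(df, alpha=0.05):
    """Upper-tail critical value found by bisection on the CDF."""
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if chi2_cdf(mid, df) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical counts for one item pair (rows: item i = 0/1, columns: item j = 0/1).
observed = [[30, 20], [10, 40]]
expected = [[25, 25], [15, 35]]  # model-implied counts, invented for this example

x2 = pearson_x2(observed, expected)
for df in (1, 3):
    crit = chi2_critical(df)
    print(f"df={df}: X2={x2:.3f}, critical={crit:.2f}, "
          f"flagged={x2 > crit}, standardized={standardized_ld(x2, df):.2f}")
```

With these made-up counts X2 is about 4.38, which exceeds the df = 1 critical value (3.84) but not the df = 3 one (7.81), so the same item pair would be flagged under one convention and not under the other.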
My problem is that I have the unadjusted Chi square values for a large set of items, but now I do not know how to assess whether they are large or small, because I do not know the reference distribution/critical values.
So practical recommendations would be of much help!
Dirk
https://conservancy.umn.edu/bitstream/handle/11299/117478/v19n2p143.pdf
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.820.3125&rep=rep1&type=pdf
Identifying the Source of Misfit in Item Response Theory Models