Do you know of any package/algorithm/referenced threshold for item selection (for "every" test taker, not per test taker) based on IRT estimates?

10 October 2016

Context: Performance test, dichotomous, one-dimensional (at least in theory), at least 3PL (variable pseudo-guessing). Theta is assumed normal. I'm also interested in answers that apply to IRT more generally.

Problem: It seems to me that EFA factor loadings provide clear guidelines (with referenced rules of thumb, etc.) to rank/select a subset of items from a pool when one does not have any prior info/assumption about theta (i.e., for "all" test takers).

On the other hand, IRT is, in my opinion, a much more accurate representation of the psychological situation that underlies test situations, but it seems to me that several parameters (especially in IRT 3PL/4PL) have to be taken into account all at once to select items, at least without any prior estimation of theta.

So I'm wondering whether you know of any guidelines/packages that can be cited as a clear basis (meaning, not eye-balling item response functions) for item selection there. At this stage I'm thinking of a very non-parsimonious solution: generate all possible item subsets (I'd get a LOT of models, but why not) -> fit an IRT model to each -> compute marginal reliability (and/or information, or why not CFI, RMSEA, etc.) for thetas ranging between -3 SD and +3 SD -> rank the subsets by descending marginal reliability (but I'm afraid this would be biased towards subsets with more items, so I might have to weight by item count).
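To make the brute-force idea concrete, here is a rough Python sketch for a tiny pool. It is not a referenced procedure. It assumes the 3PL parameters `a`, `b`, `c` (logistic metric) have already been estimated once on the full pool and are treated as fixed, so no model is refitted per subset, and it uses one common information-based definition of marginal reliability, the average of I(theta) / (I(theta) + 1) for a unit-variance theta; other definitions exist. The function names are made up for illustration.

```python
from itertools import combinations

import numpy as np

# Theta grid from -3 SD to +3 SD, weighted by a (truncated) standard normal density.
THETA = np.linspace(-3.0, 3.0, 121)
W = np.exp(-THETA**2 / 2)
W /= W.sum()


def item_info(a, b, c):
    """3PL item information on the grid; returns an (n_theta, n_items) array."""
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    t = THETA[:, None]
    p = c + (1 - c) / (1 + np.exp(-a * (t - b)))
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def marginal_reliability(info, subset):
    """Density-weighted average of I / (I + 1), with I the test information of the subset."""
    test_info = info[:, list(subset)].sum(axis=1)
    return float(np.sum(W * test_info / (test_info + 1.0)))


def best_subset(a, b, c, k):
    """Exhaustive search over all subsets of size k (only feasible for small pools)."""
    info = item_info(a, b, c)
    return max(combinations(range(info.shape[1]), k),
               key=lambda s: marginal_reliability(info, s))
```

Under this definition, adding an item can only raise reliability, which is the bias towards longer subsets mentioned above; comparing subsets of the same size `k` avoids the need for any weighting. Exhaustive search does not scale: choosing 10 items out of 100 already gives roughly 1.7 × 10^13 subsets.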

Anyway, you get the idea. Any known referenced procedures/packages?

Nils Myszkowski

I'll be sure to check it out, thanks!

Daniel Wright

What often happens is that the IRT parameters, the proportion correct, and DIF statistics are computed for field-test items and all taken into account when constructing the final test, along with what content the items are on (since field tests often have several items for one content area but only a few for others). So this is still "eye-balling", but it starts from criteria set by psychometricians, while the decisions are often made by the test developers. This is for large-scale tests.

As far as packages go, I'm not sure exactly what you want (if it is just choosing your thresholds for various statistics, that could be coded fairly easily), but a big list of IRT packages is at:

https://cran.rstudio.com/web/views/Psychometrics.html
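As an illustration of how threshold-based screening could be coded, here is a minimal Python sketch. The `ItemStats` fields, the `screen` function, and every default cut-off in it are hypothetical placeholders chosen for the example, not published or recommended values; in practice the thresholds would come from the psychometricians' criteria.

```python
from dataclasses import dataclass


@dataclass
class ItemStats:
    name: str
    a: float          # discrimination estimate
    c: float          # pseudo-guessing estimate
    p_correct: float  # proportion correct in the field test
    dif_flag: bool    # True if the item was flagged for DIF


def screen(items, min_a=0.8, max_c=0.35, p_range=(0.2, 0.9)):
    """Return the names of items that pass every (placeholder) threshold."""
    keep = []
    for it in items:
        passes = (
            it.a >= min_a
            and it.c <= max_c
            and p_range[0] <= it.p_correct <= p_range[1]
            and not it.dif_flag
        )
        if passes:
            keep.append(it.name)
    return keep
```

A screen like this only narrows the pool; balancing content areas would still be a separate step.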

Nils Myszkowski

Thanks for your answer! I'm using `mirt` actually.

I agree that eye-balling works, but I was thinking of some form of algorithm/reference to maximize information/reliability for a (theoretical) distribution of test takers (e.g. normal) rather than for one test taker with a given (or estimated) theta. All of that with a fixed test length.

Say for example, you have 100 items (with data), and want the subset of 10 items that maximizes test information for test takers with a normal (0,1) theta distribution (or any other distribution for that matter).
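For this fixed-length case, one possible simplification (a sketch under assumptions, not a referenced procedure): if the item parameters are treated as fixed and local independence holds, test information is the sum of the item informations, so the expected test information under a given theta distribution is the sum of each item's expected information. The subset of size k that maximizes it is then simply the k items with the largest expected information. The Python sketch below assumes 3PL parameters `a`, `b`, `c` in the logistic metric and approximates the N(0,1) expectation on a grid; the function names are illustrative.

```python
import numpy as np


def info_3pl(theta, a, b, c):
    """3PL item information, logistic metric (no 1.7 scaling constant)."""
    p = c + (1 - c) / (1 + np.exp(-a * (theta - b)))
    return a**2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def expected_info(a, b, c, lo=-6.0, hi=6.0, n=241):
    """Expected information of each item for theta ~ N(0, 1), via a weighted grid."""
    a, b, c = (np.asarray(x, dtype=float) for x in (a, b, c))
    theta = np.linspace(lo, hi, n)
    w = np.exp(-theta**2 / 2)
    w /= w.sum()  # normalised weights approximating the N(0, 1) density
    return w @ info_3pl(theta[:, None], a, b, c)


def select_items(a, b, c, k):
    """Indices of the k items with the highest expected information, best first."""
    return np.argsort(-expected_info(a, b, c))[:k]
```

With 100 calibrated items, `select_items(a, b, c, 10)` would return the 10 indices. Swapping the weights `w` for another density gives the same ranking logic for a non-normal theta distribution. Note that this maximizes expected information, not marginal reliability, which is not additive across items.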
