Applications of Rasch analysis and IRT models are becoming increasingly popular for developing and validating patient-reported outcome measures. Rasch analysis is a confirmatory approach in which the data have to meet the requirements of the Rasch model to form a valid measurement scale, whereas IRT models are exploratory models that aim to describe the variance in the data. Researchers seem divided in their preference for one over the other. What is your opinion about this dilemma in the development of patient-reported outcome measures?
Rasch requires the data to fit the model in order to generate invariant, interval-level measures (sic.) of items and persons. It is prescriptive. IRT models attempt to create a model that will fit the data. They are descriptive. While IRT users see Rasch as one particular IRT model, most Rasch proponents see it as distinctly different from other IRT models. The key differences are philosophical. Wikipedia provides a suitable introduction:
https://en.wikipedia.org/wiki/Rasch_model
You might recall Fan's infamous comparison paper. I can add a critique of that if you wish.
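The "invariant" property mentioned above can be illustrated numerically. A minimal sketch, using hypothetical person and item values in logits: under the Rasch model the log-odds gap between two persons is the same on every item, whereas once item discriminations differ (a 2PL model) the gap depends on which item is used.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_odds(p):
    return math.log(p / (1.0 - p))

def person_gaps(theta_1, theta_2, items):
    """Log-odds difference between two persons on each item.

    items: list of (difficulty b, discrimination a) pairs, in logits.
    """
    gaps = []
    for b, a in items:
        p1 = logistic(a * (theta_1 - b))
        p2 = logistic(a * (theta_2 - b))
        gaps.append(log_odds(p1) - log_odds(p2))
    return gaps

# Hypothetical persons and items, chosen only for illustration.
theta_1, theta_2 = 1.0, -0.5
rasch_items = [(-1.0, 1.0), (0.0, 1.0), (2.0, 1.0)]  # every a equals 1
twopl_items = [(-1.0, 0.5), (0.0, 1.0), (2.0, 2.0)]  # a varies by item

rasch_gaps = person_gaps(theta_1, theta_2, rasch_items)  # about 1.5 on every item
twopl_gaps = person_gaps(theta_1, theta_2, twopl_items)  # about 0.75, 1.5, 3.0
```

The gap under the 2PL is simply the discrimination times the ability difference, which is why the comparison of persons is no longer item-free once discriminations are allowed to vary.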
Rasch models are a special case of IRT models in which one presumes that a single item parameter (difficulty, or location) is required, that all item discriminations (the second parameter in a two-parameter IRT model) are equal to 1, and that there is no possibility of guessing behavior (the third parameter in a three-parameter IRT model). There may be some hard-core Rasch specialists who assert other differences, but otherwise the models are the same. I'm not at all sure that I would agree that IRT is exploratory whereas Rasch is confirmatory. However, it is true that a one-parameter model makes certain operations on, and claims about, a scale much simpler than two- or three-parameter models do.
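The nesting described above can be written as one function. A minimal sketch of the three-parameter logistic item response function: with the guessing parameter c set to 0 it reduces to the 2PL, and with c = 0 and discrimination a = 1 it has the Rasch/1PL form. Some texts and programs also multiply the exponent by a scaling constant D of about 1.7; that constant is omitted here by assumption, and the parameter values below are hypothetical.

```python
import math

def irt_prob(theta, b, a=1.0, c=0.0):
    """3PL probability of a correct response (no D scaling constant).

    theta: person ability, b: item difficulty (location),
    a: discrimination, c: lower asymptote ("guessing"), all illustrative.
    c=0 gives the 2PL; c=0 and a=1 give the Rasch/1PL form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

theta, b = 0.5, 0.0
p_rasch = irt_prob(theta, b)              # a=1, c=0
p_2pl = irt_prob(theta, b, a=1.8)         # hypothetical discrimination
p_3pl = irt_prob(theta, b, a=1.8, c=0.2)  # hypothetical guessing floor
```

Adding c lifts the lower asymptote, so even very low-ability respondents keep a probability of at least c, which is the "guessing behavior" the Rasch model rules out.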
It's really a matter of this: how many (or how few) item parameters are needed to capture the behavior of items/stimuli and respondents on a common scale in a dependable manner? Questions of scale dimensionality are also salient in this type of investigation or scale building.
It is a little more complicated than you imagine. I am analyzing 3 data sets of about 2000 cases each, and I have a program in SPSS/R that tests Rasch, factor analysis, reliability, graded response, 2PL and 3PL models, HOMALS and Mokken in one run, and it is all exploratory. I have about 10 other independent data sets to test or expand my exploratory analysis (35000+ cases in total). My aim is to find the model that best fits the data. Each result tells me more about which items belong in a few sets of scales. I would prefer polytomous Rasch to polytomous Mokken to dichotomous Rasch to dichotomous Mokken, but the data will set me straight. In a second stage I will also investigate DIF (differential item functioning). This can be done with either Rasch or Mokken.
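For readers less familiar with Mokken scaling, one of its central statistics is Loevinger's scalability coefficient H. A minimal sketch for dichotomous items only, written from the textbook definition rather than from any package: H is the sum of inter-item covariances divided by the sum of their maximum possible values given the item proportions. The response data below are hypothetical.

```python
from itertools import combinations

def loevinger_h(responses):
    """Scale coefficient H for dichotomous (0/1) items.

    responses: list of respondents, each a list of 0/1 item scores.
    Items answered 1 by everyone or by no one are rejected, because
    their maximum covariance is zero.
    """
    n = len(responses)
    k = len(responses[0])
    p = [sum(r[i] for r in responses) / n for i in range(k)]
    if any(pi in (0.0, 1.0) for pi in p):
        raise ValueError("items with proportion 0 or 1 must be removed first")
    num = den = 0.0
    for i, j in combinations(range(k), 2):
        p_ij = sum(r[i] * r[j] for r in responses) / n
        num += p_ij - p[i] * p[j]                # observed covariance
        den += min(p[i], p[j]) - p[i] * p[j]     # maximum covariance
    return num / den

# Hypothetical perfect Guttman pattern: H should be 1.
guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
h_guttman = loevinger_h(guttman)
```

H equals 1 when there are no Guttman errors and falls toward 0 as items behave independently; Mokken software applies further checks (e.g. monotonicity) that are not shown here.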
@David, the second parameters do not need to be equal to 1; they just need to be equal. Sometimes the Rasch model does not fit with a discrimination parameter of 1 but might fit with 1.7. CITO (the Netherlands) has/had a program (OPSUG) to check whether another value might fit better.
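Why a common discrimination other than 1 is still "Rasch" can be checked directly: a(θ - b) equals aθ - ab, so multiplying every person and item value by the same a gives identical probabilities on a stretched logit scale. Unequal discriminations are different: the item curves cross, so which item is easier depends on the person. A minimal sketch with hypothetical values (a = 1.7 echoes the figure in the post above):

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def max_gap_common_a(a, thetas, bs):
    """Largest difference between a common-discrimination model and a Rasch
    model whose person and item values are all multiplied by the same a."""
    worst = 0.0
    for theta in thetas:
        for b in bs:
            p_common = logistic(a * (theta - b))
            p_rasch = logistic((a * theta) - (a * b))  # Rasch on a stretched scale
            worst = max(worst, abs(p_common - p_rasch))
    return worst

def easier_item(theta, item_1, item_2):
    """Return 1 or 2: which (b, a) item a person at theta is more likely to pass."""
    p1 = logistic(item_1[1] * (theta - item_1[0]))
    p2 = logistic(item_2[1] * (theta - item_2[0]))
    return 1 if p1 > p2 else 2

# Hypothetical abilities and difficulties, in logits.
gap = max_gap_common_a(1.7, [-1.0, 0.0, 0.8], [-0.5, 0.3, 1.2])

# Hypothetical items with unequal discriminations: their curves cross,
# so no common rescaling of the scale gives one item ordering for everyone.
flat, steep = (0.0, 0.5), (0.5, 2.0)
low_person = easier_item(-2.0, flat, steep)   # the flat item is easier here
high_person = easier_item(2.0, flat, steep)   # the steep item is easier here
```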
I agree with David Morse. To my understanding, it is not true that one is confirmatory and the other exploratory; rather, the difference lies in the parameters. Rasch or IRT will tell you whether a scale is performing according to certain parameters, and I believe some say that this is preferable in some ways, i.e. for complex concepts such as quality of life or the ability to do things. For symptoms, it makes less sense to me, as they would not necessarily line up together: a cough, for example, may be quite different from, and distant to, the sensation of pain. Personally, I don't feel that your scale always needs to follow the principles of Rasch or IRT. To me, Rasch is just a nice-to-have; if it does not work, I do not sweat about it too much, and as long as the results are interpretable, it may still be a very good instrument.
It is like music. There was a music teacher at my school who challenged us to bring in music and to share and defend our taste, and he would just disregard anything that wasn't complicated or classical. I feel that sometimes we make PRO development too complicated, and Rasch/IRT is where I draw the line. Nice little tunes can still create a perfect sound. :-)
To Trevor Bond's point, Rasch is the only social science paradigm for psychometrics that has support in the metrology community for measurement in the physical, chemical and biological sciences. This is because Rasch approaches measurement by mathematical analogy with the approach used in the rest of the sciences, rather than from a statistical modeling approach. Collaborations between metrologists and Rasch enthusiasts (and not IRT proponents) include:
Article A gentle introduction to Rasch measurement models for metrologists
Article Man as a Measurement Instrument
Conference Paper Metrology of human-based measurements
Article On Trial: the Compatibility of Measurement in the Physical a...
Article A meta-structural understanding of measurement
Article Quantities, Quantification, and the Necessary and Sufficient...
I refer to the book Rasch Analysis in the Human Sciences
(Boone, W. J., Staver, J. R., & Yale, M. S. (2013). Rasch analysis in the human sciences. Springer Science & Business Media).
On page 449, it says:
Isabelle and Ted: Two Colleagues Conversing
Ted: Isabelle, I need your help here. I am looking at a number of articles that have used Rasch to analyze data. Sometimes the authors use the term “Rasch analysis,” and sometimes they use the term “IRT” or “Item Response Theory.” Are those words interchangeable? Also, there is another thing; I noticed that sometimes people write about the Rasch model as being the 1-parameter model, and in the same breath, they write about the 2-parameter model and the 3-parameter model. What is going on?
Isabelle: You know Ted, I wrestled with the same issue when I first started my work. It took me a while to sort things out, and now I understand the differences, but it would have been a lot easier if someone had taken me aside and explained the issues.
Additionally: "Our goal in this chapter is to help readers understand that Rasch models, in our minds, are substantially different in many ways from Item Response Theory (IRT) models" (p. 449).
On page 453, it says:
Rasch and IRT: Philosophical Difference
Rasch measurement is often classified under the umbrella of Item Response Theory (IRT) models. However, a core philosophical difference exists between the Rasch model and the IRT models (often referred to as the 1-parameter, 2-parameter, or 3-parameter models). Whereas the IRT models are altered (more parameters added) to fit the data, the Rasch measurement model is not altered to fit the data and is thus viewed as a definition of measurement. Examination of the 1-parameter IRT model reveals that it looks identical to the Rasch model. Consequently, some researchers refer to the Rasch model as the 1-P model or as the 1-P IRT Rasch model. We view such references as mistakes because of the immense philosophical difference, in that one model, IRT, is altered to fit data and one model, Rasch, is not altered to fit data. Therefore, Rasch is the model that is consistent with the definition of measurement as set forth by Thurstone over 80 years ago.
I agree with Katja Rudell and Prof. Bond. In the Rasch measurement model, the data are fitted to the model, subject to some requirements; the main requirement of the Rasch measurement model is unidimensionality (tested by Rasch PCA of residuals).
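The residual PCA mentioned above can be sketched in a few lines. A minimal numpy illustration, assuming person measures and item difficulties are already known (here they are simulated, not estimated as dedicated Rasch software would do): compute standardized residuals, correlate them across items, and inspect the eigenvalues. A large first eigenvalue hints at a secondary dimension left in the residuals; cut-offs used in practice vary, so none is hard-coded.

```python
import numpy as np

def rasch_prob(theta, b):
    """Rasch success probabilities, persons in rows and items in columns."""
    return 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))

def residual_pca(x, theta, b):
    """Eigenvalues (largest first) of the inter-item correlation matrix of
    standardized Rasch residuals, with theta and b taken as given."""
    p = rasch_prob(theta, b)
    z = (x - p) / np.sqrt(p * (1.0 - p))      # standardized residuals
    corr = np.corrcoef(z, rowvar=False)       # items as variables
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

# Simulated unidimensional data: 500 hypothetical persons, 10 items.
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 1.0, size=500)
b = np.linspace(-1.5, 1.5, 10)
x = (rng.random((500, 10)) < rasch_prob(theta, b)).astype(float)
eig = residual_pca(x, theta, b)
```

Because the eigenvalues come from a correlation matrix, they sum to the number of items, which is how the first "contrast" is usually judged relative to the rest.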
I would be grateful to hear more answers to this question.
The Rasch modelling fit has been quite remarkable (I agree), and it is important to remind ourselves that this came from educational science, where the purpose has been to make a multidimensional ability test linear and less arbitrary. Whilst I absolutely applaud the sentiment, my only concern is that it is quite plausible, if not likely, that our bodies, symptoms, etc. are nothing like that, and hence not everything will pass the Rasch test.
Rasch will not make a multidimensional ability test linear. It requires that we measure one thing at a time, as we do in the physical sciences.
If you have a strong theory about what you want to measure, Rasch modelling will put your instantiation of that theory to the empirical test. You might find out that your supposed many dimensions are merely artifacts; you might find out that some of your good testing ideas are not sufficiently related to your key ideas to be counted as one dimension.
Of course, human attributes are sophisticated and have many aspects, but Rasch Measurement helps us to identify and measure just one attribute at a time.
Thanks for adding that reply. Trevor, would you remove items from an outcome scale when the results show poor fit? Measuring one attribute at a time is useful, but I am not sure how useful it is in the PRO development space.
That is my only concern. Some believe that it is necessary whereas others do not. Then we get lots of advocates on one side or the other. I have not seen evidence or consensus statements that settle this. Or can you point me to some?
Items that don’t fit the Rasch model damage person measures. The usual practice is to put those items aside and reanalyse. The higher the testing stakes, the more important it is to remove those items.
The special properties of the Rasch model apply only to the extent that the data fit the model.
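"Fit" here is usually judged with mean-square statistics. A minimal sketch of the standard outfit (unweighted mean of squared standardized residuals) and infit (information-weighted) mean-squares for one dichotomous item, assuming person measures and the item difficulty are already known; the example responses are hypothetical. Values near 1 match the model's expectations; values well above 1 signal more noise than the model predicts.

```python
import math

def rasch_p(theta, b):
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Outfit and infit mean-squares for one dichotomous item.

    responses: 0/1 scores on this item, one per person
    thetas: person measures (logits); b: item difficulty (logits)
    """
    sq_std_resid = []
    sq_resid = 0.0
    info = 0.0
    for x, theta in zip(responses, thetas):
        p = rasch_p(theta, b)
        var = p * (1.0 - p)                  # model variance of the response
        sq_std_resid.append((x - p) ** 2 / var)
        sq_resid += (x - p) ** 2
        info += var
    outfit = sum(sq_std_resid) / len(sq_std_resid)
    infit = sq_resid / info
    return outfit, infit

# Hypothetical: able persons fail and less able persons succeed -> misfit.
outfit_bad, infit_bad = item_fit([0, 0, 1, 1], [2.0, 2.0, -2.0, -2.0], 0.0)
```

Items flagged this way are the ones the usual practice sets aside before reanalysing.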
Trevor, I hear what you are saying, and I have heard this before. Can you name a guideline or source for me? I don't want to say "because Trevor says so on ResearchGate". Thank you!
Katja - Trevor Bond & Christine Fox's book is really the best introductory book out there https://www.routledge.com/Applying-the-Rasch-Model-Fundamental-Measurement-in-the-Human-Sciences/Bond-Fox/p/book/9780415833424
"The first official detailed investigation of the validity of psychological measurement from beyond its professional ranks was conducted – under the auspices of the British Association for the Advancement of Science – by the Ferguson Committee in 1932. The non-psychologists on the committee concluded that there was no evidence to suggest that psychological methods measured anything, as the additivity of psychological attributes had not been demonstrated..." by Dr Hugh Morrison
The above conclusion also applies to IRT (including the Rasch model), because neither IRT nor the Rasch model has an additive structure in its theoretical design. Therefore, IRT/Rasch models are an incorrect theory for high-stakes scoring.
You need to read what has happened in the last 85 years. Georg Rasch designed his model specifically for unidimensional, linear additivity of interval measurement units.
You can read Joel Michell’s critique of the outcomes of the Ferguson committee.
collegially
TGB
ps Matt Barney provided a good reading list for you, above.
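The additive structure at issue in this exchange can be written out. In the dichotomous Rasch model (notation assumed here: θ_n is the measure of person n, δ_i the difficulty of item i, and P_ni the probability that person n succeeds on item i), the log-odds of success are a difference of a person term and an item term, so the comparison of two persons does not involve the item at all. Whether this satisfies the additivity requirement raised by the Ferguson committee is exactly what the thread disputes; the equations only show the structure the model imposes.

```latex
\ln\frac{P_{ni}}{1 - P_{ni}} = \theta_n - \delta_i
\qquad\Longrightarrow\qquad
\ln\frac{P_{ni}}{1 - P_{ni}} - \ln\frac{P_{mi}}{1 - P_{mi}} = \theta_n - \theta_m
```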
"When in 1940, a committee established by the British Association for the Advancement of Science to consider and report upon the possibility of quantitative estimates of sensory events published its final report (Ferguson eta/., 1940) in which its non-psychologist members agreed that psychophysical methods did not constitute scientific measurement, many quantitative psychologists realized that the problem could not be ignored any longer. Once again, the fundamental criticism was that the additivity of psychological attributes had not been displayed and, so, there was no evidence to support the hypothesis that psychophysical methods measured anything. While the argument sustaining this critique was largely framed within N. R. Campbell's (1920, 1928) theory of measurement, it stemmed from essentially the same source as the quantity objection." by Joel Michell
Thank you, Professor @Trevor G Bond, for your insights here. I enjoyed reading your book "Applying the Rasch Model: Fundamental Measurement in the Human Sciences, 3rd edition" from beginning to end.
I also read Fan's paper, Item Response Theory and Classical Test Theory: An Empirical...
I have a question on scaling and IRT. Personally, I am a fan of the metric system, as I was brought up on it, and we often use it to make judgement calls. Do the curves change when you offer a 0-10 versus a 0-6 response scale, i.e. if you vary the response scale in the testing phase?
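One way to see what is at stake in the question is through a polytomous Rasch model, where the category curves are defined by the item's thresholds. A minimal sketch of the Andrich rating scale model, with hypothetical, evenly spaced thresholds for a 0-6 and a 0-10 format: a different format means a different set of thresholds to estimate, so the category curves necessarily differ; whether the underlying measures stay comparable is an empirical question this sketch does not answer.

```python
import math

def rsm_category_probs(theta, delta, taus):
    """Andrich rating scale model: probabilities of categories 0..len(taus).

    theta: person measure, delta: item location, taus: thresholds (logits).
    P(X=k) is proportional to exp(sum over j<=k of (theta - delta - tau_j)),
    where the empty sum for k=0 is 0.
    """
    numerators = [0.0]
    running = 0.0
    for tau in taus:
        running += theta - delta - tau
        numerators.append(running)
    m = max(numerators)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in numerators]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical symmetric thresholds for the two response formats.
taus_0_6 = [-2.5 + i for i in range(6)]            # 6 thresholds -> 7 categories
taus_0_10 = [-2.25 + 0.5 * i for i in range(10)]   # 10 thresholds -> 11 categories
p6 = rsm_category_probs(0.0, 0.0, taus_0_6)
p10 = rsm_category_probs(0.0, 0.0, taus_0_10)
```

With symmetric thresholds and a person located at the item, the most probable response is the middle category (3 on 0-6, 5 on 0-10).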
There is another point about the Rasch model, or one-parameter logistic model. This model was defined by Rasch in 1960, and it uses only the difficulty parameter to compute the probability of responding correctly to an item from the test. Other approaches include further parameters, such as discrimination and guessing, associated with Birnbaum and Fred Lord.
Wow, "Once again, the fundamental criticism was that the additivity of psychological attributes had not been displayed and, so, there was no evidence to support the hypothesis that psychophysical methods measured anything." Well, I understand the basic math of the Rasch model, and I believe that it is "necessary and sufficient for measurement". It is funny that psychological measurement is compared to physical measurement with the suggestion that one is measurement and the other is not. So, let's go back to a time when thermometers did not exist and temperature was determined by "self-report": this is hot and this is cold. Are you saying that temperature could not be measured at that time? How about the concept of a "three-dog night", the number of dogs it takes to stay warm on a cold night: is that measurement yet? When did temperature achieve the status of measurement? Did we have to wait for the thermometer to be developed before temperature qualified as measurement? People are confusing measurement with precision. The Rasch model works well for measuring psychological phenomena. The idea that the log-odds of passing rather than failing equal person ability minus item difficulty, with a 50% probability of passing when ability equals difficulty, is a good index of measurement that can be replicated in the physical sciences. For example, back in the 1950s, world-class milers were trying to break the 4-minute mile. For those at that level of ability, sometimes they could break the record and sometimes they could not (a .5 probability of passing). Ask these individuals to run a 3-minute mile: too difficult. Ask them to run a 10-minute mile: too easy. The Rasch model (and all other IRT models) would come to the same conclusion. So, if the model works for physical function, shouldn't it work for psychological phenomena? Being tearful every day is probably reflective of someone who is very depressed. Feeling a "little down" is probably reflective of someone who is very mildly depressed.
The Rasch model will clearly demonstrate this pattern. So, it works for physical measurement, and it also works for psychological measurement. So do you still want to question whether "...psychophysical measures measure anything"?
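The mile example maps directly onto the Rasch response function, where the probability of success is the logistic function of ability minus difficulty and is exactly 0.5 when the two are equal. A minimal sketch; the logit values for the runner and the three tasks are hypothetical and chosen only to mirror the example, not estimated from any data.

```python
import math

def p_success(ability, difficulty):
    """Rasch probability of success: logistic of (ability - difficulty), in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical logit values mirroring the running example.
miler_ability = 3.0
tasks = {
    "10-minute mile": -2.0,  # far below the runner's level: too easy
    "4-minute mile": 3.0,    # at the runner's level
    "3-minute mile": 8.0,    # far above the runner's level: too difficult
}
probs = {task: p_success(miler_ability, d) for task, d in tasks.items()}
```

The same function applies unchanged to a depression item: a person located well above "feeling a little down" is very likely to endorse it, and one located below "tearful every day" is unlikely to.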