Rasch vs IRT? - FAQS.TIPS

02 February 2018

The application of Rasch analysis and IRT models is becoming increasingly popular for developing and validating patient-reported outcome measures. Rasch analysis is a confirmatory model in which the data have to meet the Rasch model requirements to form a valid measurement scale, whereas IRT models are exploratory models that aim to describe the variance in the data. Researchers seem to be divided on which to prefer. What is your opinion on this dilemma in the development of patient-reported outcome measures?

Trevor G Bond

Rasch requires the data to fit the model in order to generate invariant, interval-level measures (sic.) of items and persons. It is prescriptive. IRT models attempt to create a model that will fit the data. They are descriptive. While IRT users see Rasch as a particular IRT model, most Rasch proponents see it as distinctly different from other IRT models. The key differences are philosophical. Wikipedia provides a suitable introduction:

https://en.wikipedia.org/wiki/Rasch_model

You might recall Fan's infamous comparison paper. I can add a critique of that if you wish.

David Morse

Hello Himal,

Rasch models are a special case of IRT models in which one presumes a single item parameter (difficulty or location) to be required, and all item discriminations (the second parameter in a two-parameter IRT model) to be equal to 1, and no possibility of guessing behavior (the third parameter in a three-parameter IRT). There may be some hard-core Rasch specialists who assert other differences, but the models are otherwise the same. I'm not at all sure that I would agree that IRT is exploratory whereas Rasch is confirmatory. However, it is true that a one-parameter model makes certain operations and claims about a scale much simpler than two- or three-parameter models.
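The nesting of the models described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `irf`, the person location `theta`, and the item values are hypothetical and not taken from any dataset or package mentioned in this thread.

```python
import math

def irf(theta, b, a=1.0, c=0.0):
    """Item response function in its three-parameter logistic form.

    theta: person location, b: item difficulty (both in logits),
    a: item discrimination, c: lower asymptote ("guessing").
    With c = 0 this is the 2PL form; with c = 0 and a = 1 it is the
    Rasch / one-parameter form.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical person and item, chosen so that theta equals b.
theta = 0.5
b = 0.5

p_rasch = irf(theta, b)              # difficulty only
p_2pl = irf(theta, b, a=1.8)         # adds a discrimination parameter
p_3pl = irf(theta, b, a=1.8, c=0.2)  # adds a guessing parameter

print(p_rasch, p_2pl, p_3pl)
```

When the person location equals the item difficulty, the Rasch and 2PL forms both give a probability of 0.5, while the guessing parameter lifts the 3PL probability above 0.5.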

It's really a matter of, how many (or how few) item parameters are needed to capture the behavior of items/stimuli and respondents on a common scale in a dependable manner? Questions of scale dimensionality are also salient in this type of investigation or scale building.

Good luck with your work!

Peter Moorer

It is a little more complicated than you imagine. I am analyzing 3 sets of about 2000 cases and have a program in SPSS, with R inside, that tests Rasch, factor analysis, reliability, Graded Response, 2PL and 3PL models, Homals and Mokken in one run, and it is all exploratory. I have about 10 other independent sets with which to test or expand my exploratory analysis (35000+ cases in total). My aim is to find the model that best fits the data. Each result tells me more about the items that belong in a few sets of scales. I would prefer multicategorical Rasch to multicategorical Mokken to dichotomous Rasch to dichotomous Mokken. But the data will set me straight. In a second stage I will also investigate DIF. This can be done with either Rasch or Mokken.

@David, the second parameter does not need to be equal to 1; the discriminations just need to be equal across items. Sometimes the Rasch model does not fit with a discrimination parameter of 1, but might fit with 1.7. CITO (the Netherlands) has/had a program (OPSUG) to see if another value might be better.
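One way to see why a common discrimination only changes the unit of the scale is the small check below. It is a sketch with made-up person and item values, not output from OPSUG or any other program: a model with the same discrimination `a` for every item gives exactly the same probabilities as a discrimination-1 model whose person and item locations are all multiplied by `a`.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

a = 1.7                    # common discrimination shared by all items
thetas = [-1.0, 0.0, 2.0]  # hypothetical person locations
bs = [-0.5, 0.3, 1.2]      # hypothetical item difficulties

max_gap = 0.0
for theta in thetas:
    for b in bs:
        # Equal-discrimination model on the original scale
        p_common_a = logistic(a * (theta - b))
        # Discrimination-1 model on a scale stretched by the factor a
        p_rescaled = logistic((a * theta) - (a * b))
        max_gap = max(max_gap, abs(p_common_a - p_rescaled))

print(max_gap)  # differs from zero only by floating-point rounding
```

A single shared value of `a` therefore leaves the ordering and relative spacing of persons and items unchanged. Item-specific discriminations, as in the 2PL, cannot be absorbed into the scale in this way.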

Katja Rudell

I agree with David Morse. To my understanding, it is not true that one is confirmatory and the other exploratory; rather, the difference lies in the parameters. Rasch or IRT will tell you that a scale is performing according to certain parameters, and I believe some say that this is preferable in some ways, i.e. for complex concepts such as quality of life, ability to do things, etc. For symptoms, to me, it makes less sense, as they would not necessarily line up together and may be distant: a cough may be quite different from the sensation of pain. Personally, I don't feel that your scale always needs to follow the principles of Rasch or IRT. To me Rasch is just a nice-to-have, but if it does not work I do not sweat about it too much, and as long as the results are interpretable, it may still be a very good instrument.

It is like music. You know, there was a music teacher in school who challenged us to share and defend our taste in the music we brought in, and he would just disregard anything that wasn't complicated or classical. I feel sometimes we make PRO development too complicated, and Rasch/IRT is where I draw the line. Nice little tunes can still create a perfect sound. :-)

Trevor G Bond

A simple Google Scholar search for "HRQoL Rasch" yields:

https://scholar.google.com.au/scholar?hl=en&as_sdt=0%2C5&q=HRQoL+Rasch&oq=HRQol

The way Rasch measurement has influenced the field is quite remarkable.

Matt Barney

To Trevor Bond's point, Rasch is the only social science paradigm for psychometrics that has support in the metrology community for measurement in the physical, chemical and biological sciences. This is because Rasch approaches measurement from a mathematical analogy with the same approach in the rest of the sciences, rather than from a statistical modeling approach. Collaborations between metrologists and Rasch enthusiasts (and not IRT) include

Article A gentle introduction to Rasch measurement models for metrologists

Article Man as a Measurement Instrument

Conference Paper Metrology of human-based measurements

Article On Trial: the Compatibility of Measurement in the Physical a...

Article A meta-structural understanding of measurement

Article Quantities, Quantification, and the Necessary and Sufficient...

Fereshteh Zeynivandnezhad

Dear Himal,

I refer to the book entitled Rasch analysis in the human sciences

(Boone, W. J., Staver, J. R., & Yale, M. S. (2013). Rasch analysis in the human sciences. Springer Science & Business Media).

On page 449, it is written:

Isabelle and Ted: Two Colleagues Conversing

Ted: Isabelle, I need your help here. I am looking at a number of articles that have used Rasch to analyze data. Sometimes the authors use the term “Rasch analysis,” and sometimes they use the term “IRT” or “Item Response Theory.” Are those words interchangeable? Also, there is another thing; I noticed that sometimes people write about the Rasch model as being the 1-parameter model, and in the same breath, they write about the 2-parameter model and the 3-parameter model. What is going on?

Isabelle: You know Ted, I wrestled with the same issue when I first started my work. It took me a while to sort things out, and now I understand the differences, but it would have been a lot easier if someone had taken me aside and explained the issues.

Additionally: "Our goal in this chapter is to help readers understand that Rasch models, in our minds, are substantially different in many ways from Item Response Theory (IRT) models" (page 449).

On page 453, it is written:

Rasch and IRT: Philosophical Difference

Rasch measurement is often classified under the umbrella of Item Response Theory (IRT) models. However, a core philosophical difference exists between the Rasch model and the IRT models (often referred to as the 1-parameter, 2-parameter, or 3-parameter models). Whereas the IRT models are altered (more parameters added) to fit the data, the Rasch measurement model is not altered to fit the data and is thus viewed as a definition of measurement. Examination of the 1-parameter IRT model reveals that it looks identical to the Rasch model. Consequently, some researchers refer to the Rasch model as the 1-P model or as the 1-P IRT Rasch model. We view such references as mistakes because of the immense philosophical difference, in that one model, IRT, is altered to fit data and one model, Rasch, is not altered to fit data. Therefore, Rasch is the model that is consistent with the definition of measurement as set forth by Thurstone over 80 years ago.

I agree with Katja Rudell and Prof. Bond. In the Rasch measurement model, the data are fitted to the model, subject to some requirements; the main requirement of the Rasch measurement model is unidimensionality (tested by a Rasch PCA of residuals).
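The idea behind a PCA of Rasch residuals can be sketched roughly as follows. This is an illustration only, under loud assumptions: the data are simulated, the sample sizes and parameter values are invented, and the generating person and item parameters are used in place of estimates (dedicated Rasch software estimates them first). The sketch standardizes the residuals and inspects the eigenvalues of their correlation matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 1000, 10

# Hypothetical generating parameters (assumed known here for simplicity)
theta = rng.normal(0.0, 1.0, size=n_persons)  # person measures, logits
delta = np.linspace(-1.5, 1.5, n_items)       # item difficulties, logits

# Rasch-expected probabilities and simulated dichotomous responses
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - delta[None, :])))
x = (rng.random((n_persons, n_items)) < p).astype(float)

# Standardized residuals: (observed - expected) / sqrt(model variance)
z = (x - p) / np.sqrt(p * (1.0 - p))

# PCA of residuals: eigenvalues of the inter-item residual correlations
corr = np.corrcoef(z, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
first_contrast = eigvals[0]

print(np.round(eigvals, 2))
```

Because these data were generated from a single dimension, the residuals are essentially noise and the largest eigenvalue stays close to 1. A clearly larger first eigenvalue would suggest that some items share variance the Rasch dimension has not accounted for.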

I will be grateful to hear more answers about this question.

Best,

Trevor G Bond

Model /data fit is inadequate:

IRT= add another parameter

Rasch= what went wrong in my attempt to measure?

Fereshteh Zeynivandnezhad

Many thanks, Prof. Bond, for the clarification.

Katja Rudell

The Rasch modelling fit has been quite remarkable (I agree), and it is important to remind ourselves that this came from educational science, where the purpose has been to make a multidimensional ability test linear and less arbitrary. Whilst I absolutely applaud the sentiment, my only concern is that it is quite plausible, if not likely, that our bodies, symptoms, etc. are nothing like that; hence not all things will pass the Rasch test.

Trevor G Bond

Katja,

Rasch will not make a multidimensional ability test linear. It requires that we measure one thing at a time, as we do in the physical sciences.

If you have a strong theory about what you want to measure, Rasch modelling will put your instantiation of that to the empirical test. You might find out that your supposed many dimensions are merely artifacts; you might find out that some of your good testing ideas are not sufficiently related to your key ideas to be counted as one dimension.

Of course, human attributes are sophisticated and have many aspects, but Rasch Measurement helps us to identify and measure just one attribute at a time.

Katja Rudell

Thanks for adding that reply. Trevor, would you remove items from an outcome scale when the results show a bad result? One at a time is useful, but I am not sure how useful in the PRO development space.

That is my only concern. Some believe that it is necessary whereas others do not. Then we get lots of advocates one way or another. I have not seen evidence that settles this or consensus statements. Or can you guide me to some?

Trevor G Bond

Katja,

Items that don't fit the Rasch model damage person measures. The usual practice is to put those items aside and reanalyse. The higher the testing stakes, the more important it is to remove those items.

The special properties of the Rasch model apply only to the extent that the data fit the model.
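As a rough illustration of how a misfitting item can show up, the sketch below computes an outfit mean-square (the mean of the squared standardized residuals) for each item in simulated data. Everything here is an assumption made for the example: the sample size, the parameter values, the 1.3 cut-off, and the use of the generating parameters instead of estimates. Nine items follow the Rasch model, and the last item is answered by coin flip regardless of ability.

```python
import math
import random

rng = random.Random(1)
n_persons = 2000

# Hypothetical person measures and item difficulties (logits)
thetas = [rng.gauss(0.0, 1.0) for _ in range(n_persons)]
deltas = [-1.0, -0.75, -0.5, -0.25, 0.0, 0.25, 0.5, 0.75, 1.0, 0.0]
misfit_item = len(deltas) - 1  # index of the coin-flip item

def rasch_p(theta, delta):
    return 1.0 / (1.0 + math.exp(-(theta - delta)))

sum_z2 = [0.0] * len(deltas)
for theta in thetas:
    for i, delta in enumerate(deltas):
        p = rasch_p(theta, delta)
        if i == misfit_item:
            # Response unrelated to ability: pure coin flip
            x = 1.0 if rng.random() < 0.5 else 0.0
        else:
            # Response generated by the Rasch model
            x = 1.0 if rng.random() < p else 0.0
        # Squared standardized residual for this person-item encounter
        sum_z2[i] += (x - p) ** 2 / (p * (1.0 - p))

# Outfit mean-square per item; values near 1 are what the model expects
outfit = [s / n_persons for s in sum_z2]
flagged = [i for i, v in enumerate(outfit) if v > 1.3]  # illustrative cut-off

print([round(v, 2) for v in outfit], flagged)
```

In practice one would set the flagged item aside and reanalyse, as described above, rather than rely on a single cut-off.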

Katja Rudell

Trevor, I hear what you are saying and I have heard this before. Can you name a guideline or source for me? I don't want to say "because Trevor says so on ResearchGate". Thank you!

Trevor G Bond

Dear Katja,

I wrote a book on the topic.

Matt Barney

Katja - Trevor Bond & Christine Fox's book is really the best introductory book out there https://www.routledge.com/Applying-the-Rasch-Model-Fundamental-Measurement-in-the-Human-Sciences/Bond-Fox/p/book/9780415833424

Katja Rudell

Thank you!

Nan Kong

"The first official detailed investigation of the validity of psychological measurement from beyond its professional ranks was conducted – under the auspices of the British Association for the Advancement of Science – by the Ferguson Committee in 1932. The non-psychologists on the committee concluded that there was no evidence to suggest that psychological methods measured anything, as the additivity of psychological attributes had not been demonstrated..." by Dr Hugh Morrison

https://paceni.wordpress.com/tag/holders-seven-axioms/

The above conclusion is also true for IRT (and for the Rasch model), because neither has an additive structure in its theoretical design. Therefore, IRT/the Rasch model is an incorrect theory for high-stakes scoring.

Trevor G Bond

Nan Kong,

You need to read what has happened in the last 85 years. Georg Rasch designed his model specifically for unidimensional, linear additivity of interval measurement units.

You can read Joel Michell's critique of the outcomes of the Ferguson committee.

Collegially,

TGB

PS Matt Barney provided a good reading list for you, above.
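For readers following the additivity point, the additive form of the dichotomous Rasch model can be written out as below. The symbols are the conventional ones and are an assumption here, since the thread does not define notation: P_{ni} is the probability that person n succeeds on item i, \theta_n is the person measure and \delta_i the item difficulty, both in logits.

```latex
\[
\ln\frac{P_{ni}}{1 - P_{ni}} = \theta_n - \delta_i
\]
\[
\ln\frac{P_{ni}}{1 - P_{ni}} - \ln\frac{P_{mi}}{1 - P_{mi}}
  = (\theta_n - \delta_i) - (\theta_m - \delta_i)
  = \theta_n - \theta_m
\]
```

The first line shows that person and item parameters combine by simple subtraction on the log-odds scale. The second shows that the comparison of two persons n and m does not depend on which item is used, provided the data fit the model.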

Nan Kong

Trevor G Bond ,

"When in 1940, a committee established by the British Association for the Advancement of Science to consider and report upon the possibility of quantitative estimates of sensory events published its final report (Ferguson et al., 1940) in which its non-psychologist members agreed that psychophysical methods did not constitute scientific measurement, many quantitative psychologists realized that the problem could not be ignored any longer. Once again, the fundamental criticism was that the additivity of psychological attributes had not been displayed and, so, there was no evidence to support the hypothesis that psychophysical methods measured anything. While the argument sustaining this critique was largely framed within N. R. Campbell's (1920, 1928) theory of measurement, it stemmed from essentially the same source as the quantity objection." by Joel Michell

Nan Kong

Please see this paper (IRT SCORING AND THE PRINCIPLE OF CONSISTENT ORDER):

https://arxiv.org/pdf/1805.00874.pdf

Himal Kandel

Thank you, Professor @Trevor G Bond, for your insights here. I enjoyed reading your book, "Applying the Rasch Model: Fundamental Measurement in the Human Sciences, 3rd edition", from beginning to end.

I also read Fan's paper: Item Response Theory and Classical Test Theory: An Empirical...

Yes, your critique on this would be interesting.

Many thanks!

Trevor G Bond

Please look at pp. 306-8.

Katja Rudell

I have a question on scaling and IRT. Personally I am a fan of the metric system, as I was brought up on it and we often use it to make judgement calls. Do the curves change when you offer a 0-10 versus a 0-6 response scale, that is, if you vary the response scale in the testing phase?

Orlando Grabiel Toledano Lopez

There is another point about the Rasch model, or one-parameter logistic model. This model was defined by Rasch in 1960, and it uses only the difficulty parameter to compute the probability of responding correctly to an item from the test. Other approaches include further parameters, such as discrimination and guessing, and are associated with Birnbaum and Fred Lord.

Craig Velozo

Wow, "Once again, the fundamental criticism was that the additivity of psychological attributes had not been displayed and, so, there was no evidence to support the hypothesis that psychophysical methods measured anything." Well, I understand the basic math of the Rasch model, and I believe that it is "necessary and sufficient for measurement". It is so funny that psychological measurement is compared to physical measurement with the suggestion that one is measurement and one is not. So, let's go back to a time when thermometers did not exist... temperature was determined by "self-report"... this is hot and this is cold. So are you saying that temperature could not be measured at that time? How about the concept of a "three-dog night", the number of dogs that it takes to stay warm on a cold night... is it measurement yet? When did temperature achieve the status of measurement? Did we have to wait for the thermometer to be developed before temperature qualified as measurement? People are confusing measurement with precision.

The Rasch model works well for measuring psychological phenomena. The concept that the log-odds of passing versus failing = person ability - item difficulty, with the 50% probability point as a good index of measurement, can be replicated in the physical sciences. For example, back in the 1960s world record milers were trying to break the 4 minute mile. For those who were at that level of ability, sometimes they could break the record and sometimes they could not (.5 probability of passing). Ask these individuals to run a 3 minute mile... too difficult. Ask them to run a 10 minute mile... too easy. So, the Rasch model (and all other IRT models) would come to the same conclusion.

So, if the model works for physical function, shouldn't it work for psychological phenomena? Being tearful every day is probably reflective of someone who is very depressed. Feeling a "little down" is probably reflective of someone who is very mildly depressed. The Rasch model will clearly demonstrate this pattern. So, it works for physical measurement; it also works for psychological measurement. So you still want to question whether "...psychophysical measures measure anything?"
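The miler example can be put into numbers with a short sketch. The logit values below are invented purely for illustration and are not measurements of real runners or real tasks. What the sketch relies on is the model's form: the log-odds of success equal ability minus difficulty, so success is a coin flip when the two are equal.

```python
import math

def p_success(ability, difficulty):
    """Rasch probability of success; log-odds = ability - difficulty."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

runner = 2.0  # hypothetical ability of an elite miler, in logits

# Hypothetical difficulties of three tasks on the same logit scale
tasks = {
    "10-minute mile": -4.0,  # far below the runner's ability
    "4-minute mile": 2.0,    # matched to the runner's ability
    "3-minute mile": 8.0,    # far above the runner's ability
}

probs = {name: p_success(runner, d) for name, d in tasks.items()}
for name, prob in probs.items():
    print(f"{name}: {prob:.4f}")
```

The matched task comes out at exactly 0.5, while the very easy and very hard tasks sit close to 1 and 0, which is why well-targeted items carry the most information about a person.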

Nan Kong

Please see "Item Response Theory and Its General Total Score":

Preprint Item Response Theory and Its General Total Score
