Do you think it is acceptable to drop items from further analysis if those items lower the measured test-retest reliability?

Hakan Stattin

I have always been of the opinion that there is nothing magic about a scale or an instrument. Very often I need to trim existing scales down to a minimum (including my own) - at the same time as I want to keep the good psychometric properties. A balance.

At times I am forced to keep a whole scale because I want to compare my sample with what has been reported in the literature concerning clinical cut-offs. This does not happen often, though.

I would not worry about deleting an item to keep good test-retest reliability.

Hakan

Segundo Gonzalo Pazmay Ramos

I think it depends on the number of dimensions you are measuring. If there are only two or three, removing item 20 will have little effect, but if there are more than three dimensions, you must analyze it carefully.

Benedikt Heuckmann

Thank you all for sharing your thoughts and ideas.

I agree that in order to keep good psychometric properties, changes in published scales should be made with caution. But what do you think if we want to develop a new scale and ensure good test-retest-reliability? Is it appropriate to delete more items that have been shown to decrease the test-retest-coefficient?

For example, we would calculate the test-retest-reliability using all 20 items. The result is a “poor” reliability measure: r=.79. We would decide to delete item14 and the reliability measure increases: r(w/o it14)=.89. With the additional deletion of item5, the reliability measure would increase again: r(w/o it14,it5)=.94 etc. Finally, the scale would consist of fewer than 20 items and have an almost perfect test-retest-reliability.

Do you think this method is appropriate to develop a new scale? Is there any evidence in the literature?
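
For readers who want to try the stepwise procedure described above, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data are simulated, the function names `retest_r` and `retest_if_item_deleted` are hypothetical, and test-retest reliability is taken as the Pearson correlation between the summed scale scores at the two occasions.

```python
import numpy as np

def retest_r(t1, t2, drop=()):
    """Pearson r between scale totals at time 1 and time 2,
    leaving out the item columns listed in `drop`."""
    keep = [j for j in range(t1.shape[1]) if j not in drop]
    total1 = t1[:, keep].sum(axis=1)
    total2 = t2[:, keep].sum(axis=1)
    return np.corrcoef(total1, total2)[0, 1]

def retest_if_item_deleted(t1, t2):
    """Test-retest r of the scale after deleting each item in turn."""
    return {j: retest_r(t1, t2, drop=(j,)) for j in range(t1.shape[1])}

# Simulated data (assumption): 200 respondents, 20 items, one stable trait.
rng = np.random.default_rng(0)
n, k = 200, 20
trait = rng.normal(size=(n, 1))
t1 = trait + rng.normal(size=(n, k))  # occasion 1
t2 = trait + rng.normal(size=(n, k))  # occasion 2

# Make the item at column index 13 unstable: pure noise at both occasions.
t1[:, 13] = rng.normal(scale=5.0, size=n)
t2[:, 13] = rng.normal(scale=5.0, size=n)

full = retest_r(t1, t2)
by_item = retest_if_item_deleted(t1, t2)
worst = max(by_item, key=by_item.get)  # item whose removal raises r the most

print(f"r with all items: {full:.2f}")
print(f"r without item index {worst}: {by_item[worst]:.2f}")
```

Note that picking items to maximise r and then reporting r from the same sample capitalises on chance, so the trimmed scale would ideally be re-checked in a fresh sample.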

Hakan Stattin

I would not hesitate to delete these two items. Hard to find hands-on info in the literature about this.

Slawomir Pasikowski

Deleting items is more closely connected with establishing the structure of a test and its internal consistency. Temporal stability (test-retest) is a later step in the validation procedure and seems to depend more on factors not related (at least not directly) to the conception of the measured variable (e.g. current events in the respondent's life). Moreover, after deleting an item, the test structure has to be assessed again. Despite this, deleting items is necessary when the test parameters are not satisfactory. Deleting "weak" items is also a standard operation built into statistical analysis programs (e.g. SPSS, Statistica, SAS). In the output of an internal consistency analysis (e.g. Cronbach's alpha), one of the columns shows the result after deleting each item individually. As a rule, if alpha for the whole test is lower than alpha for the test without a particular item, that item should be removed. So deleting items is nothing unusual. On the other hand, the results of the structural analysis (e.g. CFA, EFA) should be rechecked throughout the item-deletion process. As for literature, the following paper may be useful as an example:

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2927808/
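
The "alpha if item deleted" column mentioned above can be reproduced by hand. The following is only a sketch on simulated data, using the standard formula for Cronbach's alpha. The function names are hypothetical and it is not meant to mirror the exact output of SPSS, Statistica or SAS.

```python
import numpy as np

def cronbach_alpha(x):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    k = x.shape[1]
    sum_item_var = x.var(axis=0, ddof=1).sum()
    total_var = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - sum_item_var / total_var)

def alpha_if_item_deleted(x):
    """Alpha of the remaining items after deleting each item in turn."""
    return [cronbach_alpha(np.delete(x, j, axis=1)) for j in range(x.shape[1])]

# Simulated data (assumption): five items share a trait, the sixth is noise.
rng = np.random.default_rng(1)
n = 500
trait = rng.normal(size=(n, 1))
x = np.hstack([trait + rng.normal(size=(n, 5)),  # items 0-4
               rng.normal(size=(n, 1))])         # item 5, unrelated

alpha_all = cronbach_alpha(x)
alpha_del = alpha_if_item_deleted(x)

# Rule from the post: flag an item if alpha rises once it is removed.
flagged = [j for j, a in enumerate(alpha_del) if a > alpha_all]
print(f"alpha (all items): {alpha_all:.2f}")
print("flagged item indices:", flagged)
```

As the post says, the factor structure (EFA/CFA) would still need to be rechecked after any deletion; this snippet only covers the alpha comparison.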

Hakan Stattin

Helpful Stephan

Hakan

Benedikt Heuckmann

Thank you, Stephan and Slawomir for your ideas and references!

I am particularly interested if there is a "common" method to delete items based on the test-retest-reliability values. Of course, item deletion by item-total-correlation (i.e. cut-off if item-total-correlation is less than .30) is a common method when assessing internal consistency of a scale. I was asking myself if something similar is used for temporal stability.

Indeed, the German handbook by Moosbrugger/Kelava is a helpful reference. As a preliminary conclusion: it is possible to delete items that cause weak test-retest-reliability values. Simultaneously, you should carefully consider psychometric item parameters (i.e. difficulty, variance, kurtosis/skewness etc.) and the specific item content. Thus, conspicuous test-retest values may serve as an additional indicator for item deletion.
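
For comparison, the item-total cut-off mentioned above (< .30) can be sketched as follows. This assumes the corrected item-total correlation, i.e. each item correlated with the sum of the remaining items. The data are simulated and the function name is hypothetical.

```python
import numpy as np

def corrected_item_total(x):
    """Correlation of each item with the sum of all *other* items."""
    total = x.sum(axis=1)
    return np.array([
        np.corrcoef(x[:, j], total - x[:, j])[0, 1]
        for j in range(x.shape[1])
    ])

# Simulated data (assumption): five items share a trait, the sixth is noise.
rng = np.random.default_rng(2)
n = 500
trait = rng.normal(size=(n, 1))
x = np.hstack([trait + rng.normal(size=(n, 5)),  # items 0-4
               rng.normal(size=(n, 1))])         # item 5, unrelated

r_it = corrected_item_total(x)
CUTOFF = 0.30  # cut-off quoted in the post
flagged = [j for j, r in enumerate(r_it) if r < CUTOFF]
print("corrected item-total correlations:", np.round(r_it, 2))
print("flagged item indices:", flagged)
```

An analogous item-level screen for temporal stability would correlate each item with itself across the two occasions, but whether a fixed cut-off is appropriate there is exactly the open question of this thread.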

Niloufar Jalali-Moghadam

..might be helpful!

https://www2.le.ac.uk/departments/npb/people/amc/articles-pdfs/optinumb.pdf

http://www.joe.org/joe/2007february/tt2.php

David D Pothier

I agree that scales are not necessarily 'set in stone', but any gains in reliability may well affect validity. This should be checked thoroughly before interpreting the results from the new scale. Almost by definition, adjusting a scale results in an instrument that measures something differently from the validated scale from which it was produced. Swings and roundabouts.

Hakan Stattin

Hi David

Agree (I answered too quickly earlier)

Hakan

Marisol Foronda

Hello Benedikt! The test-retest method is a correlation across time. That is, the reliability equals the correlation between scores on the same test obtained at two points in time. It does not measure the inter-relatedness of items the way internal-consistency methods do. So removing an item from your test might affect the constructs that you want to measure.

Linda Lane

Hi, I agree with David and Hakan. Generally I have nothing against trimming a scale when necessary. However, care should be taken: whether to trim or not will depend on the substantive question you are seeking to answer with your study.

Leap Han, Loo

Agree with Linda's point of view. Data Analysis = Research Questions

David L Morgan

If the item in question is lowering the reliability of the scale, that implies that it is not as strongly related to the scale as the other items are, and this difference must be large for there to be a noticeable effect.

What is the chance that the item in question has a near-zero loading on the scale, so that dropping it would actually be about the same as including it?

Andreas Wieland

We have just published an article about scale purification, i.e. the process of eliminating items from multi-item scales. We have used the example of SCM, but our framework can be applied to any other discipline. Download: https://doi.org/10.1108/SCM-07-2016-0230 (or request via my ResearchGate page).
