I would like to discuss the following idea about assessing test-retest reliability:
Imagine that we conducted a study (n = 100) in which a 20-item questionnaire (it1, it2, …, it20) was administered twice (e.g. the second time 3 weeks after the first). We calculated the test-retest reliability and obtained r = .79. Now imagine we found that the test-retest reliability increases significantly if we delete item it14 from the scale, for example r(w/o it14) = .89.
Are we allowed to delete it14 from further analysis because it decreases the test-retest reliability? Is this a valid way to develop reliable scales? Would we violate any assumption of reliability? Do you know of any evidence from the literature that supports this idea?
Thanks for sharing your opinions.