Single-item scales have been used in many studies to measure constructs such as job satisfaction, personality traits, quality of life, self-esteem, etc. How does one report on the reliability and validity of these scales?
The internal-consistency reliability of single-item scales is both unknown and unknowable. You can, however, get evidence of test-retest reliability: administer the same single-item scale to the same sample at two points in time and correlate the scores for an estimate.
The validity of single-item scales is assessed the same way as for other scales: you assess content, construct (convergent and discriminant), nomological, and criterion-related validity using the standard methods. See DeVellis, R. F. (1991). Scale Development: Theory and Applications. Newbury Park, CA: Sage Publications, or Netemeyer, R. G., Bearden, W. O., & Sharma, S. (2003). Scaling Procedures: Issues and Applications. Thousand Oaks, CA: Sage.
Wanous, J. P., Reichers, A. E., & Hudy, M. J. (1997). Overall job satisfaction: How good are single-item measures? Journal of Applied Psychology, 82(2), 247-252.
I agree with Ronald Goldsmith that reliability cannot be assessed for single-item measures. However, validity can be established by predicting the pattern of correlations with other theoretically relevant variables (i.e., construct validity).
I also agree with Ronald and David. Reliability cannot be assessed for single-item scales; a minimum of three items is needed to assess reliability accurately.
I agree with the comments of previous colleagues. I would add: asking one question may have practical value, e.g., "How satisfied are you with the advice of your physician?" However, one question cannot establish the validity or reliability of a "construct", and one question is not a "scale". While efficiency and respondent burden are important, validity should be the primary focus of measurement.
Thank you for your responses. The Wanous et al. (1997) article is a very good resource (thank you, Paul E. Spector!). From what I have learned, it seems that when the single item (SI) is "unambiguous" and "concrete" or "doubly concrete" (Bergkvist, 2014; Bergkvist & Rossiter, 2007; Diamantopoulos et al., 2012; Rossiter, 2002; Wanous et al., 1997), a one-item questionnaire can be as effective as a multi-item (MI) scale. Momentum seems to be growing for the use of such scales, and time and cost savings, as well as reduction of respondent fatigue, seem to be some of the motivating factors for using the SI vs. the MI…
Question to Richard Windsor: if the responses range from, say, 1-5, is that a scale, regardless of how many items/questions are included? I might be wrong, but I was thinking of a scale in terms of a range of responses...
Ref.
Bergkvist, L. (2014). Appropriate use of single-item measures is here to stay. Mark Lett, 26(3), 245–255. doi:10.1007/s11002-014-9325-y
Bergkvist, L., & Rossiter, J. R. (2007). The predictive validity of multiple-item versus single-item measures of the same constructs. Journal of Marketing Research, 44, 175-184.
Diamantopoulos, A., Sarstedt, M., Fuchs, C., Wilczynski, P., & Kaiser, S. (2012). Guidelines for choosing between multi-item and single-item scales for construct measurement: a predictive validity perspective. Journal of the Academy of Marketing Science, 40(3), 434–449. doi:10.1007/s11747-011-0300-3
Rossiter, J. R. (2002). The C-OAR-SE procedure for scale development in marketing. International Journal of Research in Marketing, 19(4), 305–335. doi:10.1016/s0167-8116(02)00097-6
Wanous, J. P., Reichers, A. E., & Hudy, M. J. (1997). Overall job satisfaction: How good are single-item measures? Journal of Applied Psychology, 82, 247-252.
That is quite a thorough literature review, but I would disagree with the conclusion that "Momentum seems to be growing for the use of [one-item] scales", since there is a very strong limitation involved, namely the lower reliability of single items versus multi-item scales.
In particular, reliability is usually considered to be more basic than validity, because the extent to which a measure contains reliable content (versus random error) sets the upper limit for the correlation of that item with any source of validation. In other words, an item with low reliability will be restricted in its ability to correlate with anything.
With only a single item, you have no idea how reliable that item is, so you are inherently limited in that regard.
Also, with regard to terminology: Likert scoring is the appropriate label for response formats such as Strongly Agree to Strongly Disagree, whereas the word scale should be reserved for measures that involve multiple items.
I think David expresses the issue very well. Reliability sets the upper bound for correlations: a correlation coefficient cannot be larger than the square root of the smaller reliability of the two measures being correlated. Review the literature on "correction for attenuation" for a more detailed explanation, but in general, multiple items are better than single items unless you have good evidence for the high reliability of the single item.
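The attenuation logic is easy to see numerically. A small sketch with hypothetical reliabilities and an observed correlation, using Spearman's classic disattenuation formula r_true = r_obs / sqrt(r_xx * r_yy):

```python
import math

# Correction for attenuation: unreliability in either measure shrinks
# the observed correlation between them. All numbers are hypothetical.
r_observed = 0.40       # observed correlation between measures x and y
rxx, ryy = 0.70, 0.80   # reliability estimates of x and y

max_possible = math.sqrt(rxx * ryy)      # upper bound on the observed r
r_corrected = r_observed / max_possible  # disattenuated (true-score) correlation

print(f"Upper bound on observed r:  {max_possible:.3f}")
print(f"Disattenuated correlation:  {r_corrected:.3f}")
```

Note that sqrt(rxx * ryy) is actually a slightly tighter bound than the square root of the smaller reliability alone, since it also discounts the other measure's error.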
If you say SCALE, that means you have done confirmatory factor analysis (CFA) after exploratory factor analysis. According to Byrne (2009) and Hair et al. (2010), you must have at least three items per construct/dimension to run CFA. If you have obtained good fit values, there are a few usual ways to test reliability and validity. First, the reliability of a scale is usually tested with coefficient alpha. But if you are using structural equation modeling (SEM), coefficient H is the better measure of reliability, because coefficient alpha often understates it. Test-retest reliability is a popular approach as well, but it requires two waves of data (longitudinal), which can be expensive; as an alternative, you may consider the split-half method. I always go with coefficient H. Second, to test validity, we usually assess convergent, discriminant, and nomological validity.
Just saw this older post. Nice suggestions Ronald and the rest! Unfortunately, in many fields, such as where I work, single-item measures and dichotomous items (also solitary) are practically the norm.
There are ways, not mentioned by others, to test the validity of measures when you have two or more to compare.
You can use IRT (item response theory) analyses to examine how much information, per item, is provided on the latent trait by each scale/item.
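As a rough illustration of the IRT idea, here is the item information function under a 2PL model, I(theta) = a^2 * P(theta) * (1 - P(theta)); the item parameters below are hypothetical, chosen only to contrast a discriminating item with a weaker one:

```python
import math

def p_2pl(theta, a, b):
    """Probability of endorsing the item under the 2PL model
    (a = discrimination, b = difficulty/location)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information the item provides about the latent trait theta."""
    p = p_2pl(theta, a, b)
    return a**2 * p * (1 - p)

# Compare two hypothetical items at several trait levels
for theta in (-1.0, 0.0, 1.0):
    i1 = item_information(theta, a=1.5, b=0.0)  # highly discriminating item
    i2 = item_information(theta, a=0.7, b=0.0)  # weaker item
    print(f"theta={theta:+.1f}: I1={i1:.3f}  I2={i2:.3f}")
```

Plotting these curves across theta shows where on the trait continuum each item (or single-item measure) is actually informative.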
AIC analyses can also compare items/scales/models on their ability to provide information on a common outcome variable (hopefully continuous).
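A sketch of that AIC comparison for two OLS models predicting the same continuous outcome; the residual sums of squares are hypothetical, and the shared additive constant in the Gaussian log-likelihood is dropped since it cancels when comparing models fit to the same data:

```python
import math

def aic_ols(n, rss, k):
    """AIC = 2k - 2 ln L; for OLS this reduces (up to a constant shared
    by models on the same n observations) to n*ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

n = 100
aic_single = aic_ols(n, rss=52.0, k=2)  # outcome ~ single-item predictor
aic_multi  = aic_ols(n, rss=44.0, k=2)  # outcome ~ multi-item scale score

print(f"AIC (single item): {aic_single:.1f}")
print(f"AIC (multi-item):  {aic_multi:.1f}")
# Lower AIC = better trade-off of fit against parameters.
```

With equal parameter counts, as here, the comparison reduces to which measure explains more outcome variance; AIC earns its keep when the competing measurement models differ in complexity.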
As we are all searching for the 'best' measure of the constructs we study, more comparison studies would be helpful.
I think single-item questions are liable to be interpreted differently by different respondents, which is why we get significant differences among participants. The more meaningful issue is test-retest reliability, which helps in constructing standardized tools, and which also matters when we translate an instrument from one language to another.