As researchers do not agree on the appropriate number of points for a Likert scale (5, 7, ...?), I would like to know the main contributions or papers that support each option.
A Likert scale with more points makes it more time-consuming for respondents to reach a decision. Five-point scales are quick to discriminate between the different options. Even-numbered scales do not have a midpoint; odd-numbered scales do.
It is hard for participants to assess and conceptualize the differences between points on large scales. If you are not measuring objective results but conducting a social study, and thus seeking individual subjective responses, I recommend you use 5 points.
This all depends on the psychometric properties of the measure that you are using. There is much debate over the use of a 5-point scale vs. a 7-point scale (note that a Likert scale is technically 5 points, from strongly agree to strongly disagree; any modification to that is a Likert-type scale). That said, the biggest reason to go with 7 points (or 9) would be to increase the variance in your measure. However, caution should be taken to avoid distortion due to extreme-score bias (many respondents are not inclined to use the highest and lowest points).
All that said, you should use the scaling that the authors of the original measure report, as that would be the most valid (and potentially most reliable) method, since that is what they used to test the psychometric properties of the measure itself.
Look at John Krosnick's work on this. I don't have the citation handy but he has at least one paper on it, and a relatively recent chapter with Stanley Presser on related questionnaire design questions.
What matters is that the scale is validated. For example, service quality is typically measured using 7-point scales, while in technology studies perceived ease of use and perceived usefulness are typically measured using 5-point scales; countless thousands of repeated measurements have endorsed the validity of these scales. A researcher cannot really have a free choice or preference IF they are to undertake research with validated scales. The free-choice option (where the researcher designs the scale) means that the scale is not (yet) validated.
In my higher-education institution, researchers cannot undertake surveys unless they obtain ethical approval from the institutional ethics committee. If the research involves a survey, the first question from the ethics committee is "Are the scales to be used validated?"
I endorse Matt Jans's suggestion about Jon Krosnick's work (it is Jon). He is the only person I have found who has actually researched this topic. (Though if you are going to use scales, I also suggest you read Schwarz and Hippler's work on the influence of value range on responses.)
A very crude summary of Krosnick is that you should use 5-point scales for unipolar items and 7-point scales for bipolar items. Further, you should use item-specific scales rather than agree-disagree scales. Have a look at some of the work referenced here.
I don't think you will find vastly different results using 5 or 7 (or 6 or 8) response points. I base that conclusion on my own experiences but also on the literature review in a chapter co-authored by Krosnick.
I also reviewed the 2014 Revilla, Saris & Krosnick paper referenced by @Ziller above. I was skeptical about the results; after all, there are many contradictory findings (including Alwin & Krosnick, 1991). On one hand, the results shown in Revilla et al. are strong and consistent, with quality declining from 5- to 7- to 11-point scales. On the other hand, the 7- and 11-point conditions were tacked onto the end of a 200+ item survey.
So my guess is that Revilla et al. actually found that when you ask people 200 questions and then start repeating your questions, respondents respond inattentively, which would be consistent with research on inattentive survey responding (e.g., Meade & Craig, 2012).
Also, if you examine their Table 4, almost all the difference is in "validity" (you can ignore "quality" because q = r*v), and this "validity" is like Cattell's "direct concept validity" but not like any kind of validity recognized by the joint guidelines for psychological tests. Table 4 shows that: (a) the reliabilities of the 5- and 7-point scales were almost identical (0.717 vs. 0.716); (b) 11-point scales were only slightly less reliable (0.709 vs. 0.717/0.716); and (c) all three response scales had adequate reliability. So I don't accept that there are big differences between the different scales.
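To make the q = r*v point concrete, here is a minimal Python sketch. The reliability figures are the ones quoted above from Table 4; the validity figures are hypothetical placeholders invented purely to show the algebra, not the published values:

```python
# With near-identical reliabilities (r), any difference in "quality"
# (q = r * v) must come from the "validity" (v) component.
reliability = {"5-point": 0.717, "7-point": 0.716, "11-point": 0.709}

# Hypothetical validity coefficients, invented for illustration only;
# these are NOT the actual Table 4 values.
validity = {"5-point": 0.95, "7-point": 0.90, "11-point": 0.80}

for scale, r in reliability.items():
    v = validity[scale]
    print(f"{scale}: r = {r:.3f}, v = {v:.2f}, q = r*v = {r * v:.3f}")
```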
Finally, I'd like to bring up a completely different issue, which is the expected skew of the responses. If respondents are likely to use all five responses of a 5-point scale, then five points is probably plenty. However, on many kinds of instruments, you can expect a degree of skew in the distribution of responses across the anchors. I once analyzed a colleague's cultural gender ideology scale, with questions like "Women's only important role is to be homemakers," and in western samples almost all respondents used only one or two points of the 5-point response scale (e.g., almost everyone disagreed with the question above). The same thing happens when you measure most facets of job satisfaction (most people are fairly satisfied, except with pay) or supervisors' perceptions of performance (there is a clear ceiling effect). On such scales, it would have been handy to administer the scale with anchors that are themselves asymmetrically skewed, to try to create more variation in highly skewed responses. It's conceivable (i.e., this is a conjecture) that using more response points would help in this situation.
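As a toy illustration of that skew problem, here is a minimal Python sketch; the response probabilities are invented for illustration, not taken from any real data set:

```python
import numpy as np

rng = np.random.default_rng(0)
points = np.arange(1, 6)  # a 5-point response scale

# Invented probabilities: a highly skewed item (nearly everyone disagrees)
# versus an item whose responses use the whole range.
skewed = rng.choice(points, size=10_000, p=[0.70, 0.22, 0.05, 0.02, 0.01])
balanced = rng.choice(points, size=10_000, p=[0.10, 0.20, 0.40, 0.20, 0.10])

# The skewed item carries far less variance, i.e., less information.
print("skewed item variance:  ", round(float(skewed.var()), 3))
print("balanced item variance:", round(float(balanced.var()), 3))
```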
It depends on your target respondents. If your respondents have wide knowledge of the questionnaire items, you can go with a 7-point scale; the wider the scale, the more knowledge you expect respondents to have (in some cases, scales of even 13 points are used). But to get more responses and better accuracy, a 5-point scale is generally used. It is, however, a game of perception.
There is a lot of useful information in this thread, particularly Alan's very insightful analysis of the research and his challenge about how to deal with issues where most respondents will go to one extreme of a scale. However, there are a number of implicit issues that probably need to be teased out: the number of points (still not resolved); the difference between psychometric and sociological uses of scales; and what kinds of analysis are appropriate.
Much of the discussion appears to assume, and in one instance states, that people weigh a range of options and choose the one that best matches their experience, preferences, etc. Cognitive research suggests this is not what happens at all. Ostrom argues that our cognition is in fact largely categorical: we assess phenomena, including ourselves, as belonging in a category or not. Schwarz and Hippler showed how that worked with scales: people categorise themselves as being in certain groups in relation to the question (I watch more TV than most; I support equality for women; I am a Republican; I support freedom of expression; etc.) and then use the range provided in the scale to find a position on the scale that matches their self-perception. In other words, people generally use the information provided by the anchor points of the scale as information about the range of social behaviours, and then find a point that fits their self-perceived categorisation.
For this reason, I really don't see much point in going beyond a 7 point scale for any question.
Nor should the data in sociological uses be treated as scalar (despite Krosnick's arguments that anchoring works to provide a scale).
I also think one has to be careful about framing questions to avoid the sort of 'skew' to which Alan refers, unless of course one wants to identify how many people do NOT want to present as adhering to social norms. The more one's questions reflect the range of social responses, the more 'accurate' the answers will be. (This of course poses a chicken-and-egg question for the researcher.)
An important inference from the research is that most self-reported data is about self-presentation, not some underlying 'true' state. Insofar as we analyse the data, it should be seen as evidence about self-presentation (to oneself as much as to others), not necessarily as evidence of behavioural or attitudinal preferences (the women's-role question is a good example of why this is an issue).
Finally, the paragraph above also points to the value of scales in psychometric use. In psychometric usage an individual's responses are analysed as a whole (or at least in groups of questions around a concept) to come to a conclusion about the individual's mental state or preferences. Similarities and differences in responses to related items are core data for the analysis.
The sociological use of scales is more problematic because it rarely does anything of the kind. (The use of factor analysis or principal component analysis is rare in sociological work, though moderately common in market research.) Often there is only one question (occasionally two) on each separate concept, and the data from each scale is usually treated as if it were, by itself, a more or less accurate representation of some personal attribute. Few researchers analyse the pattern of an individual's responses or use that analysis to assess how to interpret the responses (some use weightings when 'bias' is observed, but even then the assumption is largely that each item is separately valid data with a 'bias' that can be 'corrected'). And despite our knowledge that individuals have different preferences in responding to questions (affirmation bias, a tendency to the middle, a tendency to extremes), it is also rare for researchers to use the pattern of an individual's responses to weight the answers given.
It will depend on what exactly you want to measure. For issues of service delivery or quality of output, a 7-point scale can suffice. However, when you come to consolidation (combining responses), they all lead to the same result as lower-point scales. But you must be sure to avoid repetition and mixing up answers.
Although there is a debate about which Likert scale to use in questionnaires, I personally prefer a 5-point Likert scale compared with a 7- or 10-point Likert scale.
According to Dawes (2008), with a five-point scale it is quite simple for the interviewer to read out the complete list of scale descriptors, and it is also quite simple to analyze the research data.
Reference: Dawes, J. (2008). Do data characteristics change according to the number of scale points used? An experiment using 5-point, 7-point and 10-point scales. International Journal of Market Research, 50(1), 61–104.
There is a large debate over the 5-point and 7-point scales. According to Olakunke (2003), a 5-point scale is better as it provides a better way to communicate with respondents.
In my opinion, a 7-point scale is better as it provides a wider range of options. The drawback of a 5-point scale is that respondents often prefer to tick the middle (i.e., 3), which is the neutral option (neither dissatisfied nor satisfied), whereas with a 7-point scale, due to the increased number of options, the likelihood of such responses is comparatively reduced. You can also use a 1–10 scale, which is used in many surveys!
The main factor in deciding whether to go with a 5- or 7-point scale is whether respondents can clearly perceive the difference in agreement levels between neighbouring points on the scale. In my opinion, in order to avoid confusion and get reliable answers, a 5-point scale is better than a 7-point scale.
Zaphaniah Isa, it's lovely to see a post from someone at Jos; I was born there and visited in March this year. I hope things are going well for you.
But turning to the content of this thread, I sometimes feel as if some researchers are more interested in simplifying their analysis than in getting the most reliable results. I often read or hear researchers talking about forcing people to answer one way or another (commonly when arguing for even-numbered scales or for not giving "Don't know" options). I question the ethical basis for such a position. Further, such an approach seems to me totally against the idea of objective and/or rigorous research. Physicists who tried to force experiments to give particular answers would never get published.
More generally, I suggest you read the posts in this thread and the accompanying references from Malek Sghaier and the anonymous post which has the most views. Despite Alan Mead's argument, I think you do get different results depending on the number of points in the scale. I take his point about respondent fatigue affecting the Revilla, Saris and Krosnick work, but by the same token, respondent fatigue does appear to increase as the number of points rises above 7. Having a mid-point is also important. By 2012, and despite the 2010 chapter he co-wrote with Stanley Presser, Krosnick (personal communication) certainly thought 5 points for a unipolar scale and 7 points for a bipolar scale were preferred and gave the most reliable results (though that says nothing about the validity of the answers).
The other issue with Likert scales is understanding what they tell you. Schwarz and Hippler's work shows that what you get from Likert scales is a self-assessment of where the respondent sits in relation to other people ("I watch more TV than most", "I am more satisfied than most", etc.). This is an example of 'satisficing', the process by which people give a 'satisfactory' answer rather than consciously working out the answers to the questions. The finding is consistent with the work of cognitive scientists, which suggests that people's responses to questions are largely an expression of one dominant answer that emerges after the brain has generated multiple possible answers automatically by implicit cognitive processing (Kahneman calls it the "shotgun"). Sometimes we get two or three "dominant answers" and spend some time deciding between them. But conscious consideration of possible answers is usually limited to what seems fair enough. More significantly, respondents are not aware of the implicit cognitive processes.
So, what does the cognitive science mean for using Likert scales? First, of course, you need to interpret them as indicating self-perception of social position rather than as a direct reflection of the concept asked in the question. Second, five points in a unipolar scale and seven points in a bipolar scale probably give people enough range to feel comfortable locating themselves socially and may explain why one gets more reliable data.
You need to remember that what a numerical scale does is produce a measure, usually on an interval scale, from an essentially qualitative underlying reality, or at best, in subject matters whose nature strictly permits it, from an underlying rank-order reality.
It is true that many factors can be taken into account when deciding how many points of comparison to use (3, 5, 7, etc.), and these allow some estimates to be made of the responding characteristics or tendencies of participants in the research, such as extreme avoidance or extreme preference. But the behaviours being observed are response-selection behaviours, not differences in the underlying theorised construct. Much of the difference in the number of scaling points can be attributed to factors other than validity, such as reliability, ease of enumerating results, or ease of calculating patterns.
When you are turning feelings, preferences, ways of thinking, etc. into numbers, you are simply creating an artefact; a useful artefact, but still only a proxy for what you really want to measure. Never forget that is what you are doing, and always remember where all this psychometric stuff came from: from the desire to ape the natural sciences, to produce 'instruments', scales, and enumerable outcomes out of a communicative interaction between researcher and subjects, or subjects and subjects (whether written, spoken, or based on observation of communication among subjects).
Don't get me wrong. The ability to create sets of numbers allows powerful manipulations and analyses of data which can throw light on underlying human processes of meaning making, action/interaction creation etc. But there is an intervening process of interaction/creation etc in the process of ticking the boxes.
This was brought home powerfully to me when I sought to validate a scale developed in North America with research subjects in Papua New Guinea. I ran a 5-point Likert scale, and after each question was answered I asked each subject why they had chosen the answer they had chosen. Content analysis wasn't really necessary, but I did it anyway. The groups I tried the scale with came from about 15 different cultural groups, and it showed: in their interpretations of the meanings of the questions, of what 'agreeing' and 'disagreeing' meant, of what the difference between a 1 and a 5 meant, and so on. Salience, too, was an issue. Some 'topics' were known to respondents only by rumour, not by acquaintance. Had I provided a 'we don't think that way in our village' option instead of 'don't know', it would have been used a lot. Some could answer 'hypothetically': they had heard of this 'issue', and if they were living in a big city they would probably have answered the way they think they would if that were true, but they weren't sure what it was all about, really.
I thought these phenomena might be confined to a naive, multicultural group of respondents, but when genuinely 'open-ended' follow-up probes were used in the same way to validate a scale for use in a 'monocultural' study in the metropolis, quite a lot of the same stuff appeared: depending on the subject matter, there were differences in meaning, salience, significance of responding style, etc. among people with differing educational levels, ages, rural/urban backgrounds, religions, religiosity, etc. But those variables were the ones I wished to use later to explore differences among subjects, and here they were determining the reliability and validity of the scale I was intending to use to collect my data.
Don't talk about creating numbers from the fluidity of meanings and interpretations as if you were comparing lengths against a standard metre rod kept at constant temperature in Paris. Remember what is actually happening, inconvenient though it is.
Lissitz, R. W., & Green, S. B. (1975). Effect of the number of scale points on reliability: A Monte Carlo approach. Journal of Applied Psychology, 60, 10–13.
7-point or 9-point scales seem better according to Maydeu-Olivares et al. (2017):
"We recommend using a large number of response alternatives (≥ 5) to increase the power to detect incorrect substantive models."
--
Maydeu-Olivares, A., Fairchild, A. J., & Hall, A. G. (2017). Goodness of fit in item factor analysis: Effect of the number of response alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 24(4), 495–505. DOI: 10.1080/10705511.2017.1289816
Debaraj Das, I am not quite sure about the intent of your question. If it is about which is better, have a look at some of the other posts in this thread for a detailed discussion of the pros and cons.
In short, Krosnick and Saris's research suggests that 7 points is better for a bipolar scale (i.e., dissatisfied to satisfied) and 5 points is better for a unipolar scale.
If the question is whether one can change an existing scale from 7 to 5 points, the answer is more complicated. Yes, you can; but if the scale has been validated as a 7-point scale, changing to 5 points will invalidate the scale.
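To make that concrete: the mechanical conversion is trivial, as the minimal Python sketch below shows (the function name is mine, and the linear mapping is just one possible convention), but nothing about the arithmetic carries over the psychometric evidence gathered for the 7-point version.

```python
def rescale_7_to_5(x: int) -> float:
    """Naively map a 7-point response (1-7) onto the 1-5 range.

    The endpoints are preserved (1 -> 1, 7 -> 5), but the result says
    nothing about how respondents would actually have answered a
    5-point instrument, which is why re-validation is needed.
    """
    return 1 + (x - 1) * 4 / 6

print([round(rescale_7_to_5(x), 2) for x in range(1, 8)])
# [1.0, 1.67, 2.33, 3.0, 3.67, 4.33, 5.0]
```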
David Glyn Roberts, you draw what is probably an important distinction between contexts in which 7-point versus 5-point response options are preferable. Could you provide a reference for Krosnick and Saris, please?
Apart from that, I wonder whether moving from a 7-point to a 5-point range of options automatically invalidates a scale, as you seem to indicate in your post. Might it be the case that there is merely a risk (not a certainty) of invalidating the scale?
I ask mainly because I think a lot of scales possess goofy attributes that possibly invalidate them, and researchers do things with scales that are a lot more goofy than altering the nature of the response options - so in terms of the validity of scales, altering the number of response options is pretty low on the list of priorities.
As indicated in some of my earlier posts on this question, I think "goofy" is a good description of many of the uses of Likert scales. I have serious reservations about the way Likert scales are used in sociological or opinion research. Not least that:
most such scales should be used as a battery of items around one concept
the cognitive processes underlying a response to a scale are very different from those assumed by most researchers (see Schwarz and Hippler, Kahneman, and others; sorry, I don't have the references with me at the moment), and as a result Likert scales present self-judgements about where respondents see themselves compared to others. They are NOT good data about how a person might actually behave in other contexts
With regard to the citations, I don't have them with me at the moment, and one of them was a personal communication from Krosnick. You can, however, find them through the other answers to this question or at pprg.stanford.edu/.
David Glyn Roberts, thanks for getting back - and I'm glad we can agree about the goofiness of some research(ers). When it doesn't frustrate me, at least it amuses me.
Thanks, also, for the Stanford site. The reference you referred to might be the following:
Revilla, M. A., Saris, W. E., & Krosnick, J. A. (2014). Choosing the number of categories in agree-disagree scales. Sociological Methods and Research, 43, 73-97.
Yes that is one of them. Please note Alan Mead's comments about that research as well. I disagree with him about the implications but he makes some good points.
Thanks to David Glyn Roberts and Hoshiar Mal for your comments. The discussions are interesting. I am amused by the word "goofy" and probably agree with Robert Trevethan. Interestingly, I notice the difficulties respondents face in differentiating between points. In the following scale, some (or many) may struggle with the choice between 2 and 3, and between 5 and 6.
1 – Never true
2 – Rarely true
3 – Sometimes but infrequently true
4 – Neutral
5 – Sometimes true
6 – Usually true
7 – Always true
To me, a 5-point scale could be a better differentiator. Of course, I agree that converting a validated scale from a 7-point to a 5-point scale may require re-validation.
By the way, I did a pilot study after converting the 7-point Likert scales to 5-point. All scale reliabilities look good.
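(In case it helps anyone replicating this kind of check: below is a minimal Cronbach's alpha sketch in Python. The data here are fabricated purely for illustration; the input is assumed to be a respondents-by-items matrix.)

```python
import numpy as np

def cronbach_alpha(data: np.ndarray) -> float:
    """Cronbach's alpha for a respondents-by-items matrix."""
    k = data.shape[1]
    item_vars = data.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = data.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Fabricated 5-point responses (10 respondents, 4 items), built so the
# items correlate and alpha comes out reasonably high.
rng = np.random.default_rng(1)
base = rng.integers(1, 6, size=(10, 1))
items = np.clip(base + rng.integers(-1, 2, size=(10, 4)), 1, 5)
print(round(cronbach_alpha(items.astype(float)), 3))
```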
Debaraj Das, I'm glad my use of "goofy" amused you. I really couldn't think of a better word to describe the kinds of things I see too frequently in publications.
Here is an interesting point that I think is worth pondering: What does the option of "neutral" at #4 above mean when respondents choose it? Might it mean a number of things, including undecided, don't know, or don't understand? I confess that I'm troubled by that kind of option because it could mean a variety of things but is happily fed into analyses.
Incidentally, I think that options 2 and 3 might be a bit difficult to distinguish from each other, but 5 and 6 seem quite different to my eye. Finding the best words for response options is often difficult - but worth trying to get right.