In some of the literature, Likert scales are treated as interval scales, while in other sources they are treated as ordinal scales. Which is the more reliable position?
Likert-scale data are generally considered ordinal because the intervals between scale points cannot be assumed to be equal. Some studies follow the approach of Okpala et al. (1993), which assumed fixed intervals between scale points and therefore treated the data as continuous, allowing further analysis with parametric tests. On the other hand, Jamieson (2004) and Boone et al. (2012) emphasised that Likert data are categorical and cannot be treated as interval data. This position is supported by other scholars as well: Kuzon et al. (1996) listed treating Likert data as interval and analysing them with parametric tests as the first of the "seven deadly sins" that researchers should avoid.
Resources:
Jamieson, S., 2004. Likert scales: how to (ab)use them. Medical Education, 38 (12), 1217–1218.
Boone, H.N. and Boone, D.A., 2012. Analyzing Likert data. Journal of Extension, 50 (2), 1–5.
Kuzon Jr, W.M., Urbanchek, M.G., and McCabe, S., 1996. The seven deadly sins of statistical analysis. Annals of Plastic Surgery, 37 (3), 265–272.
Ahmed's citations summarize the usual concerns over the scale strength of Likert-type scales. I believe that S.S. Stevens (1946) was one of the first to bring attention to scales of measurement: On the theory of scales of measurement, Science, 103, 677–680.
In Rensis Likert's 1932 monograph, A technique for the measurement of attitudes (Archives of Psychology, No. 140, pp. 5-55), he made several important points:
1. For an individual item, Likert showed that his arbitrary (1-5) scale values correlated extremely well with the values produced by a more involved scaling procedure that was meant to yield interval-strength scale values.
2. For measurement of overall opinion, the intent was to sum the scores across items, not to use individual item results as the value of interest. Disputes notwithstanding, many folks consider the summated scale scores to behave sufficiently well to treat them as if interval (rather than ordinal) strength.
If you wanted to avoid the controversy altogether, you could use Rasch or IRT (item response theory) models for polytomous items, which often appear to work well with rating scale items (and do help tie responses to an interval-strength scale). Here's a link to a (now) free text that can help get you started: https://rasch.org/BTD_RSA/pdf%20[publisher]/Rating%20Scale%20Analysis.pdf
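As a rough sketch of why such models yield interval-strength measures: Andrich's rating scale model relates adjacent categories of item i for person n by

\[ \log\frac{P(X_{ni}=x)}{P(X_{ni}=x-1)} = \theta_n - (\delta_i + \tau_x) \]

where \theta_n is the person measure, \delta_i the item location, and \tau_x the threshold between categories x-1 and x; person, item, and threshold parameters are all estimated in logits on a common interval scale.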
There is a difference between individual Likert-scored items (which are ordinal), and scales that are created by summing the items (which are often treated as interval). As David Morse notes, "many folks consider the summated scale scores to behave sufficiently well to treat them as if interval (rather than ordinal) strength." So, if other researchers in your field routinely use scales that are created from multiple Likert-scored items, then you should not have any problem following that standard.
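As a rough illustration of that summing step, here is a minimal Python/NumPy sketch with made-up responses (the data and the four-item scale are purely hypothetical); the scale score is just the row sum, and Cronbach's alpha is the usual check that the items hang together well enough to justify summing them:

```python
import numpy as np

# Hypothetical responses: 6 respondents x 4 Likert items scored 1-5
# (made-up data purely for illustration).
items = np.array([
    [4, 5, 4, 4],
    [2, 1, 2, 3],
    [5, 5, 4, 5],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
    [1, 2, 1, 2],
])

# Summated scale score per respondent (possible range here: 4-20).
scale_scores = items.sum(axis=1)

# Cronbach's alpha:
# alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
k = items.shape[1]
alpha = k / (k - 1) * (1 - items.var(axis=0, ddof=1).sum() / scale_scores.var(ddof=1))

print("Scale scores:", scale_scores)
print(f"Cronbach's alpha: {alpha:.2f}")
```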
Because there have been so many questions here about Likert-scored items and scales, I have compiled a set of resources on this topic.
It is true that, technically speaking, Likert scales are not interval-level scales in the way that, say, a weighing scale is.
Nevertheless, if you use summated rating scales built from several items that measure a construct, these scales, provided they have high reliability, approach or come close to interval scales.
Summated rating scales are widely used in regression analysis, although purists would insist that only interval-level measures be used for regression.
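For illustration only, here is a minimal sketch of a summated scale used as a regressor (simulated, hypothetical data; the assumed relationship and sample size are invented), using statsmodels OLS:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)

# Hypothetical data: a summated scale (sum of five 1-5 Likert items)
# used as the predictor of a continuous outcome.
n = 200
scale = rng.integers(1, 6, size=(n, 5)).sum(axis=1)    # possible range 5-25
outcome = 0.4 * scale + rng.normal(0, 2, size=n)        # made-up relationship

X = sm.add_constant(scale.astype(float))
print(sm.OLS(outcome, X).fit().summary())
```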
It’s best for you to consult your supervisor and committee members on how they view the psychometric properties of Likert scales, if you are submitting your research to them.
A Likert scale should never be considered an interval scale! Each score on a Likert scale carries a label, so the data are not continuous and therefore do not meet the assumptions for parametric statistical analysis. There are novice as well as ignorant researchers out there who treat ordinal data as interval data and analyse them with parametric methods. It's a rubbish-in, rubbish-out kind of analysis.
In many fields, there is a long tradition of adding together highly correlated ordinal items to create continuous scales. But you need to consider whether that tradition is accepted within your own field, because, as you can see, some people are quite hostile to what is well received by others.
Both schools of thought exist on this topic!
With proper consideration of the research question and the framing of the questionnaire, one can also use parametric tests, which are generally more powerful than non-parametric tests, provided the sample size is large enough for the sampling distribution of the mean to be approximately normal, as per the central limit theorem.
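As a hedged illustration (simulated, made-up responses; SciPy), this is roughly how a parametric and a non-parametric two-group comparison sit side by side; with large samples they often lead to the same conclusion, though that is not guaranteed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical 1-5 Likert responses from two groups (made-up data,
# group B shifted slightly upward).
group_a = rng.integers(1, 6, size=150)
group_b = rng.integers(2, 6, size=150)

# Parametric comparison (relies on the CLT for the sample means).
t_stat, t_p = stats.ttest_ind(group_a, group_b)

# Non-parametric alternative that uses only the ordering of responses.
u_stat, u_p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")

print(f"Independent-samples t-test: t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"Mann-Whitney U:             U = {u_stat:.1f}, p = {u_p:.4f}")
```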
Regarding the raw Likert items, one should note that:
1. Subtraction is not defined for ordinal items unless a very strong (and possibly unrealistic) assumption is made: that all levels are equally spaced.
Let's assume your item has the levels: very good = 5, good = 4, average = 3, bad = 2, very bad = 1.
To make a subtraction (which will be used by any parametric method and some non-parametric ones, like the Wilcoxon test for paired data), you will have to assume that there is a common unit:
very good − good = good − average = average − bad = bad − very bad = 1 [unit]
Can you do that? Can you justify it convincingly? Will your audience, including domain experts and statistical reviewers, agree? If all the answers are "yes", then OK (see the short numeric sketch after point 2 below).
2. The arithmetic mean (and an interpolated, type-7 median) will likely produce values outside the original levels.
What does it mean to have an average of 3.5? Is 3.5 "average and a half"? Or maybe "almost good"? What does 4.73 mean? "Practically very good"? How are you going to interpret two means of 3.12 and 3.67? Almost average and about good?
Of course, it's just numbers, so technically you can do anything you want. But you are then responsible for:
a) making assumptions that can be convincingly justified; if your assumptions are wrong, your entire analysis will be wrong
b) providing an explanation of fractional outcomes that will be acceptable to your readers. You could agree, for example, on a threshold of 0.5, so that 3.51 is good while 3.5 is average, and 2.51 is average while 2.5 is bad. But does that sound sensible?
If so - please go ahead.
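Here is the short numeric sketch of both points (made-up responses, Python/NumPy): the means come out fractional, and a monotone recoding of the same ordinal levels, i.e. a different but equally arbitrary spacing, changes the difference of means:

```python
import numpy as np

# Hypothetical 1-5 Likert responses from two groups (made-up data).
group_a = np.array([3, 4, 4, 5, 3, 4, 5, 4])
group_b = np.array([2, 3, 3, 4, 3, 3, 2, 3])

# Treating the codes as interval data gives fractional means...
print(group_a.mean(), group_b.mean())        # 4.0 vs 2.875 -- what is "2.875"?

# ...and the difference of means depends entirely on the assumed spacing.
# A monotone recoding (same ordering, different spacing) changes the answer:
recode = {1: 1, 2: 2, 3: 3, 4: 6, 5: 10}     # arbitrary alternative spacing
recoded_a = np.vectorize(recode.get)(group_a)
recoded_b = np.vectorize(recode.get)(group_b)
print(group_a.mean() - group_b.mean())       # difference under 1-2-3-4-5
print(recoded_a.mean() - recoded_b.mean())   # difference under 1-2-3-6-10
```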
But if you feel you cannot make such strong assumptions, it is better to stick with more appropriate, well-established methods such as multinomial logistic regression or ordinal logistic regression (the proportional odds model), provided the proportional odds assumption holds. Otherwise, use a partial proportional odds model, or dichotomize the outcome and fit separate logistic regressions.
All those methods are more difficult to perform and interpret (a matter of training) but are formally correct.
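A minimal sketch of the proportional odds model with statsmodels' OrderedModel (simulated, hypothetical data; the predictor, cut points, and effect size are invented for illustration):

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(1)

# Hypothetical data: one predictor drives a latent score, which is then
# cut into five ordered response categories.
n = 300
x = pd.DataFrame({"x": rng.normal(size=n)})
latent = 0.8 * x["x"] + rng.logistic(size=n)
response = pd.cut(latent,
                  bins=[-np.inf, -1.5, -0.5, 0.5, 1.5, np.inf],
                  labels=["very bad", "bad", "average", "good", "very good"])

# Proportional-odds (ordinal logistic) model. Note: no intercept column --
# the estimated threshold parameters play that role.
result = OrderedModel(response, x, distr="logit").fit(method="bfgs", disp=False)
print(result.summary())
```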
Simple methods work well, except in the cases where they don't.
The t-test (let's forget about the problem with subtraction) may tell you that, on average, responses are lower or higher in a certain group, but that is all it can do for you.
The Likert scale itself, built from several individual Likert items, seems to be another story; here there are different traditions, as others have said in this thread. It makes no sense for me to repeat what has already been said.
In addition to the (good!) literature already mentioned, please also see this web book: https://bookdown.org/Rmadillo/likert/