I've heard arguments that a Likert-type scale is ordinal data. I've heard arguments that this type of data is interval data. Some believe it is quasi-ordinal-interval data. Which exactly is it?
Since it makes NO sense to do arithmetic with the scores of a Likert scale, it is just an ordinal scale...
I know that some people calculate averages with these scores, but this is wrong.
It *is* ordinal. Anything else is wishful thinking in order to justify the application of standard analysis methods.
Actually, a great many people create averages from Likert scales, and this is extremely useful. Of course, from a purist standpoint you can't take averages on an ordinal scale, but from a pragmatic point of view it not only works but is also accompanied by standards for how well it works.
In particular, calculating a Cronbach's alpha on sets of items scored on Likert scales will give you an estimate of the reliability of the average from the set of items (i.e., the percentage of the variance that is due to measuring the same underlying concept, as opposed to random error).
The key point here is to use multiple items, where any one of them may be too weak to provide an adequate measure, but the combination of them is much stronger.
This is an accepted practice in almost all social science fields, and if your goal is to publish in those fields, then I suggest you follow the standards your peer reviewers will expect of you.
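For anyone who wants to try this: below is a minimal sketch of Cronbach's alpha computed directly from an item-score matrix. The scores are made up purely for illustration; any of the usual statistics packages will of course give the same number.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the sum score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Five hypothetical respondents answering three 5-point Likert items:
scores = [[4, 5, 4],
          [2, 1, 2],
          [3, 3, 4],
          [5, 5, 5],
          [1, 2, 1]]
print(f"alpha = {cronbach_alpha(scores):.2f}")
```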
In fact, the sum or average of many items can, under special assumptions, be seen as "quasi-interval". But this requires that the combination of the items is meaningful, i.e. that the items all measure the same or at least a very similar variable. When this is the case, everything makes sense. I am afraid that this is often not the case, and averages are still calculated and interpreted.
Ordinal variables tell us not only that things have occurred, but also the order in which they occurred; they tell us nothing about the differences between the values.
Interval variables concern data measured on a scale along the whole of which intervals are equal. For instance, people's ratings of a product can range from 1 to 5; for these data to be interval, the increase in appreciation of the product represented by a change from 4 to 5 along the scale should be the same as the increase in appreciation represented by a change from 2 to 3, or from 3 to 4.
In fact, a Likert scale ascribes quantitative values to qualitative data to make them amenable to statistical analysis. It is useful to treat Likert-scale data as interval data: we can create averages and apply some statistical tests when using a Likert scale for questionnaire statements.
The topic is controversial, and the previous answers reflect this fact. I read the Wikipedia article about Likert scales, and all the contributors' answers here are reflected there to some extent. I think we can agree that the decision to consider only order, to add a distance, or even to accept the sum as a valid operation is essentially subjective. At most, we can achieve a consensus within a well-defined group of individuals.
Obviously, this does not solve the question stated by Reginald. In its place, I would like to go a step further. I am asking myself whether subjectiveness is essential in any probabilistic model, even in those cases in which nobody doubts the appropriate scale. The problem seems to be centred on the design of a sample space for any random variable. Probability theory requires just a sigma-field to define probability. However, a richer structure is implicitly or explicitly assumed in most cases. One of the most elementary cases is the assumption of an order. Afterwards, a scale can be introduced to get a dissimilarity, a divergence or a distance. At the top of the complexity there are the Euclidean or Hilbert structures, which admit addition of random variables and distances, thus allowing expectation and variance, ...
I would say that all these assumptions are mainly subjective for any random variable, not only in the Likert case.
In the case of Likert scales, there seems to be consensus in accepting ordering. In fact, it is implicit in the questionnaires. Distance and expectation (average, sums, variances) are rejected by some authors. However, I have the feeling that in some studies using Likert scales there is a contradiction: only order is admitted, but afterwards averaging is used, for instance when looking for a consensus.
This is only a reminder that, for practicing statistics, the analysis of the properties of the sample space can be crucial for reliable and interpretable results.
Thank you all for your contributions; they were interesting.
Likert scales are an ordered continuum of response categories with a balanced number of positive and negative options. Numeric values are assigned to each category for the purpose of analysis, and you can create averages from Likert scales. They are very useful in psychometric questionnaires and quality-of-life studies.
Others have offered great insight already, but one thing I'll note is that if you check your Likert scale data structure (e.g. for skewness, kurtosis, and other issues) up front, then the results from many inferential analyses will often be consistent regardless of whether you treat the data as ordinal or interval. For example, the results from an ANOVA that treats Likert scale data as interval should be consistent with those from a Kruskal-Wallis test that treats the scale data as ordinal.
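To illustrate the comparison with a made-up example (the group responses below are purely hypothetical; scipy assumed):

```python
from scipy import stats

# Hypothetical 5-point Likert responses from three groups:
g1 = [4, 5, 4, 3, 5, 4, 4, 5]
g2 = [3, 3, 4, 2, 3, 4, 3, 2]
g3 = [2, 1, 2, 3, 1, 2, 2, 1]

f, p_anova = stats.f_oneway(g1, g2, g3)   # treats scores as interval
h, p_kw = stats.kruskal(g1, g2, g3)       # treats scores as ordinal (ranks)
print(f"ANOVA:          p = {p_anova:.4f}")
print(f"Kruskal-Wallis: p = {p_kw:.4f}")
```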
In my experience, it's uncommon to get drastically inconsistent results. When this happens, it's usually because the data are badly skewed or have some other non-normal distribution. In such a case, many of the non-parametric statistics applicable to ordinal data are often preferable anyway.
One final thought - different fields have different conventions with Likert scales. Go with what's most accepted in your particular field.
As others have noted, technically, Likert Scale items are ordinal scales.
However, when you create a Likert scale by summing or averaging these items then your scale "approaches" Interval scale properties.
Many years ago I had an evil statistics professor who made all of us PhD qualifier students run many, many simulations testing the underlying assumptions of various statistical techniques. We found that summated rating scales, such as Likert scales, provided pretty much exactly the same results with the technically incorrect parametric statistics as with the technically correct, but messier, non-parametric statistics, in terms of significance levels and parameter estimates in measures of association and difference. The only exception was with very highly skewed data.
Conclusion: for all practical purposes you can treat Likert scales as if they were interval scales, i.e. it's okay to find the mean and standard deviation, run t-tests, use them as predictors in a regression, etc.
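In the same spirit, here is a rough sketch of such a simulation (entirely made-up design: latent normal scores discretized to 5-point items and summated; scipy assumed). The exact agreement rate varies with the seed and the design, but it is typically very high:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_runs, agreements = 1000, 0
for _ in range(n_runs):
    delta = rng.uniform(0, 0.5)   # varying true group difference
    # 30 respondents per group, 10 five-point items each, summated:
    a = np.clip(np.round(rng.normal(3.0, 1.0, (30, 10))), 1, 5).sum(axis=1)
    b = np.clip(np.round(rng.normal(3.0 + delta, 1.0, (30, 10))), 1, 5).sum(axis=1)
    p_t = stats.ttest_ind(a, b).pvalue                              # parametric
    p_u = stats.mannwhitneyu(a, b, alternative="two-sided").pvalue  # ordinal
    agreements += (p_t < 0.05) == (p_u < 0.05)
print(f"Same accept/reject decision in {100 * agreements / n_runs:.1f}% of runs")
```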
This is where Rasch analysis is very useful - it allows you to create ordinal-to-interval conversion tables. But it can't be done in a rush.
It is ordinal by nature; however, it is also used as interval for practical purposes. The major problem lies with interpretation: how one can justify using the mean and variance as far as estimation is concerned. For testing, one can go for non-parametric tests.
Imagine a Likert-type scale constructed on a 7-point or 5-point scale along the dimension "very outgoing" to "very isolationist" or any such continuum. Is "somewhat outgoing" one less than "very outgoing"? If so, one less of what? And what about just "outgoing"? If "somewhat outgoing" is one less than "very outgoing", is "outgoing" more or less than "somewhat outgoing"? How much more or less? What if no corresponding terms are used for the scale (as is often done) other than "Very X" and "Very Y" with just numbers corresponding to responses in-between? On a 7-point scale from "Very religious" to "Very antireligious" with 1 being "Very religious" and 7 being "Very antireligious" but no terms corresponding to values 2,3,4,5, or 6, what exactly does 4 represent? Is it really one more than 3 and one less than 5?
The point is that, as useful as Likert-type scales can be, to treat them as interval (or ordinal, actually) data without accounting for the fact that
1) different subjects will treat the same responses differently
&
2) there is no precise "unit" in any such scale such that any response is numerically any "unit" more or less than another
yields problems. Likert-type scales are ordinal data in that there is order and in that (unlike interval or other data types) distances between response categories are not quantitative/numerical. Unfortunately, numerous multivariate methods supposedly for "ordinal data" are not particularly robust to distance variation among subject responses.
There are numerous approaches, from fuzzy set theory & item response theory to configural frequency analysis & multidimensional nonlinear descriptive analysis, that are designed for Likert-type responses (among others), or at least designed for related questions and have been adapted for Likert-type responses. Various distance/dissimilarity measures (Minkowski, Mahalanobis, Hausdorff, etc.) can be readily adopted (and adapted for specific datasets such that machine learning algorithms yield optimal classification of responses) for ordinal data. Order statistics, multidimensional scaling, even spatial data analysis offer a plethora of techniques for Likert-type data. None are ideal, because variance for any particular set of responses on any particular Likert-type scale can make one option inferior or superior to another. The key is to be familiar with a range of strategies, not only for "significance" tests and the like, but also to better understand the nature of one's data so as to choose the optimal measures/tests/methods. All of the most frequently used statistical tests assume that the variables under consideration are continuous. This is almost never so. Only by knowing the nature of your data can you really know whether or not the violation of this assumption matters for tests x, y, or z.
Because using ordinal scales of e.g. quality is commonplace in educational and instructional practice, an interesting or rather confusing problem arises if you happen to have multiple scales which have to be combined in order to yield a single score, e.g. a percentage, which in turn can be used as the basis for grading.
This is a very common practice in all kinds of educational settings. However, because many teachers and faculty members "forget" that they are dealing with ordinal scales, all kinds of forbidden operations are performed on such scales (or rather, on the scores on such scales), presumably because the operations are so simple, e.g. averaging.
In order to resolve this conflict between simplicity and meaningfulness, I have developed a rather new approach which allows the correct manipulation of ordinal scales of any order (= the number of 'points' on the scale) and in any desired hierarchical composition. This is not the appropriate place to delve into the technicalities of my approach (based on a combination of probability and fuzzy logic; see the link if you are really interested), but let me summarize a long exposition by claiming that I have found a handful of basic Assessment Patterns with which you can model practically any assessment situation, including ones based on Likert scales.
http://pass-companion.weebly.com/
Solving a pseudo-problem: numerals are not numbers!
Most problems people have or create when treating Likert-scale data as interval data can be avoided if they recognize the difference between NUMERALS and NUMBERS.
The labels 1, 2, 3, ... on a Likert-type scale are NUMERALS, not NUMBERS. As soon as one understands and acknowledges this simple fact, the pseudo-problem dissipates before one's eyes.
How to avoid this confusion? I never use the symbols 1, 2, 3, etc. for a Likert-type scale or any ordinal scale. I consistently use non-numeric symbols or labels, e.g. smileys, or plusses, or whatever, but never the numerals 1, 2, 3, ..., which could be misunderstood as natural numbers, thus implying equal distances between the levels on the scale. The American grade scale of A, B, C, D, E also avoids this pseudo-problem, because ... what is the average of B and E? It's not defined at all!
But then you may wonder and ask whether it is completely impossible to aggregate several ordinal measures into a single "kind of average". Don't panic: it's not impossible. Statisticians use for such cases the well-defined median, which is the point on the scale (e.g. the letter C on the American grade scale) that best splits the set of observed scores into two more or less equal parts, with almost 50% of the scores on one side and the rest on the other. That works fine except when you have very few scores, in which case you may feel the median is too coarse.
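For those who want to see the median on a purely ordinal scale in practice, here is a small sketch using ordered categories rather than numbers (the grades are invented for illustration):

```python
import pandas as pd

grades = pd.Categorical(
    ["B", "C", "A", "C", "D", "B", "C"],
    categories=["E", "D", "C", "B", "A"],  # ordered worst to best
    ordered=True,
)
codes = pd.Series(grades).cat.codes        # integer ranks 0..4, order only
# (with an even number of scores, the median of the codes can fall
#  between two categories, which is exactly the coarseness noted above)
median_grade = grades.categories[int(codes.median())]
print(median_grade)                         # 'C' for this sample
```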
I have found and used another approach which works fine in all cases. It is based on the comparison of score histograms, i.e. the distribution of scores over the n points of an ordinal scale. Using the score frequencies, or rather probabilities, it turned out to be possible to impose an order on the histograms, and hence, by way of a nice formula, to calculate a percentage which nicely represents all the scores. The important point: it only uses the ordinal information in your data (here: scores), so it's completely independent of the kind of labels that you use for the points on the scale. You may read more about it on my website.
Paul, or anyone in the network,
How would you code the string data for SPSS or SAS in order to use it in multivariate statistical tests or for data reduction such as factor analysis? Would you use decimals in equal intervals, or recode A, B, C, D, E into Arabic numerals? It seems to me the scale would need to be recoded in order to make it amenable to modern statistical software programs and the more advanced statistical methods.
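One common answer, sketched below with made-up grades: recode the ordered labels into integer codes before export, while remembering that those codes carry order only, not distance:

```python
import pandas as pd

raw = pd.Series(["B", "A", "C", "E", "D", "B"])        # grades as strings
order = ["E", "D", "C", "B", "A"]                      # worst to best
grade_type = pd.CategoricalDtype(categories=order, ordered=True)
coded = raw.astype(grade_type).cat.codes + 1           # E=1 ... A=5
print(coded.tolist())                                  # [4, 5, 3, 1, 2, 4]
```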
It's all well and good (ideal, in fact) to realize that Likert-type data are "not numbers". The problem is how one can statistically analyze sets of responses that are not numbers with statistical tests, models, or analyses which necessarily require numbers. It's hard to plug "Strongly Agree" into something as simple as Pearson's r, let alone some multidimensional scaling analysis.
One can analyze frequency distributions. (Ordered) logit models. Fisher, Chi², Kolmogorov-Smirnov, ... things like that. Asking for "means" just because the mean is the only statistic known to someone is a bad choice.
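As a sketch of the ordered-logit route: recent versions of statsmodels ship an OrderedModel class for ordered logit/probit regression. The data below are simulated purely for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
x = rng.normal(size=200)                      # hypothetical predictor
latent = 0.8 * x + rng.logistic(size=200)     # latent agreement
response = pd.Series(pd.cut(                  # ordered response categories
    latent, bins=[-np.inf, -1, 0, 1, np.inf],
    labels=["SD", "D", "A", "SA"]))

model = OrderedModel(response, x[:, None], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.params)                          # slope plus category thresholds
```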
Jay, Hume and others have noted above that using Likert data as "quasi-interval" is often a practical way to go, leading to very similar conclusions as the "messier" correct methods, IF the distributions of values are unimodal and symmetric. I generally agree with this. However, I doubt that unimodal symmetry is a typical distribution. Especially when the questions are good, they *should* separate the answers well (persons should tend to either "fully agree" or "fully disagree", not be rather undecided about most of the questions). But exactly then, when the questions are well-designed, unimodality and symmetry should be least expected.
It is desirable to reduce the complexity of the world to a minimum of relevant aspects. I think this is often overdone, i.e., the analysis over-simplifies reality and therefore leaves relevant, important aspects unconsidered. That should not be desirable. Sometimes it is simply not possible to reduce the complexity below a certain limit without ignoring relevant features. As long as one thinks critically about which things are relevant and then chooses the simplest analysis that considers all the things deemed relevant, everything is OK. This "simplest analysis" may turn out to be more complex than just comparing "average scores".
@ Andrew Messing: Fully agree! There are several answers, depending upon the context you are working in and the goals of your research.
If you are mainly interested in population characteristics, i.e. actual distributions, correlations, regressions and things like that, you may have to delve into the respected subfield of statistics called ordinal statistics (see e.g. links).
Unfortunately, ordinal statistics is not taught in most applied disciplines, as it is considered an advanced expert topic being too difficult for students. Thus most students grow up with a heavily biased picture of what statistics can do for them and will often apply statistical models which are not apt for the data they are confronted with.
However, that is only one side of the coin. If you are moving ahead from doing descriptive statistics (the easy thing) towards inductive or confirmatory statistics (what you, Andrew, may be aiming at), then you can't succeed without at least a rudimentary theory (mostly in the form of hypotheses) about your population and the phenomena you are interested in. Statistics is no more than a supporting discipline for both empirical and theoretical disciplines (or, in the words of Jerome Cornfield: not a "queen", rather a "bedfellow", see file, pp. 7-8); what you need or choose from it is dictated by your data AND your theory.
There is also a third answer in the context of applications where you are solely or mainly interested in individual cases, lacking a clear and consistent definition of the statistical concept of a population. That's the context I am working in. I don't have a well-defined population, or rather it is changing all the time. I don't have the resources (time, money, people, etc.) to steadily adapt eventual tools, e.g. tests based on some statistical or psychometric technique(s). But I have to come up at the end of my courses with some fixed statement (call it subjective judgement, if you like) about the quality of the work delivered by my students. That subjective judgement is usually based on a lot of evidence or indicators of a purely ordinal character. Ordinal statistics doesn't help me here at all. For that reason, and some other inherent problems I can't go into here, I had to develop a completely novel approach using techniques from discrete mathematics and fuzzy logic (see link). For me it works fine, notably because I know what I am doing, i.e. I know my tool, its potential and its limits, and I can play with it until I find a solution to my problem which suits me and my goal.
If that's still not enough or convincing, you may have a closer look at the fascinating approach called Bayesian statistics, which successfully combines 'hard' statistics with 'soft' data and is in particular also applicable in cases where a frequentist approach would fail - because you don't have and can't get access to a sample of observations from your domain, for the simple reason that there is only a single case (degenerate population). If you want a quick and easy introduction to the world of Bayesian thinking, read "The Probability of God" by Stephen Unwin; it's really amusing and instructive. Or, if you have more time and inclination, read about the long history of Bayes' Theorem in the book by McGrayne. See links.
http://pegasus.cc.ucf.edu/~lni/sta6238/McCullagh1980.pdf
http://cran.gis-lab.info/web/packages/ordinal/ordinal.pdf
http://www.amazon.com/The-Probability-God-Calculation-Ultimate/dp/1400054788
http://www.amazon.com/The-Theory-That-Would-Not-ebook/dp/B0050QB3EQ
www.passorfail.de.vu
All the answers have been very helpful! Thanks to everyone who has contributed.
It is an ordinal scale because it takes discrete values such as 1, 2, 3, 4, 5, while interval data take continuous values within a certain interval.
Most people here seem to agree that *formally* treating Likert scales as interval level is not allowed. How about *pragmatically*? Treating Likert scales as interval is so much easier for the author and much better to comprehend for most of the audience (reviewers!).
To my mind it also depends on the context, more specifically the potential loss when making wrong decisions. For example, I would favor a formally sound treatment when subsequent decisions affect people's lives or have societal impact. If the aim is, let's say, exploring a theory, using the standard repertoire (factor analysis, linear regression, etc.) might be good enough.
After all, statistical analysis is always a compromise between pragmatic value and fidelity. Can someone point to resources on how harmful it actually is to treat Likert scales as interval?
@ Martin: I doubt whether your question "... how harmful it actually is to treat Likert scales as interval?" can be answered without fixing the context, as you yourself suggested when you wrote "To my mind it also depends on the context, more specifically the potential loss when making wrong decisions."
I'll give you a simple example of the potentially dramatic consequences of not considering the distinction between an ordinal and an interval scale, and on top of that using a wrong statistic to base your decision upon. It will turn out that, depending on an arbitrary but fully admissible choice of ordinal scale values, you can decide whatever you like: if there are two alternatives to choose from, you may come up either with a tie (no decision), or with a preference for the one or the other alternative (reversal of decision). Thus there would be ample room for manipulation in such a situation. You would certainly call that 'harmful', wouldn't you!
So let us assume the following context. You have to test two persons, let's call them A and B, and your task is to decide on the basis of the test results which of the two your company should hire for a particular job.
Let's further assume that you have got a very simple test with just 6 items to be scored on a simple 3-point scale which is assumed to "measure" the qualification of candidates A and B. The three points on the scale, which is intended to be ordinal, are labelled by the test designers as LOW, MEDIUM and HIGH (or any other symbols, e.g. smileys).
After administering the tests to A and B, you find that the respective response patterns are:
for A : LOW:2, MEDIUM:2, HIGH:2 (total: 6)
for B: LOW:1, MEDIUM:4, HIGH:1 (total: 6)
Thus at face value you would probably conclude that B is a little bit better than A, because both respond symmetrically, but B has more answers on MEDIUM than A.
Now I'll show you how things (= decisions) change dramatically if you want to come up with a numeric test score using arithmetic means with three different - but, being order-preserving, fully admissible - valuations for the three points, and the simple rule: the higher the test score, the better the qualification.
CASE 1: LOW=-1, MEDIUM=0, HIGH=+1
In this case, both A and B get a test score of 0, so we have a tie; we can't decide between A and B and have to gather further evidence.
CASE 2: LOW=-2, MEDIUM=0, HIGH=+8
In this case, A gets a test score of 2 and B a score of 1, so A will be your favorite.
CASE 3: LOW=-8, MEDIUM=0, HIGH=+2
In this case, A gets a test score of -2 and B a score of -1, so B will be your favorite.
This simple counter-example to the colloquial wisdom "It doesn't do any harm to treat an ordinal scale as interval, and use the ordinary arithmetic mean of scale values as a test score" may seem extremely concocted, but actually it demonstrates dramatically what's basically wrong and misleading about such an attitude.
On the contrary, if the test had been designed as an interval-scale test, this could not have happened, because the three CASES do NOT correspond to one and the same interval scale, i.e. you can't get them from each other by a fixed linear transformation ax+b.
P.S.: If you don't like to work with negative scale values, it is easy enough to redress this example with all-positive scale values; however, it won't change the argument at all.
P.S.: Similar counter-examples could be worked out for other contexts, e.g. to find out whether a candidate passes or fails a pre-set criterion.
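Paul's three cases are easy to verify mechanically; here is a small sketch that reproduces the tie and the two reversals:

```python
# Response counts per category for candidates A and B (6 items each):
counts = {"A": {"LOW": 2, "MEDIUM": 2, "HIGH": 2},
          "B": {"LOW": 1, "MEDIUM": 4, "HIGH": 1}}

# Three order-preserving (hence ordinally admissible) valuations:
cases = [{"LOW": -1, "MEDIUM": 0, "HIGH": 1},
         {"LOW": -2, "MEDIUM": 0, "HIGH": 8},
         {"LOW": -8, "MEDIUM": 0, "HIGH": 2}]

for i, values in enumerate(cases, start=1):
    means = {who: sum(n * values[cat] for cat, n in resp.items()) / 6
             for who, resp in counts.items()}
    print(f"CASE {i}: A = {means['A']:+.2f}, B = {means['B']:+.2f}")
# CASE 1: 0 vs 0 (tie); CASE 2: +2 vs +1 (A wins); CASE 3: -2 vs -1 (B wins).
```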
Dear Martin and Paul,
I appreciate your answering the question further. Very interesting additions to the question and discussion.
I have used Likert-type scale data as the dependent variable in multivariate statistics, such as factor analysis, and it seems acceptable to editors and reviewers to use the scale in this way. Nevertheless, Likert-type data seem to be quasi-interval data, based on the responses to my question so far.
@ Reginald: What exactly do you mean by "... used the Likert-type scale data as dependent variable ..."? If the data act as a dependent variable, you probably want to explain, reconstruct or predict them from some set of independent variables given a functional relationship (postulated or estimated from all data). If you have reason to believe that your Likert-type scale data are an accurate representation of an ordinal structure, the only restriction you have to put on the functional relationship is that it is a monotone transformation of the Likert data. If a single monotone transformation applies to all measured objects or subjects, I would say: that's OK. If you need different transformations of the scale for different objects or subjects, I think there is in fact reason to be suspicious about your model: there may be other independent variable(s) that you should include in your model to explain all the variance. Does that make sense to you?
Paul,
the example you constructed is exactly what I would call a high-stakes situation, where formally strict treatment is indicated. So, we seem to be on the same page. Except, maybe, that in such a situation I would never rely on just three items.
However, we academic hardliners may rant about such practices, but that won't stop pragmatically oriented people from doing it. Or have you ever heard of a case where someone sued a human resources specialist for mistreating test data?
So, what can we do about it? Constructing examples that show potential misuse or bias is one thing. But do we have
(1) a procedure to test the relevant measurement properties, and
(2) a quantification of the bias that occurs in typical high-stakes situations (preferably expressed as losses)?
--Martin
Paul,
Factor analysis in this case is used as a data-reduction technique: a principal component analysis with an unrotated solution and a scree plot. With an eigenvalue-greater-than-1 criterion for selection, the scree plot will tell you how many of the original variables account for how much of the scale variance. For example, an instrument with 26 original items might be reduced to 10 items loading onto two or three factors. With this knowledge you can run additional factor analysis methods (promax or varimax rotations) to derive the final factors. The derived factors can be used as dependent variables. This type of analysis is broadly accepted and repeated thousands of times in the social science literature, in many of the top journals.
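A sketch of that workflow in Python (simulated placeholder data stand in for a real 26-item instrument; scikit-learn assumed):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Placeholder data: 300 respondents, 26 Likert items driven by two
# latent factors plus noise, discretized to a 5-point scale.
factors = rng.normal(size=(300, 2))
loadings = rng.uniform(0.4, 0.9, size=(2, 26))
latent = factors @ loadings + rng.normal(scale=0.8, size=(300, 26))
items = np.clip(np.round(latent + 3), 1, 5)

z = StandardScaler().fit_transform(items)
pca = PCA().fit(z)
eigenvalues = pca.explained_variance_        # the scree-plot values
n_keep = int((eigenvalues > 1).sum())        # eigenvalue-greater-than-1 rule
print(f"components with eigenvalue > 1: {n_keep}")
print("first five eigenvalues:", np.round(eigenvalues[:5], 2))
```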
Reginald,
my hope is that these publications are all low-stakes. There's a huge gap between state-of-the-art psychometrics and what the majority does.
--Martin
@ Martin and Reginald :
As long as people do 'harmless' research, following the best practices in their discipline and sticking to the agreed methodological rules, manuscripts will continue to be accepted through the obligatory but highly controversial process of peer review (see the related Q&A on RG). Of course, what counts as "best practice" or "agreed methodological rules" may change from time to time.
That doesn't mean, however, that such research, i.e. its methods and results, is automatically free of any flaws and errors, especially as seen from the most advanced and sophisticated point of view of measurement and statistics.
And there is much to wonder and worry about, because critical voices and helpful warnings about some of the more important issues are not of recent date, but have continued for several decades.
For each of these issues, you will find ample literature, pro and con.
And here's the real pragmatic problem: the experts don't always agree among themselves on how best to proceed. How on earth could researcher X, who is not a measurement or statistics expert, then know what to do? Above all: he or she is probably not really interested in those technical details, because he or she is focussed on a substantive problem in his field of research.
Martin asked what we can do about it, and posed two subquestions regarding measurement properties and statistical bias. I think the answers have already been given in the relevant literature, either in the field of representational measurement theory (RMT), e.g. Krantz et al. or Narens, or in the statistical literature, especially the subfield which is concerned with the robustness of statistics and statistical tests and the use of parametric versus non-parametric statistics. Again, the problem is one of a gap between those who know and those who use.
Conclusion: the problem is one of a gap between those who know but don't care about sharing their knowledge in a user-friendly way and those who use but don't care about the relevance of the (mind) tools they use.
Paul, this is an important sentence you made: "Above all: [...] he or she is focussed on a substantive problem in his field of research."
This aim is neither achieved nor targeted by the mindless application of hypothesis tests (or the even worse, but more commonly used and logically unsound, "null hypothesis significance tests"). Instead, substantive problems will only be tackled by statistical thinking. The term "statistical" may be substituted by "scientific".
Paul,
as you brought up the issue of ignorance towards Bayesian statistics, allow me one critical remark: why are you proposing a single statistic (the median) to represent the parameter instead of the full posterior distribution?
While I agree with you that NHST is strictly voodoo*, the issue here is not the general philosophy of statistical inference, but what model complexity is good enough for the purpose. In the context of your own research, that would translate to: how strongly does the measurement model using plain confirmatory factor analysis deviate from a more advanced model (let's say a partial credit model, PCM)? And is this deviation acceptable or not?
Let's not forget that a PCM has many more free parameters, which requires larger samples. Furthermore, you surely want to check a few other desirable properties, for example whether differential item functioning introduces some test unfairness. That's more or less an interaction effect, multiplying the number of free parameters in the model. So you may end up in the unpleasant situation where you simply have to decide what you desire more: a 100% accurate measurement model, or testing other psychometric properties.
--Martin
* voodoo: ineffective, incomprehensible and still practiced
Quick answer (before leaving for dinner): I am (mainly) not working in a research context, but in an educational context, where single scores (decisions) are expected, not distributions. But you are right: when I start with a prior distribution (which I do), why not end up with a posterior? Never thought about that. Nice that we have RG to keep bringing up new insights ... thanks.
Usefulness or utility of models is one of several parameters or criteria to put to work; fully agree. Again: what is your context, pure research (then you want to be as exact as possible), or practical applications where you can live with proxies? E.g. very often linear models are a good approximation even though you know that linearity is not the "real thing".
Congratulations to Holland on third place!!!
Short replies to some earlier comments, in case anybody is still waiting for a clear and concise response/opinion from my side:
Paul, since I brought up the issue of high versus low stake (impact), allow me the following remark.
You say: "What is the difference between high-stakes and low stakes testing, [...] doing purely descriptive statistics given a fixed, agreed upon numerical scale which represents a community's conventional reporting standard (low stakes), or aiming at a full-blown inductive approach where you want to test scientific hypothesis and doing more than comparative analysis but want to have exact measurements (high stakes)."
Sorry, I can't agree with that. Scientific theories (at least in the social sciences) rarely affect people's lives directly. But assessing people in school, psychiatry, forensics, human resource management and so forth impacts their lives. Accordingly, these are the situations where I would demand high-fidelity modelling and Bayesian statistics (as it most easily links to loss functions).
Furthermore, what you say about factor analysis is true for exploratory factor analysis, whereas confirmatory factor analysis is theory-driven and one of the most accepted techniques for psychometric assessment of Likert-type scales. Btw., what drives many people away from using CFA is the large sample size required when using classic statistics. This is somewhat mitigated by recent Bayesian methods, see [1].
[1] Lee, S., & Song, X. (2004). Evaluation of the Bayesian and Maximum Likelihood Approaches in Analyzing Structural Equation Models with Small Sample Sizes. Multivariate Behavioral Research, 39(4), 653–686. doi:10.1207/s15327906mbr3904_4
Martin: Thanks for your comments, especially your point concerning the important distinction between exploratory FA and confirmatory FA.
Indeed FA may be used in both contexts. Still, IMHO factor analysis as such is just a nifty mathematical tool exploiting some useful theorems and techniques from linear algebra, which (techniques) as such have nothing to do with the substantive theory that a researcher likes to investigate. I guess that's what you mean when you - correctly in my view - talk about theory-driven. But that's no different from using other mathematical or statistical techniques to produce confirmations of hypotheses or predictions regarding one's substantive empirical theory. Nobody would call such techniques "confirmatory" or "theory-driven". Why does FA need that ...?
Regarding your second (actually: first) objection, i.e. that I put the terms "low stakes" and "high stakes" in the wrong positions, you may be right in view of the conventional usage of the term "high stakes", e.g. in the context of "high-stakes standards-based testing", which is currently under fierce debate in the USA (what one may call the 'Common Core Clash').
However, look what we get if we reverse the associations. In other words: given high stakes, people might feel OK to invoke and use a minimal sort of statistical rigor to decide about the life and death of other people? Perhaps you are right, and in fact I have seen several examples of such an attitude myself. My only comment is: too bad for such a society.
I sincerely hope I have misunderstood or misinterpreted your intention. So can you please give your definition of a high-stakes-adequate approach to data analysis?
As a parent, past classroom teacher, educational consultant, and researcher in educational psychology, I am disheartened to see the disconnect between gathering data for practical educational purposes and the act of measurement in research, as I felt was expressed in the quote above: "I am (mainly) not working in a research context, but in an educational context, where single scores (decisions) are expected, not distributions." Educational contexts ARE research contexts, and why is an equation being made between "single scores (decisions)" and "distributions", with the two placed in opposition? Uhmmmmm.... scientifically based, evidenced practices are based on looking at all the evidence (hopefully triangulated, as a single point tells one very little). Which brings us back to agreement scales being used inappropriately.
Agreement is a feeling, and the different feelings have been demonstrated to be differently "easy" to use: Agree is most used, then Disagree or Strongly Agree, then Strongly Disagree. Therefore the intervals within an individual are not equal (it would look like this: SD..............D......A.......SA). There is also a known issue of between-person differences in feeling: some only use very intense feelings, others never express intensity; some are perpetual optimists and others pessimists. Therefore, in trying to compare feelings linearly within and between individuals, you only have an ORDERED categorical scale.
There is another problem most do not consider. Social scientists (including educators) tend to use agreement scales to measure something else about people. Items need to be written to represent the known number of specific levels of what is being measured. Often constructs are only conceptualized dualistically (high/low), and the scale used to differentiate is an agreement scale, written to respond to items written from either the high or the low view. If the person agrees, they are considered HIGH, and if they disagree, LOW; but disagreeing with something does NOT MEAN you are agreeing to its opposite. For instance, if the item is "Do you love me" and I mark SD, does that mean I hate you? If the item was "Do you hate me" and I mark SD, does that mean I love you? NO, because we cannot assume answering no to one thing means yes to its supposed opposite.
People create surveys with agreement scales because, in a purely mathematical sense, they can get numbers which can be manipulated to create averages and used in statistical tests. In almost ALL social science research, the results of these types of surveys will show significant correlations between two conditions and differences when comparing groups or times the survey is taken. What is ALSO often found is poor reliability and a poor factor structure that does not demonstrate the theory. People do not seem to get that this means the survey did not function as a measure of anything, and any correlations/differences are therefore worthless.
There is no quick way to assess humans. Surveys are only as good as the items written and the theory upon which they were modeled. AND a REAL scale is a ruler with equal intervals, so Likert scales are NOT real scales..... sorry. They're used because we like an easy, efficient way and have gotten used to doing things wrong. Which is ALSO why we (as educational social scientists) cannot seem to provide evidence for our "truths". The only scaling method that DOES work is a 1-P IRT method called Rasch, but it requires a lot of understanding of the construct to write items that can represent it at specific levels. Check out Wolfe and Smith, 2007a. It's an article that details how to create a working scale that will....... well, work!
PS: If you care about kids in schools and high-stakes testing, know what testing really does and which tests are good/bad and WHY... if you can't understand testing, you cannot fight to make it more developmentally appropriate. Testing isn't bad; it's the way it's done and used that is not working.
Just a couple of other thoughts, for what they are worth. First, contrary to Paul, I believe there is a reasonable but probably unstated and untested assumption in test design: when people see a scale with response options of 1...2...3...4... etc., they intuitively interpret it as a continuous, therefore interval, measure. Second, it is easily demonstrated that for quasi- or sort-of interval measures such as those on Likert scales, there is very little difference. If one can assume that the true underlying construct being measured is continuous (there are of course exceptions, as pointed out by Trish above), the issue then becomes: if the true difference between each point on the scale is not equal to 1, how much of a problem stems from its being larger or smaller than 1? It is easy to put together an Excel doc (or ask me for mine) that assumes a true continuous distribution and an ordinal "guess" at true scores. With the "guess" for each 1-point increment in "true" scores restricted to a random value between a tiny increase over the previous score and 10 points, it is remarkable that the two scales almost always correlate at r = .95 or above. Any social scientist would be very happy to have a scale this close to valid -- in other words, the error introduced by treating an ordinal scale (with some caveats) as an interval scale is trivial compared to most measurement error.
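Since I don't have Allan's Excel doc, here is a rough re-creation of the simulation he describes; the spacing of the "true" scale points is drawn at random per his description, and the data are otherwise invented:

```python
import numpy as np

rng = np.random.default_rng(42)
rs = []
for _ in range(1000):
    # "True" values of the 7 scale points: each step up the scale adds a
    # random amount between 0.1 and 10 instead of a constant step of 1.
    true_points = np.cumsum(rng.uniform(0.1, 10, size=7))
    responses = rng.integers(0, 7, size=100)   # which point each person picks
    ordinal_codes = responses + 1              # the usual equal-step 1..7 coding
    true_scores = true_points[responses]       # the unknown "real" values
    rs.append(np.corrcoef(ordinal_codes, true_scores)[0, 1])
print(f"median correlation: {np.median(rs):.3f}")   # typically very high
```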
Hi,
I was just wondering: with a Likert scale (e.g. 1 to 5), if we don't label it (e.g. Strongly Disagree - Strongly Agree) and just put the numbers for respondents to circle, would they be able to answer the questions? And if yes, how?
If yes, then I would say that it is fit for measuring, as it brings meaning to the respondents / readers.
If no, then could that mean the numbers are mere labels for the rank order (SD - SA) from less to more? And if so, can we average the numbers (1 to 5) assigned to the ordered labels (SD - SA)?
thanks.
Hi, Nor. The endpoints, at least, would have to be labeled so the respondent knows which direction is which, but otherwise I would say yes to your question, although clearly others in this thread would disagree. Incidentally, there are situations in which the scale might need to be lop-sided or unbalanced -- for example, if you were testing a group of known high-IQ students and asking them "How smart are you?", there would be no point in starting the scale at "Not smart at all", and the top few points might need to be labeled "very smart," "extremely smart," "genius," "super genius," and "smartest person in the world." I don't know how much bias this might introduce into my assumption that people intuitively treat a numeric scale as if it were inches on a ruler.
Thank you all for putting new wind into this question. Your comments are very intriguing.
For "high IQ students," I would also ask them "How dumb or ignorant are you?" on the scale with an equal amount of items to the "How smart are you" items on the instrument. The construct obviously being measured would be modesty.
Taking averages of data collected using a Likert scale is wasteful and wouldn't make any logical sense. I would treat it as ordinal data.
Hi
I have a case of analyzing the impact of one variable on another. I used a Likert scale in the questionnaire. One variable is explained by several questions, and the mean of the answers is taken as the variable's value. The question is whether I can use Pearson or Spearman correlation and linear regression in my analyses?
thank you
The question is whether means of question scores make sense in the first place. If they do, and if the explanatory variable is at least interval-scaled, then a linear regression will make sense.
@Allan Lundy: sorry, but you are confusing things here. The problem with Likert scales is not that they are discrete. The problem is that there is no guarantee that 5-4 = 4-3 = 3-2, etc. In fact, there is ample evidence in psychology's branch psychophysics that the mind is non-linear. Ever wondered why the decibel scale is logarithmic?
It is certainly ordinal and unidirectional, wherein the magnitude of the differences is judgement-based. Only with a large sample do these judgements hold good.
The moment you opt for the Likert scale itself, you accept the strong assumption that quantifying the responses with perfection, in any form, is an impossibility.
Although it is an ordinal scale, it is up to the researcher how to treat it. He may use it as either ordinal or interval.
Hi,
It is a good discussion.
In my opinion, the Likert scale is an ordinal scale, because with this scale we can't measure the distance between two levels of the scale. True, we can compute the numerical distance of 4 from 5, but this difference has no meaning in terms of the qualitative judgement of the respondents.
On the other hand, an interval scale allows this difference to be measured both on the scale (as a number) and in the respondent's opinion.
Hair et al. (2009) discuss this in their book.
Congratulations on this discussion.
Regards.
José Augusto
This link provides interesting discussion, and good references within:
http://jalt.org/test/PDF/Brown34.pdf
Or this one, starting at about page 7:
http://users.sussex.ac.uk/~grahamh/RM1web/Levels%20of%20Measurement.pdf
Anecdotally, when I've obtained Likert or Likert-type data in my own research, I've noticed that the majority of the time the results of a Mann-Whitney U test and a t-test or ANOVA lead to the same conclusions regarding the data. I'd be interested to see a paper describing the patterns of data necessary for the conclusions to be discrepant.
@Arif Hassan: Of course, any researcher is free to call apples 'oranges' - or the other way around - but that doesn't change the real fact that apples taste and smell different from oranges for anyone whose taste and smell perception is intact. (Unfortunately, some people suffer from a taste or smell disorder, cf. the link below, but I don't think that you were referring to these unhappy fellows.)
But there is much more to this story of scales. First, we should always distinguish between data acquisition and variable measurement. Data acquisition can be done in a number of ways, some bad, some good, as far as reliability and precision are concerned. As such, they don't tell us anything about an underlying scale, if there is one indeed. At this level, I would prefer not to talk about measurement or scales at all. You're just collecting data, with or without some (ingenious) instrument.
You move up in the hierarchy of scientific research as soon as you start to treat your data as belonging to a network of (quantitative) variables which belong to some sort of theory about things in the world. It is natural and necessary to be explicit about the sort of operations and interpretations that you may apply to those variables. We call this measurement of the first order. [If you like, you may call data acquisition measurement of zeroth order, but that is a bit of cheating.]
In real life, however, where the results and products of scientific research are used by people other than the scientist or researcher himself, and for purposes other than just improved understanding of our world, we should also care about the communicative and social aspects of measurement. Variables and the scales onto which they are mapped should be standardised in such a way that different people will attach the same meaning and notation to the results of measurement. This requires precise scale definitions and calibrations, and a lot of negotiation. This sort of measurement procedure I like to call measurement of the second order. It is a highly technical and social endeavour. Think of the way our 'meter' is defined, or the way we now measure time, or the use of Rasch measurement models in education (e.g. by PISA).
In view of these highly important cultural and technical aspects of measurement in the public domain, the trivial blunders which are made again and again by individual researchers (who for instance confuse the indicators 1, 2, 3 ... on a 'Likert scale' with numbers, whereas they are just symbols used to define ordered categories) almost disappear in importance.
If you really think that a phenomenon for which you collect data (not yet measures!) with the help of a 'Likert scale' might in reality be associated with a variable of interval type, then you will have to start a non-trivial research project showing that you can indeed mold and transform your data in that direction, and how anybody could do the same with the help of well-defined equipment, procedures, etc.
https://en.wikipedia.org/wiki/Taste#Disorders_of_taste
@ Hendra Kartika : Thanks a lot. Can you please give us:
Thanks in advance!
In a previous contribution, I suggested that "Data acquisition can be done in a number of ways, some bad, some good, as far as reliability and precision are concerned. As such, they don't tell us anything about an underlying scale, if there is one indeed. At this level, I would prefer not to talk about measurement or scales at all. You're just collecting data, with or without some (ingenious) instrument."
I forgot to mention that there are indeed ingenious ways to get more information out of your data, even if you don't know, or don't have to know, anything special about the scale type of your data. Statistics and measurement theory are fine, but there is much more on earth than that. I'll give you two famous references where to look, if the need arises:
Personally, I don't consider all of these fields as completely separate approaches. Rather they complement each other wonderfully.
--------------------------------------- [ added 2016-05-28 ] ---------------------------------------
The distinction between EDA (Exploratory Data Analysis) and CDA (Confirmatory Data Analysis: what you're doing when you test hypotheses and models) is not new.
It is just a rather technical instance of Reichenbach's 1938 distinction between the Context of Discovery (CoD) and the Context of Justification (CoJ) (cf. link), which you'll find in many places in social science, e.g. the qualitative / quantitative debate (which many take for a pseudo-discussion, cf. link), or the abduction / induction distinction going back to Peirce ~1900 (cf. Reichertz 2009, Reichertz 2013, see links).
Grounded Theory (cf. Charmaz 2014), favoured by many social researchers who prefer a rather linguistic-narrative or hermeneutic-phenomenological approach to research (e.g. ATLAS.ti, cf. link), is another instance of CoD.
So you see: history repeats itself; there's really not much new under the sun, except the relabelling of terms, new wine in old barrels, and the emperor's new clothes.
https://en.wikipedia.org/wiki/Exploratory_data_analysis
https://www.youtube.com/watch?v=g9Y4SxgfGCg
https://www.mpiwg-berlin.mpg.de/en/research/projects/deptii_aufrechtmonica_history
http://atlasti.com/
http://www.amazon.com/dp/0857029142
http://www.amazon.com/dp/3531176773
http://wilderdom.com/research/QualitativeVersusQuantitativeResearch.html
http://www.qualitative-research.net/index.php/fqs/article/view/1412/2902
http://users.uoa.gr/~psillos/PapersI/11-Peirce-Abduction.pdf
I have been going round in circles trying to understand whether my Likert scale is ordinal or interval. In my notes my lecturer has advised that the questionnaire is interval data and that the questionnaire will produce ordinal data, which is slightly confusing! So is it best to just say ordinal data?
Thanks in advance.
I would also interpret them as ordinal, because we only assess categories between two extremes (e.g. do not agree at all --- do totally agree). But, from a psychological point of view, if we rate categories that are numbered, we get the impression of having an interval-like scale. To some degree I sometimes feel it could even be like a ratio scale, because "totally do not agree" is some kind of absolute zero, isn't it? At least when it comes to self-assessment. Nobody can agree less than zero, can he/she?
The problem is even worse when you treat the verbal label "totally do not agree" as an absolute minimum (equal or similar to an absolute zero). For consistency, you should then treat "do totally agree" as an absolute maximum. In that case, the scale is a closed interval, i.e. something like the unit interval [0,1] consisting of all numbers between 0 and 1, inclusive. Of course, you may take any other numbers, e.g. [a,b] with a < b. BUT: all standard scales from the interval scale upward assume by definition that there is no upper limit, otherwise the very properties of such a scale can't be realized. Thus we have a contradiction.
This is not to say that you can't have scales which are strictly bounded from below and from above, but such scales are non-standard scales, which you won't often find in practice. For instance, it is relatively easy to define a (bounded) interval scale on the unit interval, but then you have to re-define addition and multiplication in order to make that happen. In particular, you need fuzzy addition and fuzzy multiplication, which behave quite differently from ordinary (= arithmetic) addition and multiplication. Do you want that? [I need it in my fuzzy-logic-based educational assessment system, but that is quite another discussion.]
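Paul's own fuzzy operations aren't spelled out here, but two standard bounded operations from fuzzy logic (t-conorms) illustrate what "addition that stays inside [0,1]" can look like:

```python
def bounded_sum(a: float, b: float) -> float:
    """Lukasiewicz-style addition: the result never leaves [0, 1]."""
    return min(1.0, a + b)

def probabilistic_sum(a: float, b: float) -> float:
    """Another bounded 'addition' (a t-conorm): a + b - a*b."""
    return a + b - a * b

print(bounded_sum(0.7, 0.6))         # 1.0, not 1.3
print(probabilistic_sum(0.7, 0.6))   # 0.88
```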
I may have missed a post or two, but it's worth noting the relationship between Likert scales and rank statistics.
A Likert scale is ordinal by construction. It is always unsafe to use measures that rely on a consistent meaning of intervals if the scale is ordinal.
However, that is not the end of the story. We can still rank ordinal responses. Rank is an interval scale (intervals of rank have the same meaning everywhere in a given ranking). Mean rank (for example for different treatments) also has a meaning on the ranking scale, and can be used, for example, as an estimate of a population mean rank. As the number of cases increases, it becomes increasingly defensible to use means, standard errors, sums of squares, etc. on ranks, and indeed that is what things like the Friedman test do for large scale lengths or numbers of treatments.
The obvious connection with Likert scales is that there is no difference between a ranking scale of, say, 1-7, and a 7-point Likert scale re-coded 1-7. So I would argue that as long as Likert scale points are treated _and interpreted_ as rankings, it is reasonable to apply all the methods that can apply to discrete, finite interval scales. I would also guess that this is what many practitioners are, consciously or otherwise, doing when they speak of averaging Likert responses and feeling that this has a degree of validity. If we are indeed speaking of ranks, it does.
What remains unsafe is to assume that we can go back from the rank scale to interpret intermediate points on the original Likert scale in any consistent way. For example, we can not assume that a 2-point difference or spread at the bottom of a Likert scale 'means' the same as a two-point spread at the top. Nor can we assume that because a mean rank comes out exactly halfway between the ranks for 'agree' and 'disagree' that the mean implies exact neutrality. Worse, we cannot infer that the average for the ranks of 'very dissatisfied' and 'very satisfied' 'means' the same as the average rank for 'satisfied' and 'dissatisfied' even though the mean rank is identical.
As ever, though, it's a poor statistician who asks about the first moment without also asking about at least the second, and ideally about the distribution around it.
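A small illustration of the rank-based reading (hypothetical 7-point responses for two treatments; scipy assumed):

```python
import numpy as np
from scipy.stats import rankdata, mannwhitneyu

# 7-point Likert responses for two hypothetical treatments:
t1 = np.array([5, 6, 7, 5, 4, 6, 7, 5])
t2 = np.array([3, 4, 2, 5, 3, 4, 3, 2])

ranks = rankdata(np.concatenate([t1, t2]))   # joint ranking, ties averaged
r1, r2 = ranks[:len(t1)], ranks[len(t1):]
print(f"mean rank t1 = {r1.mean():.1f}, t2 = {r2.mean():.1f}")
print(mannwhitneyu(t1, t2, alternative="two-sided"))  # rank-based test
```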
I present a table summarizing the operational properties of the different scales of measurement.
This is what I teach my students at the University of Mascara, Algeria.
(Soon I will present another table on the use of various statistics for categorical variables.)
Scale measurement

Feature                              Nominal   Ordinal     Interval    Ratio
Indication of:
  Difference (status)                Yes       Yes         Yes         Yes
  Direction                          No        Yes         Yes         Yes
  Amount                             No        No          Yes         Yes
Properties:
  Order                              No        Yes         Yes         Yes
  Distance                           No        Yes         Yes         Yes
  Equidistance                       No        No          Yes         Yes
  Rational (true) zero               No        Arbitrary   Arbitrary   Yes
Univariate statistical processing:
  Frequencies                        Yes       Yes         Yes         Yes
  Median                             No        Yes         Yes         Yes
  Sum/Subtraction                    No        No          Yes         Yes
  Mean, variance                     No        No          Yes         Yes
  Ratio                              No        No          No          Yes
But:
Can you compute a mean or variance for this statistical variable (Likert)?
The problem remains methodological (one of use), not one of qualification!
Likert is ORDINAL.
The statistical analysis depends on the level of measurement of our variable. Since a fully-anchored Likert-type scale is an ordinal variable, computing the mean and variance yields meaningless results. Suppose we coded 1 for Strongly Disagree, 2 for Disagree, 3 for Neutral, 4 for Agree, and 5 for Strongly Agree. The mean of the responses 5, 3, 2, 1, 1 is 2.4, but we are unable to say WHAT 2.4 stands for, as only 1, 2, 3, 4, and 5 have been assigned a meaning/definition.
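To make the point concrete, here is a small sketch (the alternative coding is invented): any order-preserving recoding of ordinal labels is equally legitimate, yet it changes the mean, while the median stays on the same label.

```python
import numpy as np

responses = np.array([5, 3, 2, 1, 1])           # coded 1..5 as above

# An equally legitimate order-preserving recoding of the same labels:
recode = {1: 1, 2: 2, 3: 3, 4: 8, 5: 10}
recoded = np.array([recode[r] for r in responses])

print(np.mean(responses), np.mean(recoded))     # 2.4 vs 3.4: the mean shifts
print(np.median(responses), np.median(recoded)) # 2.0 vs 2.0: the median label is stable
```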
If the scale is designed in such a way that we instruct respondents to rate from 1 for Strongly Disagree to 5 for Strongly Agree, then we obtain data ranging from 1 to 5. It has been argued that such an endpoint-anchored Likert-type scale yields interval data. On that view, mean and variance apply to this endpoint Likert-type scale, as to other interval data.
Stay with the BASICS!
and read:
Property: INTERVALS (or distance, gap).
The property of intervals relates to the relationship of distances between objects. If a scale has the interval property, the unit of measure means the same thing throughout the number scale. "That is, an inch is an inch is an inch, no matter if it is before or after a mile on the road".
More precisely, an equal difference between two numbers reflects an equal difference in the "real world" between the measured objects (assigned numbers).
Let M: O → N = M(O), where:
O are the statistical objects,
N = M(O) are the assigned indices (numbers), and
M is the function (or relation) mapping O to N.
In order to state the interval property algebraically, four objects are required: Oi, Oj, Ok, and Ol.
The difference between objects is represented by the sign "−";
Oi − Oj refers to the real difference between objects i and j, while M(Oi) − M(Oj) refers to the difference between the assigned numbers (measures).
The interval property:
For all i, j, k, l: if Oi − Oj ≥ Ok − Ol, then M(Oi) − M(Oj) ≥ M(Ok) − M(Ol).
A corollary of this definition is that if the differences between the numbers attributed to two pairs of objects are equal, then the pairs of objects must be equally different in the real world.
The interval property is satisfied if, for all i, j, k, l:
if M(Oi) − M(Oj) = M(Ok) − M(Ol), then Oi − Oj = Ok − Ol.
Conversely, if two pairs of objects show different distances on the scale (scale deviations), then we must assume that the pairs also differ in the real world.
Where the interval property is not satisfied, any statistic produced by adding or subtracting the numbers (or units) would not make any sense and would be an error.
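Here is a small sketch of this definition in code, brute-forcing over all quadruples (i, j, k, l); the "real-world" magnitudes are invented for illustration:

```python
from itertools import product

def has_interval_property(true_vals, measures):
    """Check: M(Oi)-M(Oj) = M(Ok)-M(Ol) implies Oi-Oj = Ok-Ol, for all quadruples."""
    n = len(true_vals)
    for i, j, k, l in product(range(n), repeat=4):
        if (measures[i] - measures[j] == measures[k] - measures[l]
                and true_vals[i] - true_vals[j] != true_vals[k] - true_vals[l]):
            return False
    return True

true_attitudes = [0.0, 1.0, 3.5, 9.0]   # hypothetical "real-world" magnitudes
likert_codes   = [1, 2, 3, 4]           # equally spaced codes assigned to them

print(has_interval_property(true_attitudes, likert_codes))   # False: codes hide unequal gaps
print(has_interval_property([0, 2, 4, 6], [1, 2, 3, 4]))     # True: a linear coding
```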
(re)Read
Colleagues,
Here is the reference to Stevens' seminal article on scales of measurement:
"On the Theory of Scales of Measurement." By S. S. Stevens. Science, New Series, Vol. 103, No. 2684. (Jun. 7, 1946), pp. 677-680.
Here is the link to the article on Google:
http://marces.org/EDMS623/Stevens%20SS%20(1946)%20On%20the%20Theory%20of%20Scales%20of%20Measurement.pdf
Thanks for all the helpful and intriguing comments.
Reginald
I have survey data where ratings have been given between -5 and 5.
 5 means: a practice increases the water level by 100%
 4 means: an increase of 80%
 3 means: an increase of 60%
 2 means: an increase of 40%
 1 means: an increase of 20%
 0 means: no change
-1 means: a decrease of 20%
and so on down to -5, corresponding to a decrease of 100%.
I have read that ordinal data are data where the preferences are ordered, but the differences between the preferences are not quantified. In interval data, however, we know the exact difference between scale points (it is quantified), as with temperature or time.
In my case, I know, or perhaps have assumed, the quantified difference between the levels, so there is a defined uniformity across the scale. Is it still ordinal data, or can I consider it interval data? I need to compute things like the mean and standard deviation, so this preliminary clarity is very important. Please guide.
You have quantitative data. Your ratings are simply a transformation of a percent change (which is quantitative data).
However, having quantitative data is a necessary condition for calculating the mean and standard deviation, but it is not a sufficient condition for making these statistics interpretable (at least not easily) or useful. You must investigate the distribution of your ratings. If it is unimodal and symmetric, the mean and standard deviation can be useful statistics; otherwise they can be misleading. An example (somewhat extreme, to stress the point): if half of your respondents report a 100% increase and the other half report a 100% decrease, the mean (expected change) is 0%. Although this is correct, reporting a mean change of 0% might imply that this is a "typical" response, which it clearly is not.
If the maximum possible increase is limited to 100%, I would assume a beta distribution for the response (if you transform it to the interval (0, 1) instead of (-5, +5)).
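For what it's worth, a sketch of that suggestion with made-up ratings: min-max transform from [-5, +5] to (0, 1), nudging values slightly away from the endpoints (where the beta likelihood can degenerate), then a scipy fit.

```python
import numpy as np
from scipy import stats

ratings = np.array([-4, -2, 0, 1, 1, 2, 3, 3, 4, 5])  # hypothetical ratings on [-5, 5]

# Min-max transform to (0, 1); shrink slightly away from the endpoints,
# since the beta density lives on the open interval.
x = (ratings + 5) / 10.0
x = np.clip(x, 1e-3, 1 - 1e-3)

# Fit a beta distribution with support fixed to (0, 1).
a, b, loc, scale = stats.beta.fit(x, floc=0, fscale=1)

mean_x = a / (a + b)                  # beta mean on the (0, 1) scale
print(a, b, mean_x * 10 - 5)          # back-transformed to the -5..+5 rating scale
```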
If I transform using the min-max normalization technique to get values between 0 and 1, and assume a beta distribution, can I compute the mean, standard deviation, multiplication, or division?
@Jochen Sir, since you said this is quantitative data, can I apply an exploratory analysis like PCA to it? Usually multiple correspondence analysis is used for categorical data, but I am wondering whether I can directly apply PCA to a data set such as mine (provided no non-linearity exists)? My purpose is not dimension reduction but to compute an index.
@Anjali,
if you have numerical values, you can calculate anything with them. The question is how you would interpret the calculated values. For distributions with shapes a bit more complicated or asymmetric than the normal distribution, the interpretation of the calculated values can become tricky.
I don't see why a PCA should not be applicable.
@Jochen Sir, my approach is as follows:
First I calculate the mean response on the -5 to 5 scale for every technology and for each of the three indicators. Then I run PCA to arrive at a composite value, say an index of water efficiency. I tested KMO, Bartlett's sphericity, and non-linearity; my data set passed all tests. However, the big debate about applying PCA to categorical rating data (though I have a quantified understanding of my ratings) is making me a little skeptical, so I wanted to seek reassurance from experts on ResearchGate.
Is a Likert-type scale ordinal or interval data?
Agreed, there are different schools of thought treating Likert-scaled data as ordinal or as interval. I suggest researchers cite the literature and justify why they treat the data they collected as ordinal or as interval.
Likert scale is "naturally" an ordinal data, and it is a non-parametric data. Data of Likert scale is naturally non-normally distributed. it is because most of the time we can't get a normal curve - bell shape- it can happen only when most of the respondents in a sample choose the middle scale 3 (unsure or undecided). I mean in most cases the respondents would answer strongly agree, agree, disagree and strongly disagree. Therefore if we use a parametric data analysis to calculate the mean score for it, we will most probably getting a Mean score of 3 or near to 3. How ever, to make a decision that the respondents are unsure/undecided in their opinion is incorrect because most of them are either agree or non-agree with the statement, athough the mean score is 3.
Let me give an example: if in a population nearly half of the voters support Trump and the other half support Clinton, and only a few are unsure/undecided, then for the Likert-scale statement "Do you support Trump as your president?" the mean score would be nearly 3. The shape of the frequency graph is a V shape, not a bell shape. So, with a mean score of 3, can we conclude that most voters are unsure or undecided? The conclusion would be totally wrong if in this case we treated the Likert scale as an interval scale.
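A quick simulation of this point (the counts are invented): a polarized, V-shaped response distribution yields a mean near 3 even though hardly anyone chose the middle category.

```python
import numpy as np

# Polarized electorate (hypothetical counts): mostly 1s and 5s, very few 3s.
counts = {1: 450, 2: 50, 3: 20, 4: 40, 5: 440}
responses = np.repeat(list(counts.keys()), list(counts.values()))

print(responses.mean())   # about 2.97 -- close to the "undecided" midpoint ...
print({k: v / len(responses) for k, v in counts.items()})  # ... yet category 3 is only 2%
```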
Because Likert-scale data are ordinal, calculating a mean score is incorrect; most of the time it would give us misleading results. The labels 1, 2, 3, 4, and 5 are not values, so it is incorrect to analyse the data the parametric way, by comparing mean scores.
A Likert scale is not an interval or ratio scale, because interval and ratio data lend themselves "naturally" to parametric treatment. When a group of respondents takes a measurement of that kind (e.g. a mathematics test), most of them score around the average and only a few have extremely high or low scores, no matter how hard or easy the test is. In that case the data have a bell shape and are normally distributed, so the respondents' scores can be summarized and represented by the mean, the average score for the group. Because a Likert scale is not normally distributed in nature and cannot be represented by a mean score, it is not an interval scale.
To present Likert-scale data correctly, most of the time we need to take the non-parametric route: calculate the frequency and percentage of each of its categories, and compare groups using chi-square tests.
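A sketch of that non-parametric route with scipy, using invented counts: compare the response frequencies of two groups over the five categories with a chi-square test of independence.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Rows = groups, columns = counts for categories 1 (SD) .. 5 (SA); invented data.
table = np.array([
    [30, 25, 10, 20, 15],   # group A
    [10, 15, 12, 33, 30],   # group B
])

chi2, p, dof, expected = chi2_contingency(table)
print(chi2, p, dof)
```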
However, in many journal articles, Likert-scale data are analysed and compared based on mean scores using parametric tests (e.g. t-tests and ANOVA). A lot of researchers do this in practice, but theoretically it is a wrong practice. I am sorry to mention this.
@Jochen Sir, could you suggest some non-linearity tests for my type of data, before I apply PCA?
I am not a fan of testing assumptions as a prerequisite for other tests or procedures. In my opinion, you should have rational arguments for why an assumption is reasonable or not. If you don't have them, your research is not well "settled", and the actual research should be some levels lower (understanding the data itself before using it in further sophisticated analyses). I know this is not the typical way most researchers proceed, but that is my opinion of how it should be done.
Hi Anjali,
I am not sure what kind of data you have, what variables are involved, and what the objective of your study is, so I can't make any suggestion.
There are a lot of non-linearity tests. For correlation you have measures like Spearman's rho (ordinal-ordinal), Cramer's V (nominal-nominal), and two types of chi-square tests (nominal and ordinal); for regression, you have logistic regression, etc.
When choosing a test, the measurement scale of the data should meet the requirements of the test.
Hi Basheer,
can you explain your opinion? Are the values 1, 2, 3, etc. on a Likert-type scale labels, or exact values for "strongly disagree", etc.? Can we multiply or divide "strongly disagree" by a number to get a mean score?
Having read most of the contributions, some of which are extremely reliable (once more, like those of Jochen), I must comment on the following.
The Likert scale was developed by Rensis Likert to evaluate a subject's attitude toward certain stimuli.
Therefore, the number only represents a state of the attitude, valued upwards or downwards according to the meaning of the item.
The statistical value obtained is only referential.
Notwithstanding the above, psychometrics has developed some formulations that allow the final number to be understood as an indicator of attitudes.
Remember that statistics are based on DISTANCES, so the question I ask is: is the distance that separates 1 from 2 the same as the one that separates 2 from 3, when 1 means "totally disagree", 2 "disagree", and 3 "sometimes"?
And these are answers to the question "Would you steal if you were hungry?"
I am deeply struck by the fact that people use this scale for something else. It is as if we used the meter, a unit of length, to measure atmospheric pressure.
FINALLY: it is not interval.
Dear Carlos,
Thanks for insisting that the Likert scale is not interval, in line with what Rensis Likert himself initiated: the distances between the scale points are not identical.
Some scholars use software such as Rasch-model tools to "transform" the ordinal Likert scale into an interval one, and then use parametric tests to analyse the scale. Transforming a less accurate ordinal scale into a more accurate interval scale can introduce Type II errors, which in turn make the results of the analysis less accurate.
Dear Yan,
you're absolutely right in that respect (I work with Rasch models). And as you say, those variables are transformed, but that is a product of the procedure, not something done with intention.
Regarding my answer: often the people who ask only expect a simple answer, and some of us go beyond that simple answer.
Greetings
Just a side comment on Likert's original paper from 1932. Today it would not be published anywhere, not because what he said was incorrect, but because of the examples he used. One of them seems contemporary, as it discusses how to assess the attitudes of a group of people toward US internationalism. The other is more difficult to accept today: it discusses attitudes toward African-American people, using words that are totally unacceptable today.
Dear Reginald,
Likert data are ordinal, but there are other views; please see the following link:
http://www.theanalysisfactor.com/can-likert-scale-data-ever-be-continuous/
Good luck
Here is the correct formula for the unweighted mean of a sequence of Likert-scale values. The values are all standardized so that they lie in the range from -1 to +1.
There is of course a simple generalization to a weighted Likert mean; I guess you can figure out for yourself what has to be changed.
Very long discussion on this; really very helpful.
The direct answer is that Likert-scale values are ordinal, as suggested by many researchers above.
Although in many research papers in reputed journals I have seen people quantify Likert-scale values by adding up the values across the various Likert-scale questions and then forming ranges of categories for the final variable obtained from that sum.
One thing is sure: we can quantify it; how correct this technique is remains a debatable issue.
But we can never say that a Likert scale is an interval or ratio scale.
@Jochen Wilhelm: This is a by-product of my project on peer-assessment scoring in higher education. I'll give a rather short answer, but the story and rationale behind it are really fascinating and deserve a fuller explanation. It goes back to the concept of quasi-means, aka the Kolmogorov-Nagumo mean (1930).
I added to it the principle that a mean should be based on well-defined addition and (scalar) multiplication operations on the range (set) of numbers that you permit. In particular, both operations should always lead to a result, i.e. a sum or product, that again belongs to that range. Examples:
Now what about numbers belonging to the standardized Likert scale ranging from -1 to +1? The usual arithmetic addition and (scalar) multiplication will NOT do, because the Likert range is strictly bounded.
However, there is a simple alternative, which even allows the definition of subtraction on the Likert scale! Addition is redefined as: s (+) s' = (s + s') / (1 + s * s'). Here, (+) is the new addition operator. It is now also possible to define scalar multiplication based on this addition operator. (I will not do that here, but I guess you will be able to figure it out.) The final step is to define the Likert mean in exactly the same way as before (see examples), as the (+)-sum of a sequence of numbers, scalar-multiplied by 1/n (or, equivalently, using weights). The derivation is a bit tricky, but once you have the hang of it, it is quite straightforward.
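A sketch of these operations in code. One consistent reading (my inference; the post leaves the details to the reader) is that (+) coincides with the addition rule for hyperbolic tangents, tanh(a) (+) tanh(b) = tanh(a + b), so scalar multiplication becomes c (*) s = tanh(c * artanh(s)) and the Likert mean is the tanh of the arithmetic mean of artanh-transformed scores:

```python
import numpy as np

def likert_add(s, t):
    # The redefined addition from the post: the result stays inside (-1, +1).
    return (s + t) / (1.0 + s * t)

def likert_scale_mult(c, s):
    # Scalar multiplication consistent with likert_add (my inference):
    # c-fold (+)-addition of s, extended to non-integer c.
    return np.tanh(c * np.arctanh(s))

def likert_mean(scores):
    # The quasi-mean: the (+)-sum of the scores, scalar-multiplied by 1/n.
    # Requires scores strictly inside (-1, +1), since artanh(+/-1) is infinite.
    s = np.asarray(scores, dtype=float)
    return np.tanh(np.arctanh(s).mean())

print(likert_add(0.6, 0.6))           # about 0.88: agreement compounds but stays below 1
print(likert_mean([0.5, 0.5, -0.9]))  # about -0.12
```

Near the borders the (+)-sum saturates instead of overshooting, which matches the behavior described in the P.S. below.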
By the way, the same procedure can be followed for numbers from the unit interval, e.g. if your measurements happen to be fuzzy values or percentages. Therefore, my advice is also never to use the arithmetic or geometric mean of, e.g., scores, percentages, or grades, but always this modified quasi-mean.
P.S.: I am not dogmatic about "scale types". In the applications I am interested in (human judgment), I see no problem taking Likert values "as is" and performing the arithmetical operations I just explained. On the contrary: they allow me to represent exactly the behavior I would expect near the borders of the interval. Think about that!
Link: https://www.researchgate.net/project/Peerwise-Assessment-Scoring-Systems-acronym-PASS
The level of measurement of a Likert scale is ordinal. Using the average of the scores might be useful under some conditions.
The Likert scale's level of measurement is ordinal, even though its values lie within a bounded interval.
Whether Likert scales are ordinal or interval in nature is a subject of much debate. Some people argue that a Likert scale is ordinal. They correctly point out that one cannot assume that all pairs of adjacent levels are equidistant. Nonetheless, Likert scales (and a few others, such as the semantic differential scale and the numerical scale) are generally treated as if they were interval scales, because this allows researchers to calculate averages and standard deviations and to apply other, more advanced statistical techniques (for instance, to test hypotheses). The interval scale (note that a Likert scale is formally an ordinal scale) is used when responses to various items measuring a variable can be tapped on a five-point (or seven-point, or any other number of points) scale, which can thereafter be summed across the items (Uma Sekaran & Roger Bougie, 2016).