One of the most frequently asked topics I come across is how to create simple scales from Likert-scored items. Apparently there are a lot of beginning researchers who have learned how to use Likert-style items in questionnaires, but never got any advice about how to analyze the resulting data.
Hence, I would like to put together a thread here that people can refer to whenever this question gets asked.
Please remember that this is advice for beginning researchers who generally want to run basic regressions, so there is no point in recommending complex procedures such as Item Response Theory or Structural Equation Modeling.
Instead, I would like to find sources for things like how to use Cronbach's alpha effectively and how to do the most straightforward kind of Factor Analysis (almost certainly Exploratory rather than Confirmatory).
The overall goal is to help beginners get started with scaling, so who can suggest some resources?
The official SPSS manuals offer suggestions for analyzing the validity of scales, as well as for exploratory factor analysis.
http://www.unt.edu/rss/class/Jon/SPSS_SC/Manuals/SPSS_Manuals.htm
The most relevant sections in the SPSS manual can be found on that webpage under "Statistics Base 18," where you will find descriptions of Factor Analysis and Reliability (Cronbach's alpha). Both are rather abbreviated descriptions of the relevant commands.
Here are links to two discussions with more detail about how to use SPSS for Cronbach's alpha. This first one is for the basic situation where all your variables are scored in the "same direction," i.e., you have only positive correlations:
https://statistics.laerd.com/spss-tutorials/cronbachs-alpha-using-spss-statistics.php
This second one covers the case where some items need to be recoded because they are scored in the opposite direction:
http://psych.hanover.edu/classes/ResearchMethods/Assignments/reliability-1.html
(Note that it shows how to do the necessary recoding using SPSS syntax, which can be helpful but is not necessary.)
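For readers who prefer R to SPSS, here is a minimal sketch of the same workflow (reverse-scoring plus Cronbach's alpha) using the psych package. The data frame and item values are hypothetical:

```r
library(psych)  # install.packages("psych") if needed

# Hypothetical responses to five 1-5 Likert items; q3 is worded in the
# opposite direction from the others.
items <- data.frame(
  q1 = c(4, 5, 3, 4, 2, 5),
  q2 = c(4, 4, 3, 5, 2, 4),
  q3 = c(2, 1, 3, 2, 4, 1),
  q4 = c(5, 4, 3, 4, 1, 5),
  q5 = c(4, 5, 2, 4, 2, 4)
)

items$q3 <- 6 - items$q3  # reverse-score a 1-5 item: (max + min) - x
psych::alpha(items)       # Cronbach's alpha plus item-total statistics
```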
As far as books go, DeVellis's "Scale Development: Theory and Applications" is a useful introduction to both the conceptual and practical aspects of scale construction, although it covers considerably more than one would need to construct a single scale for a specific purpose.
Can anyone supply a link on factor analysis for the simplest case, where you want to determine whether all your items can be accounted for by a single factor?
Andy Field's book "Discovering Statistics Using IBM SPSS Statistics" is one of the simplest and most useful books that cover these topics.
Pallant's step-by-step guides to data analysis using SPSS (the 2007 and 2013 editions) would also be of help.
The Pallant book does indeed have good chapters on both Reliability and Factor Analysis. I found an online copy at:
http://www.academia.dk/BiologiskAntropologi/Epidemiologi/PDF/SPSS_Survival_Manual_Ver12.pdf
The presentation on Factor Analysis is particularly thorough, although perhaps a bit advanced for an absolute beginner.
In addition to Andy Field's book (Discovering Statistics Using IBM SPSS), Malhotra and Dash's book (Marketing Research: An Applied Orientation) can also be useful.
For books to purchase, I like Norusis' "IBM SPSS Statistics 19 Statistical Procedures Companion." It assumes a basic knowledge of reliability and factor analysis, and then provides very useful "worked examples."
Here are some sources that are accessible to beginners and that include realistic examples.
http://www.statisticshell.com/docs/factor.pdf
https://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/
Article Making Sense of Cronbach's Alpha
[Updated 3/19]
Hi, it is a very interesting discussion.
But, what is the answer for analysis of scales from items using Likert scores (adding or averaging the scores)?
Is ordinal logistic regression the right approach?
What if the proportional odds assumption is not fulfilled?
What kind of regression is most appropriate?
Thank you.
Most of the people asking about creating this kind of scale are statistical beginners and would not be able to deal with a technique as complex as ordinal logistic regression, so the goal is to convert the Likert scale items into a single interval variable. This can be done by either adding or averaging, because averaging just divides by a constant (the number of items) so it only affects the "metric" in which the answers are expressed.
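To make the adding-versus-averaging point concrete, here is a small R sketch with made-up data; the same arithmetic applies in SPSS or a spreadsheet:

```r
# Hypothetical: five 1-7 Likert items, all scored in the same direction.
responses <- data.frame(
  q1 = c(6, 4, 7, 3, 5),
  q2 = c(5, 4, 6, 2, 5),
  q3 = c(6, 3, 7, 3, 4),
  q4 = c(7, 4, 6, 2, 5),
  q5 = c(6, 4, 7, 3, 5)
)

total <- rowSums(responses)   # sum score, ranges 5 to 35
avg   <- rowMeans(responses)  # mean score, ranges 1 to 7 (the items' metric)
cor(total, avg)               # exactly 1: dividing by a constant changes nothing
```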
Here is another high quality introduction to factor analysis:
http://www.tqmp.org/Content/vol09-2/p079/p079.pdf
My main complaint against almost every citation I have posted, however, is that they simply use the default specifications to define their models, which in SPSS means Principal Components with Varimax rotation. There are any number of reasons for believing that those assumptions are unreasonable for most scale construction in the social sciences, but the main thing I would emphasize is the need to allow for correlated factors via an oblique rotation such as Oblimin.
Back in the early days of developing intelligence tests, it may have made sense to search for uncorrelated (orthogonal) factors, but that logic seldom applies outside the most extreme cases of psychometric work. In particular, when someone is specifying a model with two or more factors, the conceptual or theoretical names they give these factors almost universally imply that the underlying constructs should be correlated.
The problem with assuming that your factors are uncorrelated (i.e., using the default Varimax rotation) will become obvious if you simply sum up the items to create each scale and then run a correlation. That correlation will almost always be non-zero, and often it will be substantial.
Trying to get around this by using factor scores rather than simply summing the items is at best a partial solution, and I would strongly recommend checking the correlation between the factors computed using factor scores. But the real problem is trying to force your factors to be uncorrelated when the concepts they are supposed to measure are quite likely to be related in any reasonable theory.
So, rather than slavishly using the default specifications, I would recommend starting with a true factor analysis such as Maximum Likelihood, rather than Principal Components, which is basically a special case. And I would allow the factors to be correlated, using an oblique rotation such as Oblimin.
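For anyone who wants to try this outside SPSS, here is a minimal R sketch of a maximum likelihood factor analysis with an Oblimin rotation, using the psych and GPArotation packages on simulated (entirely hypothetical) data:

```r
library(psych)        # fa() for factor analysis
library(GPArotation)  # supplies the oblimin rotation used by fa()
set.seed(42)

# Simulate six Likert-type items driven by two correlated latent factors.
n  <- 300
f1 <- rnorm(n)
f2 <- 0.5 * f1 + sqrt(0.75) * rnorm(n)  # the two factors correlate about .5
make_item <- function(f) as.numeric(cut(f + rnorm(n), breaks = 5))  # crude 1-5 scoring
items <- data.frame(a1 = make_item(f1), a2 = make_item(f1), a3 = make_item(f1),
                    b1 = make_item(f2), b2 = make_item(f2), b3 = make_item(f2))

# Maximum likelihood extraction with an oblique rotation, instead of the
# SPSS defaults of principal components + varimax.
efa <- fa(items, nfactors = 2, fm = "ml", rotate = "oblimin")
print(efa$loadings, cutoff = 0.3)  # pattern matrix
efa$Phi                            # inter-factor correlation; rarely near zero
```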
Dear Dr. Morgan.
About factor analysis, is it possible to have factor loadings over 1.0 after rotations?
What would be the problem in that situation?
Thank you.
In classical true-score theory, a loading of over 1.0 implies that there is a negative amount of error in your variable, which is impossible.
If you have this problem, the first thing I would recommend is to examine the original correlation matrix.
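In R, for example, a quick look at the item correlation matrix (hypothetical data here) can reveal near-duplicate items or other anomalies that produce improper solutions:

```r
# Hypothetical Likert items; q1 and q2 are near-duplicates.
items <- data.frame(
  q1 = c(4, 5, 3, 4, 2, 5, 3, 4),
  q2 = c(4, 5, 3, 4, 2, 5, 3, 5),  # almost identical to q1
  q3 = c(2, 1, 3, 2, 4, 1, 3, 2)
)

# Correlations at or very near 1.0 are a common source of improper loadings.
round(cor(items, use = "pairwise.complete.obs"), 2)
```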
Here is another good introduction to Exploratory Factor Analysis. I suppose one reason I like it is that it agrees with my own preferences: use maximum likelihood factor analysis with an oblique rotation and a sample size of at least 10 observations per item.
Many students struggle with analyzing raw Likert data and constructing a reliable scale from it, in order to use it in correlations and regressions meaningful to their research.
Since I have given the same advice over and over to colleagues - and followed the very same steps repeatedly - I wrote a small R package, targeted at the non-R end-user, that wraps up the great functions from the psych package and deals with most aspects of data preparation and coefficient extraction that most people do not want to mess with.
I tried to combine methodological soundness with ease of use, so I did not allow Pearson correlations between the ordinal raw items, only Spearman or polychoric. It does not allow PCA either, only factor analysis extraction methods. It reports Cronbach's alpha and item loadings on a single factor, suggesting items for deletion when appropriate. Experienced users can use it rather flexibly, while novice users can actually perform item analysis and select items with no more than three self-explanatory commands.
The package can be used from within R, with install.packages("Scale") and library(Scale). An introductory tutorial and a worked example can be found in my profile on ResearchGate.
Feel free to report issues and recommend functionality to be integrated in future versions.
A source I have used regularly is the book 'Scaling: A Sourcebook for Behavioral Scientists', edited by Gary M. Maranell. This book, although relatively old, is suitable for any researcher who is thinking about designing questionnaires or interviews. Statistical analyses are included as well.
As I want to contribute, I’m going to try to keep away as much as possible from the problems I find at the heart of almost all uses of Likert-type items. To that end, I’m going to start with advice or suggestions on Likert-type analysis by those who view them as valuable (even invaluable) tools.
1) Part of data analysis is understanding your data. I don't mean what "type" it is (e.g., categorical, ordinal, ratio, etc.), nor do I mean tests for normality or skew or careful examination of plots. Research questions and design are theory-laden, and exceptions in this case mean using data from Likert-type responses without having designed the means to generate these responses in accordance with accepted practices (by using the plural, I mean to indicate that there are differing and conflicting ideas about what is or isn't accepted, not that there are many accepted practices; to that end, going through the research on, e.g., Item Response Theory (IRT), measurement theory, etc., is crucial). Consider one of the most commonly used research designs: significance testing/null hypothesis significance testing (NHST). By this I refer to the design in which some alpha level has to be met in order to reject the null. Despite its incredibly widespread use, it is probably unique in how thoroughly criticized it has been since before "it" existed (it's actually a combination of two mutually exclusive approaches - Fisher's and Neyman & Pearson's - that somehow became spliced together) and in the almost complete lack of answers to the hundreds of published studies, monographs, even semi-popular books (e.g., The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives). Yet it dictates not simply research design but statistical analysis of data (after all, one can't determine whether an alpha level is met by using unsupervised learning algorithms for classification, or decide to reject the null based on the results of nonlinear manifold learning). The same is true with Likert-type response data.
2) The one good thing about the widespread use of Likert scales/Likert-type scales is that they have been extensively studied in certain ways. While there remains little consensus on many issues, it is important to know what these issues are, as well as those for which there is much more consensus. The issues include (but are not limited to) the number of responses for an item, whether to include a so-called "middle" response (or, more simply, whether to use an odd number of points), phrasing, whether or not to label only the endpoints, etc. For example:
“with Likert scales subjects are asked to respond (for example) strongly-disagree, disagree, ... strongly-agree to questionnaire items. Then, after summing responses from a number of questionnaire items a mean-response for the items’ factor is determined, and subjects’ attitudes are differentiated based on this mean response... In point of fact, however, each subject’s set of responses generates a probability distribution on the ordinal scale, so that by concentrating solely on the subject’s mean response researchers only differentiate among subjects based on the lowest non-trivial moment of this probability distribution.” (emphasis added)
Camparo, J. (2013). A geometrical approach to the ordinal data of Likert scaling and attitude measurements: The density matrix in psychology. Journal of Mathematical Psychology, 57(1), 29-42.
“the spacing between alternatives is not subjectively equal. A good example is the common marketing research scale of “excellent—very good—good—fair—poor.” The subjective spacing between these adjectives is quite uneven. The difference between two products rated good and very good is a much smaller difference than that between products rated fair and poor. However, in analysis we are often tempted to assign numbers one through five to these categories and take means and perform statistics as if the assigned numbers reflected equal spacing. This is a pretense at best.” (emphases added)
Lawless, H. T., & Heymann, H. (2010). Sensory Evaluation of Food Principles and Practices (2nd Ed.) (Food Science Text Series). Springer.
3) “Many of the statistical methods routinely used in contemporary research are based on a compromise with the ideal… The ideal is represented by permutation tests, such as Fisher’s exact test or the binomial test, which yield exact, as opposed to approximate, probability values (P-values). The compromise is represented by most statistical tests in common use, such as the t and F tests, where P-values depend on unsatisfied assumptions.”
Mielke, P. W., & Berry, K. J. (2007). Permutation methods: a distance function approach (2nd Ed.). Springer.
Now, being old doesn’t mean bad by any means. But all statistical tests involve numerous assumptions and differ as to how robust they are to violations of these. For the data most commonly generated from Likert-type items, statistical analyses that are designed for categorical data, or are most robust to non-normality, or rely primarily on the structure of the data are typically superior. Examples include configural frequency analysis (CFA), multidimensional scaling, multidimensional nonlinear descriptive analysis (MUNDA), permutation tests, support vector machines (SVM), artificial neural networks (ANNs), fuzzy probability/statistics, nonlinear discriminant analysis, similarity/dissimilarity metric analyses, multiple correspondence analysis, fuzzy cluster analysis, and Bayesian Item Response Models; it is also best to stay away from any tests that rely on squared deviations from the mean (t-tests, ANOVA, multiple regression, MANOVA, MANCOVA, Pearson’s product-moment correlation coefficient/Pearson’s r, etc.; note that some statistical methods that rely on mean squared deviations are less sensitive, such as PCA, and that many statistics can be made more robust by using more robust location measures such as trimmed means or m-estimators).
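As a small aside on the trimmed means mentioned at the end of that list, here is what they look like in R (the responses are made up):

```r
# Hypothetical 1-7 Likert responses with one extreme respondent.
x <- c(2, 3, 3, 3, 4, 4, 4, 7)

mean(x)              # ordinary mean, pulled toward the extreme response
mean(x, trim = 0.2)  # 20% trimmed mean: drops the top and bottom 20% first
```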
4) For those who are unfamiliar with linear algebra or multivariable calculus, or for whom Manhattan distance is a weird way of talking about how far away Manhattan is, or even for those whose familiarity with advanced mathematics is better than my own but who don’t want to (or don’t have time to) learn classification & clustering methods or fuzzy set theory, read Rand R. Wilcox’s Fundamentals of Modern Statistical Methods: Substantially Improving Power and Accuracy (2nd Ed.) (for those who really don’t like mathematics, at least read his Basic Statistics: Understanding Conventional Methods and Modern Insights). For those who are interested in what’s out there, I have supplied a bibliography that can be found on my page here. Unfortunately, too many undergraduate and graduate research programs have been reduced to teaching students how to associate X research question and data with Y statistical test, and how to plug the data into SPSS and run that test. It’s really, really, really important to be familiar with the logic behind whatever quantitative methods you use and what the objections to them are.
5) A lot of the time researchers worry about whether or not they can treat Likert-type response data as ratio and similar (essential) issues regarding the analysis part of data analysis, or they worry about the issues relating to middle response bias and the design of questionnaires more generally (also essential), and forget to wonder whether or not they are actually measuring anything:
“Consider any attribute that psychometricians currently believe they are able to measure (such as any of the various intellectual abilities, personality traits or social attitudes that the textbooks mention), and ask the question, Is that attribute quantitative? The hypothesis that such an attribute is quantitative underwrites the claim to be able to measure it. However, there has never been any serious attempt within psychometrics to test such hypotheses.”
Michell, J. (2000). Normal science, pathological science and psychometrics. Theory & Psychology, 10(5), 639-667.
“In order to assign a numerical system to an empirical relational system, it was required that the empirical relations could first be identified without necessarily assigning numbers to objects within the system. It was a prior requirement that whether or not an empirical relation possesses certain properties was a matter for empirical, scientific investigation…
To assume that the manipulation of numerals that are imposed from an independent relation system can somehow discover facts about other empirical objects, constructs, or events is 'delusional' ”
Barrett, P. (2003). Beyond psychometrics: Measurement, non-quantitative structure, and applied numerics. Journal of Managerial Psychology, 18(5), 421-439.
"Studies exploring the validity of a scale can sometimes help to provide meaning to a metric, but issues of metric arbitrariness are distinct from those of reliability and validity."
Blanton, H., & Jaccard, J. (2006). Arbitrary metrics in psychology. American Psychologist, 61(1), 27.
You can design your Likert-type items according to the best practices and use the most cutting-edge, advanced, and appropriate statistics for the data type you are working with, and it can still be garbage because you simply assigned values to responses without measuring anything:
"In measurement, according to the traditional view, numbers (or numerals) are not assigned to anything. If, for example, I discover by measuring it, that my room is 5 meters long, neither the number four nor the numeral 4 is assigned to anything, any more than if I observe that the wall of my room is red, either the colour red or the word red is thereby assigned to anything. In neither case am I dealing with the assignment of one thing to another...
Measurement is the attempt to discover real numerical relations (ratios) between things (magnitudes of attributes), and not the attempt to construct conventional numerical relations where they do not otherwise exist."
Michell, J. (1999). Measurement in Psychology: A Critical History of a Methodological Concept (Ideas in Context). Cambridge University Press
@Andrew Messing I found your piece very informative. Nonetheless, I have to point out two issues.
Likert Scales are Equal-Interval Scales:
Rensis Likert devised the eponymous method for the measurement of attitudes. A Likert scale is an equal-interval scale, as long as a number of items are repeated measures of equal-intensity statements. This requirement is assessed with reliability and validity procedures. The method makes no assumption about the spacing between points being equal. Rather, the scale is equal-interval due to the multitude of statements. Talking of isolated items, calling them Likert scales (when they are only 5-point response formats), summarizing them with means, and applying traditional statistics is thoughtless, as you say.
Testing "Measurability of Concepts" is Non-Sensical:
Michell expands on a discussion of the inherent measurability of skills and attitudes. A brief attempt to address this: an equal-interval scale has the objective of scaling individuals. Be it intelligence, ability, personality traits or attitudes, measurement is laden by the scientific theory behind the construction of the instrument of measurement, just as this holds true for temperature, color and light. (I owe this insight to the attached paper by Marion Aftanas.)
Now, all constructs that traditional psychometrics attempts to measure have been criticized for lacking explanatory adequacy and/or being influenced by layman beliefs. This argument is not in support of the existence and validity of intelligence, personality traits or attitudes; rather, it is a reminder of the connection between the objects under measurement, their mathematics, their measurement, and the structured beliefs that led to them. For example, lots of mathematical psychology was developed in the eugenics context, and even the idea of the normal distribution is connected to eugenics beliefs about the distribution of (high) intelligence and (favorable) personality traits.
Questioning the measurability of "psychometric concepts" in general, and their "inherent" ability to be measured or not, seems to be an ill-formed question. Especially the part about scientists not "testing the hypothesis" of the measurability of the concepts seems to be inapplicable, and I am not aware of such "hypothesis testing" having happened in other scientific disciplines. Proton acceleration could seem a very obscure idea, and its actual measurement requires tons of theory on mechanics, electricity and magnetism, but people measure it successfully with theory-laden mechanisms.
Similarly, constructing psychometric scales is informed by theories and empirical findings. For example, a) researchers questioned early work on Adorno's authoritarianism scale, because they found that people can be yea-sayers or nay-sayers, which led to the inclusion of reverse items (see my reply and references in a relevant discussion in the link), and b) fake-good and fake-bad scales are not only introduced but also reflected upon in personality assessment, and are also informed by other psychological effects, such as the "better-than-average" effect (see link). Taken together with the specific investigation of Likert responses you cite, these two examples show that scale construction is theory-laden and empirically informed in a healthy fashion, radically different from the "pathological" picture Michell illustrates.
http://digitalcommons.unl.edu/cgi/viewcontent.cgi?article=1011&context=leadershipfacpub
https://www.researchgate.net/post/In_scale_construction_is_reverse-scoring_some_items_desirable_or_not
Article Standard systems: The foundational element of measurement theory
Over and over again on RG, I read questions from people who either have already or are about to collect data using Likert Scoring, but who lack the basic knowledge to analyze that data. The point of this collection is to help those researchers understand the standard practices in most social science fields.
Someday, that might all change, but for now, I quite explicitly chose the word "simple" in the title of this posting.
Dear Nikolaos Giallousis:
Alas, I don't think this is the place to discuss the issues I raised (which were designed more as a guide to approaching Likert scales and the issues involved than as either 1) my opinion or 2) a consistent elucidation of the nature of Likert-type data). I would be extremely grateful if you would kindly repost your reply as a response to a question I asked about Likert-scales/Likert-type response data here: https://www.researchgate.net/post/Likert_Language_Linguistics_and_Loss_How_do_we_justify_the_use_of_Likert-Type_Response_Data/1
Thanks!
On another note:
A simple (although I suppose that's relative) method for constructing and analyzing Likert-scales (see attached).
Thanks David for starting this thread to benefit beginners. Would you please suggest a pathway to delve more deeply into the analysis of surveys? Pointers in the right direction, and resources that present in-depth concepts with practical problems in simple words, can help beginners a lot.
Thanks again!
David -- a brilliant idea starting this thread, and one that I will tend to visit as these types of questions come up for me in my research all the time. I encourage others to contribute to the thread. An excellent resource, at least for me, is
Discovering Statistics Using SPSS by Andy Field.
The book is very easy to read, explains statistics in very simple concepts, uses very good examples, and shows how to run each statistical procedure in detail in SPSS. Also, it is quite funny, believe it or not.
The chapter on factor analysis is also excellent (that is how I learned to do factor analysis), with a realistic example. But it is not a website, so I was unable to fully meet your challenge.
I think a simple intro to questionnaire design and analysis is
Doing Your Research Project: A Guide for First-Time Researchers by Judith Bell and Stephen Waters.
Dear David,
The book by Andy Field has already been pointed out. So, I would just point it out again as a valuable and very useful resource. Like Steven Dukeshire, I also find the book very accessible, and the practical examples are just great (the sense of humour is also great). You can preview the book at the Sage website: https://uk.sagepub.com/en-gb/eur/discovering-statistics-using-ibm-spss-statistics/book238032#reviews
The reference is:
Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). London: Sage.
Another book worth looking at is Basic Statistics for the Behavioral Sciences by Gary Heiman: http://www.amazon.com/Basic-Statistics-Behavioral-Sciences-6th/dp/0840031432
This book provides SPSS examples as well, but I find Andy Field’s examples much more practical and accessible (at least from a beginner’s perspective).
Thank you for sharing.
Best,
Joana Almeida
David, great idea starting this thread.
In the world of health/medical research, Streiner & Norman's book (link below) is quite well known and quite popular. Chapter 7 (From items to scales) seems particularly relevant to this thread. HTH.
http://www.oxfordscholarship.com/view/10.1093/acprof:oso/9780199231881.001.0001/acprof-9780199231881
There are a few good books out there for beginners that I like (general and health-related).
Scale Development (2nd Edition) by Robert DeVellis
Measuring Health (2nd Edition) by Ian McDowell and Claire Newell
Making Sense of Factor Analysis, by Marjorie Pett, Nancy Lackey and John Sullivan
Ariel
This is a wonderful thread. I have found great success in creating a Likert survey using QuestionPro to collect Q-sorts for Q-methodology. I use PQMethod and Pearson product-moment correlations (SPSS) for obtaining correlations. The captured data also works nicely for calculating z-scores or t-scores and for creating bell curves. So many options, depending on the structure of the type of study. I have found all of these useful. You can Google the info provided here to find out more, and YouTube has great tutorials on how to use them. QuestionPro is free to student researchers, but there is a fee for non-students or commercial use. While there are some limits to what you can do in QuestionPro, the ease of creating and deploying surveys is excellent. Definitely better than Survey Monkey. Hope this info is useful.
Thanks, David, for starting this wonderful thread. I learned a lot.
A very fine description, Andrew, of the misunderstandings that plague the teaching of research and analytical procedures in the social and behavioral sciences generally. Just because we can assign numbers to various levels of a phenomenon does not mean that the phenomenon is really quantitative or that we can interpret those numbers as containing the information that would be conveyed by actual numbers.
It's obviously an error to take, say, data that code a person's hair color into a category (1=blond, 2=brunette, 3=redhead, etc.) and use it to compute "average hair color" ("1.8=dark with blond streaks"?). But of course a program like SPSS will cheerfully calculate it for you, without asking questions (and you might be amazed at the number of students who are perfectly comfortable presenting a finding like this in their homework.)
Less obvious are the errors perpetrated in Rensis Likert's name by those who build "scales" with presumed numerical properties out of a series of numbers attached to variables that lack these properties. The idea is presumably that if you add up enough bad numbers, at some point they will be transformed into good numbers. It may well be true that if you have a whole series of variables with wild distributions and add them together to form a new variable, it will probably have a more "normal" distribution, as the different variations from normality cancel each other out. But this doesn't mean that the variable has somehow been magically transformed into something mathematically meaningful.
Statistics can be a very valuable tool for protecting the analyst against certain forms of inferential error, particularly the tendency to attribute causality where none exists. But it will never create meaning in data where no meaning exists.
Indeed, it's a really helpful thread. For further study, I am sharing some Likert scales developed by Vagias, Wade M. (2006). Likert-type scale response anchors. Clemson International Institute for Tourism & Research Development, Department of Parks, Recreation and Tourism Management. Clemson University.
Parag Arun Narkhede, I presume that these scales have been tested in some ways that determine they yield usefully distributed variables?
The type of data is also crucial here: whether it is Likert scale data or Likert-type data.
You may have a look at this
Article Analyzing and Interpreting Data From Likert-Type Scales
Dear all,
Perhaps you could try James Gaskin ( https://www.youtube.com/.../UCOMWLcopuV4xj8U3dePhVlQ ). I also find it very easy and very useful.
Have a nice time,
Helena
Kindly help me. I used a 7-point Likert scale, 1 to 7, where 1 means strongly disagree and 7 strongly agree. I have 5 items in my scale.
What will be the interpretation of the scores per item?
@Bubbles Intal. I believe it depends on the kind of constructs you're using to conceptualize your study. Are those 5 items supposed to measure one construct (or a few)? If yes, I would compute construct totals based on responses to each item, so that you can interpret the construct score rather than ratings for individual items.
It depends on what kind of analysis you are running, such as using the scale as an independent or dependent variable in a regression, where a standardized coefficient might be useful. In general, you could add up the 5 individual items and divide the sum by 5, so that the result will be scaled from 1 to 7.
@David L Morgan... what will the descriptives be then? For example: 7 = high, 6-5 = moderate, 4-3 = low, 2-1 = very low? Is that the descriptive interpretation after I get their scores?
@Chamil Rathnayake.. thank you very much. Yes, it measures one construct. What would the interpretation of the scores be then?
@Bubbles Intal, I usually do what David suggested above. If it's only one construct, get the construct average. Then interpret it as if you're interpreting a 1-7 scale (or 1-5 if you use a five-point scale). Deciding whether a score is high, moderate, or low is up for discussion.
Chamil and David, how do you determine the Likert scale to be used? I have a similar doubt, and I am yet to find an answer. I also constructed a scale through PCA, but my items were on different scales, although they were standardized. My problem now is how to adapt a new scale for the new construct, given that the 4 items I used were on different scales. My scores range from -2 to 2. How do I interpret this?
You should standardize each item before adding them together, then divide by four. The scale will then be in standard deviations as units.
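In R, this standardize-then-average step might look like the following (the items and their response scales are hypothetical):

```r
# Hypothetical: four items measured on different response scales.
raw_items <- data.frame(
  x1 = c(3, 5, 2, 4, 1),       # 1-5 item
  x2 = c(6, 7, 3, 5, 2),       # 1-7 item
  x3 = c(60, 85, 40, 70, 30),  # 0-100 rating
  x4 = c(2, 3, 1, 3, 1)        # 1-3 item
)

z <- scale(raw_items)     # column-wise z-scores (mean 0, SD 1)
composite <- rowMeans(z)  # same as summing and dividing by four
composite                 # units are standard deviations
```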
If the above recommendations are acceptable, then using the Rasch Rating Scale Model must be a shoo-in. It merely instantiates all the things you think should happen in the Likert-type case.
In my opinion, Professor David L Morgan is right that this problem is more complex than it usually seems, and that it is not enough to use popular models for typical situations.
Hi David L Morgan ,
In determining the reliability of your score through Cronbach's alpha, what will inform your decision as to whether to use either
1. just the one item with the highest reliability to represent the total score, OR
2. an average of the four items to represent the total score?
The supporting argument for 1, as I understand it, is that just the one item with the highest reliability is enough to represent the total, and summing or averaging several items only increases error through the addition of the individual errors of each item.
I have seen a discussion of this somewhere on this platform that was inconclusive!
Great idea, David, combining this information in one location. I will share these conversations with my research methods students, as I can see this being useful for them as well.
The link to the Cronbach's alpha page is no longer available (error 404). Is there a more up-to-date page anyone could advise?
http://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/
Greetings, I was wondering if anyone has any links to papers that can be used as formal references for scale design that focus on their 'structure'.
Most of the survey design chapters in research methods texts focus on wording and phraseology in item design. While this is important, I am not finding enough guidance on the 'structure' of scales for optimal subsequent analysis.
Also any links or suggestions on the following
- pros and cons of using scales from the literature?
- assumptions when mixing and matching scales and items from the literature?
- recommended practice when adapting items from other surveys to make them more applicable to one's own study?
- advice for constructing scales for newbies - how many items? How come some studies use single-item measures if a minimum of three is recommended for factor analysis?
Hello Nick,
First up, I'm not sure whether "structure" is the word you need for the issues you are facing. Perhaps "construction" or "creation" would be better. But that's a bit by the by. :-)
Here are a few quick things (not comprehensive or exhaustive by any means) that occur to me:
Please be careful with using items from other instruments in case there are copyright issues.
Consider carefully whether the scales you might use are really suitable for your own research objectives. This relates to your question concerning adapting items to make them more appropriate for your own research. I think that often people use existing instruments because they are too lazy to do anything other than take a scale off the shelf without considering its suitability, or don't have the skills or time to create their own instruments.
People also often use existing scales in their existing formats because they can then compare their results with other researchers' results. However, if, as a researcher, you are barking up the wrong tree from the start, I think that comparison with other research is somewhat misguided, even vacuous.
If you are adapting items from other instruments, I recommend you do that with care, possibly asking knowledgeable people in your field whether the new items are satisfactory or not. Other people's views and insights are often valuable.
My notion is that many constructs have a number of facets and that therefore a single item is not likely to capture those facets adequately. Therefore, I prefer to see scales with 4 or more items in them (preferably quite a few more than 4 - sorry I know that's vague) and that the final set of items is carefully chosen as a result of procedures that include factor analysis.
If you use only one item, the data you work with are likely to have limited variance - which could compromise the validity of conclusions you can draw from your data.
I hope the above is helpful, even if a lot more could be said.
Robt.
Thanks Robert. In terms of using items from other people's scales, I was working off the guidance of Pallant (2013), who says "scales, which have been published in their entirety in journal articles, are considered to be ‘in the public domain’, meaning that they can be used by researchers without charge. It is very important, however, to properly acknowledge each of the scales you use, giving full reference details." In terms of number of items, others have said 'a minimum of 3', but I did wonder: if you need to trim any items, that could leave you with a maximum of 2 in the final scale.
Not at all, Nick. I think that Julie Pallant's advice is good (and that's not because she's a fellow Australian; I respect her). Perhaps always do a bit of checking, however. For example, items from the Warwick-Edinburgh Mental Well-Being Scale (which I also respect) might be readily available in journal articles, but the people who developed that scale ask that those who use it provide their descriptive data back to the original researchers.
As I wrote in my previous post, I think that most constructs are multifaceted. If your construct is something such as "How many times did you eat yoghurt yesterday", a single question/item would do the trick. However, if you are interested in researching more complex constructs, my hunch is that at least four items are needed to tap them.
Maybe I have misunderstood you, but perhaps it would help if you indicated why you seem to want to reduce scales to only two items.
Robt.
Thanks Robert. In terms of number of items, I guess it's balancing having enough, for the reasons mentioned above, with having an overly long questionnaire. The model I want to test has seven independent (exogenous) constructs and two dependent (endogenous) constructs. So if I select an average of 5 items per construct, that's 45 questions. That's quite a lot of questions for the participants to get through.
Thanks for the extra information, Nick. Now I get it: You have a number of separate constructs and don't want to overwhelm your participants. My responses below could be off-beam, so please forgive me if so.
First, I wonder whether there is a risk of investigating many things in a half-baked way when you could have investigated fewer things more effectively.
Second, I'm not sure what stats you plan to use, but there might be problems further down the track if you need to make Bonferroni or similar types of adjustments to p values in order to avoid Type 1 errors.
Third, I assume that some of your variables are related to each other, so I wonder whether you could do some pilot work involving factor analysis (you'd need a sufficient number of participants, which could be a problem) in order to identify whether there was construct overlap that could be avoided.
Fourth, if you decide that you really do need to ask 45 questions, it would be a good idea to design your instrument so that it was not unappealing to participants in terms of such things as format, instructions, item wording, and choice of response options.
That's all I can think of for the moment.
Robt.
I would like to return to an issue covered at the top of this thread with respect to creating single variables, suitable for regression analysis, from several items in a Likert scale. If I understand him correctly, David L Morgan advocates taking the mean of the items to create a score. However, others advocate just adding them up to create the single variable: https://www.youtube.com/watch?v=3OqcRDE5PCs
Are there any arguments for or against either approach or doesn't it matter?
Hello (again) Nick,
In essence (psychometrically speaking), I think it is pretty much six of one, half a dozen of the other: it doesn't really matter.
However, getting a mean rather than a total score might be better under some circumstances. For example, if you have missing data on one of your Likert items, obtaining a mean on the available items would yield a more valid "final" score as long as the denominator is the number of items with responses rather than the total number of items.
You would need to think through how many items with missing data would be permissible under those circumstances - though that would also apply if you were simply adding up the responses.
One other thing to bear in mind is that obtaining the mean of the items permits a researcher to get a sense of (roughly) where on the original Likert response scale the responses tend to land.
Of course, reverse coding might need to occur with negatively worded items.
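If it helps, here is a minimal R sketch of that missing-data point, with made-up responses:

```r
# Three Likert items; respondent 2 skipped q2.
x <- data.frame(q1 = c(4, 5), q2 = c(3, NA), q3 = c(5, 4))

# na.rm = TRUE divides by the number of answered items, not the total items.
rowMeans(x, na.rm = TRUE)

# Optionally require a minimum number of responses, e.g. at least 2 of 3.
answered <- rowSums(!is.na(x))
ifelse(answered >= 2, rowMeans(x, na.rm = TRUE), NA)
```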
Robt.
I recommend the website "The Analysis Factor" at the link below:
https://www.theanalysisfactor.com/
It explores a variety of simple statistical techniques in a very clear and understandable way. For researchers based in low- and middle-income countries, where it might be difficult to get books, this is a useful and usable alternative.
All that calculating a mean does is return the score to its original metric, e.g., more or less 1 to 5 for a 5-point format. As always, dividing by a constant has no effect on correlation or regression.
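A two-line demonstration in R, with arbitrary simulated numbers:

```r
set.seed(7)
x <- rnorm(20, mean = 20, sd = 4)
y <- 0.5 * x + rnorm(20)

all.equal(cor(x, y), cor(x / 5, y))  # TRUE: rescaling by a constant leaves r unchanged
```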
Hi,
I would like to test a hypothesis, for instance to see what males and females scored on a Likert-scale question measuring the variable 'intention', to see whether gender influences it. What kind of analysis should I do? (I am using SPSS.)
Thank you
Hello Gabriele Carollo. It depends who you ask! Some statistical liberals (if I may call them that) have very few (if any) qualms about treating single Likert-type items as if they have interval scale properties. Relying on robustness, they are happy to use means and SDs for description, and to use t-tests, ANOVA, linear regression etc. Here is one commentary that represents this liberal view:
Article Likert scales, levels of measurement and the “laws” of statistics
A more conservative view is that a Likert scale (i.e., the mean or sum of several Likert-type items intended to measure the same thing) may have approximate interval scale properties, and so can be analyzed with the usual parametric procedures. But single Likert-type items must be analyzed using methods for ordinal variables (particularly if the number of categories is the usual 5-7).
For the example you give, one who takes this more conservative view might use the ordinal chi-square test David Howell describes on his web-page (a.k.a. the test of Linear-by-Linear Association in the output from CROSSTABS in SPSS).
Now for a bit of irony: notice on that page that the ordinal chi-square is a function of the r² value between the two variables: ordinal chi-square = (N - 1)r². I think that many statistical conservatives would argue that r² should only be computed when both variables have (approximate) interval scale properties. And yet, here is r² as a major component of the ordinal chi-square statistic. ;-)
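For the curious, the computation is simple enough to do by hand; here is a minimal R sketch with hypothetical data (a 0/1 grouping variable and a single 1-5 Likert item):

```r
# Hypothetical data: 12 respondents, two groups.
group <- c(rep(0, 6), rep(1, 6))
item  <- c(2, 3, 3, 4, 5, 4, 1, 2, 2, 3, 3, 2)  # 1-5 ratings

N <- length(item)
r <- cor(group, item)       # Pearson r between group and item scores
chisq_ord <- (N - 1) * r^2  # linear-by-linear association statistic
p_value <- pchisq(chisq_ord, df = 1, lower.tail = FALSE)
c(chisq = chisq_ord, p = p_value)
```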
Personally, for single Likert-type items, I'm more comfortable with methods that treat those items as ordinal variables. I think this leaves one less susceptible to strong objections from reviewers, editors, professors and other potential critics. YMMV. (https://dictionary.cambridge.org/dictionary/english/ymmv)
HTH.
Here is a link to a somewhat more advanced article that presents evidence in favor of using parametric statistics (Pearson's correlations) with Likert-scored items.
Article Likert scales, levels of measurement and the “laws” of statistics
But see also this article:
Liddell TM, Kruschke JK. Analyzing ordinal data with metric models: What could possibly go wrong? Journal of Experimental Social Psychology. 2018 Nov 1;79:328-48.
https://osf.io/9h3et/download?format=pdf (draft manuscript freely available to all) https://www.sciencedirect.com/scienc...22103117307746 (final published article; requires institutional access)
EDITED 17-May-2019
Notice that Liddell & Kruschke do cite the Norman (2010) article (and others) on page 337. Here is the relevant excerpt (with emphasis added).
"A variety of previous investigators have examined false alarm rates in metric analyses of ordinal data (e.g., Boneau, 1960; Glass, Peckham, & Sanders, 1972; Havlicek & Peterson, 1976; Heeren & D’Agostino, 1987; Hsu & Feldt, 1969; Norman, 2010; Pearson, 1931). In general, they found false alarm rates not to be badly inflated. However, this body of work did not investigate circumstances we have highlighted that do produce false alarms. For example, cases with unequal variability across groups and means closer to one or the other end of the ordinal scale (such as A and B in Fig. 4) were not investigated in these papers, but we have demonstrated such cases do produce inflated false alarm rates."
Dear research community,
In view of the fact that a Likert scale has the properties of an ordinal scale, and the intervals on a Likert scale are different and hence cannot be compared, I have a very basic question: is it possible to create further categories/groups on a Likert scale for analysis purposes, such as combining strongly disagree, disagree and somewhat disagree into 'disagree'?
Looking forward to your response. Thanks in advance.
Praheli
Greetings. When talking about simpler procedures, such as using summated scales for regression analysis as the OP David L Morgan stated, how can one demonstrate construct validity? I understand that EFA is appropriate for item reduction and when working more inductively. However, I have selected and modified scales from other credited studies for use in my population of interest. With a larger sample, I understand CFA would be appropriate. However, my sample will be 100 cases at best, hence I was wondering whether there are any statistical alternatives that are less complex and require fewer cases for testing the validity of constructs.
I asked a slightly similar question on RG some time ago, to which Robert Trevethan and Hanif Abdul Rahman kindly responded. Apologies for any duplication.
For CFA, some sources recommend as few as 5 cases per parameter in your model, so under that standard it might work. You can also use versions of EFA that have the same underlying assumptions as CFA, such as Maximum Likelihood estimation with correlated factors (oblique rotation).
David L Morgan, so you think this method of EFA (ML with oblique rotation) could work for determining construct validity when one has a theoretical expectation of which items are going to load on which factors? That's interesting; I will read up more on that. In one of James Gaskin's online tutorials, he said he would never do CFA without doing an EFA first anyway.
Construct validity has more to do with how a scale is predicted to relate to other variables.
If you have an initial hypothesis about how the variables should be organized into factors, then CFA is the most appropriate method.
In CFA, as it is a restricted model, we have to consider the factor loading as the weight of the parameter. The construct or factor can be given a factor score, which can be done through the process of imputation in SPSS AMOS.
Here, my question is: can we use the factor score/construct as an independent variable for further calculations in a separate file? David L Morgan sir, please suggest something on this matter...
CFA is usually used with Structural Equation Modeling, and in that case there is no need to construct a factor score.
A scale can be built from statements derived from a past review of the literature about a construct (for example, stress). These statements, suppose 64 of them, could be reduced for relevance by an EFA. A CFA is used in SEM when one of the two or more variables in the study is a latent variable (stress) rather than an observed variable (age). If the two or more variables under study are ratio-level (age, weight, etc.), then there is no need for CFA; a simple regression could work. An SEM combines a reliability test (Cronbach) with CFA, and is only meant to be done when one variable is observed and the other latent, or both are latent.
A single-item factor indicates that a variable does not correlate highly with the other variables in the set. The real question, however, is how it correlates with the dependent variable, so if it is a strong predictor it should be retained.
The same logic applies for a two-item factor, especially if those items are not highly correlated (which would seem to be the case if the alpha is low).
David L Morgan Can we use a 5-point Likert scale (1 = strongly disagree to 5 = strongly agree) in our research instead of a 7-point one (1 = strongly disagree to 7 = strongly agree)? I mean, in the base paper the author used a 7-point scale, while I intend to use the same measure with a 5-point scale, because the other measures are based on 5 points and only one variable uses 7.
thank you
I am not sure why you would want to change the original format, but the key question is whether you think the items will correlate strongly with 5-point scales.
David L Morgan Yes, I think items using 5-point scales will correlate strongly with each other, rather than having only one variable on a 7-point scale.
Hello David L Morgan Sir, please tell me how to combine 5 factors into one outcome for Likert-scale data. For example, for the question "What is the effect of climate change?", respondents rate each factor from very high to very low on a scale of 1-5:
1.Temprature rise
2. Decrease in Rainfall
3. Vegetation loss
4. Change in agriculture
I entered the data in SPSS with each factor as a separate variable, but now I want the combined results as a frequency table and other statistics, to find out the importance of each factor.
Thank you
Here are two sites that are accessible to beginners and that each include a realistic example. The first one is for SPSS and the second includes syntax for SPSS, Stata, and R.
http://www.statisticshell.com/docs/factor.pdf
https://data.library.virginia.edu/using-and-interpreting-cronbachs-alpha/
Article Making Sense of Cronbach's Alpha
I think it could be a good idea to read the book "Health Measurement Scales: A Practical Guide to Their Development and Use" by David Streiner et al. Excellent book!
I have created a questionnaire and gathered descriptive data as part of an exploratory MMR. This questionnaire was developed after interviews, constructed also with secondary sources, pilot tested, revised and distributed. I am writing my methods chapter now, and I am looking at validity and reliability. I have described all the stages well, but I have not done anything related to Cronbach's alpha. My understanding is that you cannot do this at the pilot-testing stage, so you check correlations after you close the questionnaire. But I do not get this: how will SPSS know the content of each item and tell me whether the items correlate or not? I have 36 questions and 5 topic areas. I got 107 responses back (I had a purposive sample). But if I follow the instructions to test all the questions, SPSS cannot perform this. I am not sure what I need to do in relation to reliability and Cronbach's alpha. Is this necessary? David L Morgan, what would you suggest?
Glykeria Skamagki I think it is necessary. As you know, the alpha coefficient shows the internal consistency of a measurement. A reliability coefficient of .70 or higher is considered "acceptable" in most social science research situations.
This is a great thread with many hints and sources of information regarding factor analysis. Thank you very much David L Morgan , it helped me a lot!
David L Morgan Is it appropriate to sum up a 5-point Likert scale by treating number three as a neutral response and adding 1, 2 and 4, 5 together? I know it is ordinal and the points do not have equal distances between them, but I would like to report the results as "positive" and "negative".
Glykeria Skamagki
If you do that, you would not be able to calculate any statistical analyses, but if you only want to report descriptive statistics on each separate item, that would be possible.
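A sketch of that kind of descriptive collapsing in R, with made-up ratings:

```r
# Hypothetical single 1-5 item, collapsed for descriptive reporting only.
item <- c(1, 2, 3, 4, 5, 5, 2, 4)

grp <- cut(item, breaks = c(0, 2, 3, 5),
           labels = c("negative", "neutral", "positive"))
table(grp)  # frequency table; fine for description, not for parametric tests
```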
The discussion started off well, and then later on it got all confusing. I wanted to find out from David L Morgan (if I may, please) whether it is OK to use a single question to measure attitudes. During interviews, I asked people how much they like/dislike a certain wild carnivore. Answer options: 1 = strongly dislike to 5 = strongly like.
Lovemore Sibanda,
There is no way to assess the reliability of a single item, so most of this discussion is not relevant to your situation. With one item measured on an ordinal scale, you will need to use non-parametric statistics.
Lovemore Sibanda, if that single item you described is a dependent variable, ordinal logistic regression is another option. What software do you use?