Cronbach's alpha is the most misused and misunderstood test. Some practitioners do not seem to differentiate between RELIABILITY and CONSISTENCY. Reliability refers to the quality of the instrument (survey) in producing trustworthy data. Consistency refers to the predictability of the data. Cronbach's alpha speaks to the consistency of the responses in the survey, i.e. it is a measurement of the consistency of a particular tagged question. For example, suppose a survey contains 50 questions, a researcher wants to test the consistency of question #10, and 200 completed surveys are returned. Here n = 200, and the answer scores for question #10 are tabulated for Cronbach's alpha testing. The test uses a scale from 0 to 1.00, hence a ratio measure. Assume the answers to #10 among the 200 samples score high (say 0.80) on the Cronbach's alpha test: what does that mean? It means that the answers to #10 are 0.80 consistent on a 1.00 scale. Does it tell us anything more? No, it does not. To find a relationship, for instance, at least two variables are required. Nevertheless, practitioners mistake it for a test of reliability.
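For reference, here is a minimal sketch (in Python, with made-up Likert scores) of the standard alpha formula; it operates on a respondents-by-items score matrix, and purely random, uncorrelated answers drive the coefficient toward zero:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Standard Cronbach's alpha for an (n_respondents x k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)         # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical data: 200 returned surveys, 5 Likert items scored 1-5.
rng = np.random.default_rng(0)
scores = rng.integers(1, 6, size=(200, 5)).astype(float)
print(round(cronbach_alpha(scores), 2))   # near 0 for uncorrelated random answers
```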
As for the issue of reliability, Cronbach's alpha does not help. In fact, it has nothing to do with reliability. Take, for example, a researcher using a Likert scale, i.e. 1 = lowest, ..., 5 = highest. How can the issue of reliability be addressed? We must not ask "is the SURVEY reliable?"; we must ask "is the INDIVIDUAL QUESTION reliable?" In this case, if a conventional 95% confidence level is used, a 1-5 Likert scale fails because it can achieve only 80%, i.e. the expected error is 0.20: E = [n - n(1 - df * a)] / n, where n = number of answer choices in the question, df = n - 1, and a = 0.05 is the precision level. Reliability must come from instrument calibration.
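Taking the poster's formula at face value, a short numeric check (Python, using the symbols as defined above) reproduces the 0.20 figure for a 5-point scale:

```python
def expected_error(n_choices: int, a: float = 0.05) -> float:
    """E = [n - n(1 - df * a)] / n with df = n - 1, as stated in the post above."""
    df = n_choices - 1
    return (n_choices - n_choices * (1 - df * a)) / n_choices

print(expected_error(5))   # 0.2, i.e. the 80% figure quoted for a 1-5 Likert scale
```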
Cronbach's alpha is used frequently as a measure when constructing a summary scale from Likert-type questions or opinions (strongly agree - strongly disagree). Cronbach's alpha gives a measure of the internal consistency or reliability of a scale, but one needs to do more than simply test the reliability of a series of questions. I have seen researchers report high Cronbach's alphas for scales with 16 or more question items. As one increases the number of items in a scale, it becomes more likely that alpha will be high, but the score may be meaningless because it does not represent an underlying construct, or worse, it represents multiple underlying constructs. Prior to doing scale construction, make sure you check the individual data items to see whether they are normally distributed. I often run principal components analysis on a series of questions that are supposed to represent a theme to see if they indeed do, or if there are multiple constructs. Then I follow up with reliability analysis of the pertinent items to develop the most parsimonious scales. The Cronbach's alpha assessment should be one of the last steps in developing a scale from a series of opinion or Likert items, not the first.
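As a rough illustration of that PCA check (the data below is made up; in practice you would load your own item responses), one can look at how much variance the leading components explain:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical item matrix: 200 respondents x 16 Likert items that are
# supposed to tap one theme; replace with your own responses.
rng = np.random.default_rng(1)
items = rng.integers(1, 6, size=(200, 16)).astype(float)

pca = PCA()
pca.fit(items)

# If the items really reflect a single construct, the first component should
# dominate; several comparably large components suggest multiple constructs.
print(np.round(pca.explained_variance_ratio_[:5], 2))
```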
Note that different samples will exhibit varying responses. A validated scale with a high alpha in one sample may not perform the same in a different application. It is important to always check reliability in your own sample, even when using a validated instrument. Hope this helps...
I agree with Donna. In fact, the alpha value means very little unless there is a real relationship among the items. It is much like R-squared in that sense: R-squared will normally increase with every additional variable, but one should not mechanically take that to mean that those additional variables contribute significantly to the relationship being assessed.
Very true. I always report the results of another two internal consistency reliability indicators, namely the inter-item correlation and the item-to-total correlation, in order to measure how well a set of items measures a single unidimensional latent construct.
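For concreteness, here is a small sketch (Python, hypothetical data) of those two indicators, the average inter-item correlation and the corrected item-to-total correlation:

```python
import numpy as np

# Hypothetical (n_respondents x k_items) score matrix; replace with real data.
rng = np.random.default_rng(2)
items = rng.integers(1, 6, size=(150, 6)).astype(float)
k = items.shape[1]

# Average inter-item correlation: mean of the off-diagonal correlations.
corr = np.corrcoef(items, rowvar=False)
mean_inter_item = (corr.sum() - k) / (k * (k - 1))

# Corrected item-to-total correlation: each item against the sum of the others.
item_total = [
    np.corrcoef(items[:, i], items.sum(axis=1) - items[:, i])[0, 1]
    for i in range(k)
]

print(round(mean_inter_item, 2), np.round(item_total, 2))
```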
IRT (item response theory) models and principal components analysis will help, as well as factor analysis. Factor analysis will test whether each question is related to the construct it is supposed to be measuring.
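A minimal factor-analysis sketch along those lines (hypothetical data, using scikit-learn's FactorAnalysis as one of several possible tools) shows each item's loading on the extracted factors:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical 200 x 8 matrix of Likert responses; replace with real data.
rng = np.random.default_rng(4)
items = rng.integers(1, 6, size=(200, 8)).astype(float)

fa = FactorAnalysis(n_components=2)
fa.fit(items)

# Loadings: one row per item, one column per factor.  An item that fails to
# load on its intended factor is a candidate for rewording or removal.
loadings = fa.components_.T
print(np.round(loadings, 2))
```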
Cronbach's alpha measures only the reliability of the scale of measurement of the cases' responses on a Likert scale. But it hardly measures the reliability of the respondents' opinions leading to the latent construct. In that case, useful measurements are factor loadings, average variance extracted, and construct reliability. Factor loadings are standardised regression weights, from which you can calculate item reliability; the average variance extracted (which should be greater than 0.5) is obtained by averaging the squares of the factor loadings, and you can also calculate construct reliability with the help of the item reliabilities. This gives you convergent validity, i.e., the respondents' opinions converge to form the latent construct. This can be confirmed not through Exploratory Factor Analysis but through Confirmatory Factor Analysis.
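To make the arithmetic explicit, here is a small worked sketch (Python; the standardized loadings are invented purely for illustration) of average variance extracted and construct (composite) reliability:

```python
import numpy as np

# Hypothetical standardized factor loadings for a 4-item latent construct,
# e.g. taken from a CFA output.
loadings = np.array([0.72, 0.68, 0.81, 0.75])
error_var = 1 - loadings**2                    # item error variances (standardized)

ave = np.mean(loadings**2)                     # average variance extracted
cr = loadings.sum()**2 / (loadings.sum()**2 + error_var.sum())  # construct reliability

print(round(ave, 2), round(cr, 2))             # common rules of thumb: AVE > 0.5, CR > 0.7
```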
Usually, for the sake of the integrity of the research, it is recommended to run a series of tests including principal components analysis, factor analysis, Cronbach's alpha, and other indicators. Having an integrated set of tests adds more value to the research in question.
Thank you everybody for the detailed discussion. I stand educated. There is a bit more that I need to know actually!
Following the trail of reasoning by Paul Louangrath above: what if the 5-point Likert-based QUESTION has two extremes that are equally likely to be chosen, and the answer depends on the 'opinion' of the respondent? For example - Q: In a balanced scorecard environment, should Six Sigma be used as a stand-alone project outside the balanced scorecard? 1 = Str. Agree ..... 5 = Str. Disagree.
Now, depending upon the context of the respondent's organization, he is likely to pick either of the extremes as his answer, and will still be contextually right. But my statistics will be pretty inconsistent, since different organizations have different balanced scorecard implementation contexts.
So, the question: How good is my framed question?
As a balanced scorecard practitioner, I know it's an important question. As a researcher, Cronbach's alpha tells me it's not a good question.
Actually, I would suggest that you use a statement instead of a question. Second, you might prepare statements comparing the benefits or other features instead of being so direct, so that people respond differently without selecting exactly either of the two extremes you mentioned. Asking a very direct question may lead the respondent to the easiest response.
Therefore, thinking about the context of the question is very important before writing it to avoid respondents' biases.
Cronbach's alpha deals with split-half reliability. If a scale is consistent, then the score from the first half of the items should be "consistent" (i.e. highly correlated) with the score from the second half, as both halves should be measuring the same concept. There are, however, many ways to split the items into two halves, and each would yield a different correlation. Alpha is the average of the split-half coefficients over all possible "split halves" (which turns out to reduce to a relatively simple formula mathematically).
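To see this concretely, here is a minimal sketch (Python, made-up data) that computes a split-half coefficient, in the Flanagan-Rulon form, for every possible equal split and compares the average with the direct alpha formula:

```python
import numpy as np
from itertools import combinations

# Hypothetical (n_respondents x k_items) score matrix; k is even so the
# items can be divided into two equal halves.
rng = np.random.default_rng(3)
items = rng.integers(1, 6, size=(100, 6)).astype(float)
k = items.shape[1]
total_var = items.sum(axis=1).var(ddof=1)

# Flanagan-Rulon split-half coefficient for every possible equal split.
split_half = []
for half in combinations(range(k), k // 2):
    a = items[:, list(half)].sum(axis=1)
    b = items[:, [i for i in range(k) if i not in half]].sum(axis=1)
    split_half.append(2 * (1 - (a.var(ddof=1) + b.var(ddof=1)) / total_var))

# Direct alpha formula for comparison; the two values coincide.
alpha = (k / (k - 1)) * (1 - items.var(axis=0, ddof=1).sum() / total_var)
print(round(float(np.mean(split_half)), 3), round(float(alpha), 3))
```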
Although Cronbach's alpha is widely used as an estimator in reliability testing, it has been criticized as a lower bound that underestimates the true reliability (Peterson, R.A. and Y. Kim, 2013). Composite reliability can be used as an alternative, as its value is slightly higher than Cronbach's alpha, although the difference is relatively inconsequential (Peterson, R.A. and Y. Kim, 2013).