In some publications, where one is supposed to follow the assumptions of the statistics and the level of measurement for the variable should be interval or ratio, one can find the ordinal level being used! Is this correct?
For ordinal data, consider proportional odds models (a minimal fitting sketch follows this list).
• Assume a natural order of the categories
• Binomial family/extension of logistic regression
• Logit link; cumulative probabilities
• Assume each predictor has the same effect at every cumulative split of the outcome (the proportional-odds assumption)
• Fit using maximum likelihood
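As an illustration, here is a minimal sketch of fitting such a model in Python with statsmodels' OrderedModel; the data are simulated purely for the example:

```python
import numpy as np
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(0)
# Simulated example: one predictor driving a 4-category ordinal outcome
x = rng.normal(size=300)
latent = 0.8 * x + rng.logistic(size=300)   # latent-variable motivation
y = pd.Series(pd.cut(latent, bins=[-np.inf, -1, 0, 1, np.inf],
                     labels=["low", "mid", "high", "top"]))

# Proportional-odds (cumulative logit) model, fit by maximum likelihood
model = OrderedModel(y, x[:, None], distr="logit")
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```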
Using cardinal-scale (ratio or interval) analysis on ordinal-scale data is of course not correct. The degree of harm depends on the number of ranks: with few ranks, analyzing the data as if it were on a cardinal scale seriously misrepresents the rank data; with many ranks (say 30 or more) there is less harm, and usually little difference between the cardinal data and the same data reduced to ranks.
Yes Dr., I noticed this in some publications, and as you know some statistics books say it is possible to use the ordinal level in place of continuous. I am not sure if this is 100% correct! Thank you for raising this issue.
Again I agree with the accuracy of David's answer: more categories, less error. But it is true that the majority of non-mathematicians treat ordinal variables as continuous if there are at least 5 categories.
I think we have to be careful with this. Why?
Suppose that half of your sample answers 1 and the other half answers 5; then the average is 3, and nobody has said that. So if I were you I would explore the data before doing any kind of approximation, and if your mean is near your mode, then I would feel safer treating ordinal as continuous variables.
Usually this problem exists because, after you have chosen the scale to measure your data, you need to apply some model, and the classic ones usually analyse means. So you have to be sure that your mean represents the correct measure of location/central tendency. If you have several modes, you should check whether your sample is heterogeneous. All of these problems should be caught when you explore your data. Whatever the statistical package, the analyst is the one who should understand not only statistics but also have good knowledge of the problem he/she is studying, because the interpretation of results should always be connected with reality.
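To make this concrete, here is a small exploration sketch (the responses are made up) showing how the mean can land on a value nobody actually chose:

```python
import numpy as np
from scipy import stats

# Made-up polarized responses: half the sample answers 1, half answers 5
responses = np.array([1] * 50 + [5] * 50)

mean = responses.mean()
mode = stats.mode(responses, keepdims=False).mode
counts = np.bincount(responses, minlength=6)[1:]   # frequencies of 1..5

print(f"mean = {mean}")            # 3.0 -- a category nobody chose
print(f"mode = {mode}")            # 1 (scipy reports the smallest tied mode)
print(f"counts 1..5 = {counts}")   # [50  0  0  0 50] -- clearly bimodal
```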
The first argument is interpretability. Ordinal-level data often use labelled categories for constructs like agreement. While individual scale items may depart very significantly from an equal-interval scale, you can see from the attached plot that the 23 items of a scale to measure stigma (the first suitable data I had to hand) fit very well to a distribution that is (a) continuous and (b) normal. It makes sense to treat this as a continuous variable.
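In the same spirit, here is a minimal sketch (using simulated item responses, not the stigma data) of how one might check whether a summed multi-item score looks approximately normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Simulated stand-in: 200 respondents x 23 five-point Likert items
items = rng.integers(1, 6, size=(200, 23))
scale_score = items.sum(axis=1)    # summed scale score per respondent

# Shapiro-Wilk test plus skewness as a quick check of normality
w, p = stats.shapiro(scale_score)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")
print(f"skewness = {stats.skew(scale_score):.3f}")
```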
The second argument is from the simulation literature. The t-test is a powerful and very robust test (and better suited to small samples than the Wilcoxon-Mann-Whitney, which actually cannot reach statistical significance with very small samples). It performs very well with 5-point ordinal items.
More detail here: http://bcss1.blogspot.ie/2015/02/myths-and-nonsense-about-t-test.html
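For anyone who wants to try this kind of simulation themselves, here is a rough sketch (the 5-point response distribution is an arbitrary choice of mine) estimating the t-test's Type I error rate on Likert-type items:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sims, n, alpha = 10_000, 20, 0.05
probs = [0.1, 0.2, 0.4, 0.2, 0.1]   # arbitrary 5-point item distribution

# Both groups come from the same distribution, so any rejection is a false positive
rejections = sum(
    stats.ttest_ind(rng.choice(5, n, p=probs) + 1,
                    rng.choice(5, n, p=probs) + 1).pvalue < alpha
    for _ in range(n_sims)
)
print(f"Empirical Type I error: {rejections / n_sims:.3f}")   # should be near 0.05
```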
I think assigning (arbitrary) numerical values to the levels of an ordinal variable does not increase interpretability. It only leads one to believe that the results are interpretable. The problem I see here is that the p-value cannot be interpreted in a meaningful way. I might see a mean score difference of +0.3, and the p-value is 0.01. Should I consider this significant enough? Could I consider p = 0.09 significant enough? The answer depends critically on where this change takes place (e.g. from "agree" to "strongly agree" or from "strongly disagree" to "disagree") and on how this change takes place: having rare cases where persons change from mostly disagree/strongly disagree to strongly agree may be very relevant, leading to a different interpretation of the observed effect size.
And who is telling me that the numbers used to code the scores have to be spaced equally? If I decide on a different spacing I will get different results, and possibly come to different conclusions. If the coding is (somewhat) non-arbitrary (so that the scores represent a surrogate measure for some otherwise inaccessible cardinal quantity), then I agree that using a t-test makes sense - because then I can interpret the effect size and the p-value, and then I can make an informed decision about whether to reject the tested hypothesis.
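A quick sketch of that sensitivity (both the responses and the alternative spacing are invented for illustration): recoding the same ordinal data with unequal spacing can change the test result:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
# Invented ordinal responses (categories 1..5) for two groups
a = rng.choice(5, 40, p=[0.05, 0.15, 0.30, 0.30, 0.20]) + 1
b = rng.choice(5, 40, p=[0.15, 0.30, 0.30, 0.15, 0.10]) + 1

codings = {
    "equal":   np.array([1, 2, 3, 4, 5]),    # conventional equal spacing
    "unequal": np.array([1, 2, 3, 6, 10]),   # an equally defensible alternative
}
for name, scores in codings.items():
    t, p = stats.ttest_ind(scores[a - 1], scores[b - 1])
    print(f"{name:8s} coding: t = {t:.2f}, p = {p:.4f}")
```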
If the t-test is used as a hypothesis test, then I would not know how to formulate the hypotheses I want to test, and I would have no idea about the utility function needed to select reasonable error rates. This is actually the same problem as interpreting the p-value in a significance test, only that here the problem comes before the experiment and is made more explicit.
However, I do agree that the analysis of multi-item variables as described for the stigma example is reasonable - when the individual items measure different aspects/features of "stigma". But I have seen many examples (especially in the medical sciences) of multi-item scores where the different items measure completely different things, which renders any quantitative interpretation of the multi-item score impossible.
I don't know if I'm contributing "food for thought" or "adding fuel to the fire", but here are a couple of articles that I think are relevant. ;-)
Also, it seems to me that 5- and 7-point Likert-type items (which are quite commonly used) are not the same thing as ranks. Some observations:
• With true ranks, it is possible to have no ties.
• With Likert-type items, it is only possible to have no ties if the sample size is extremely small (in which case achieving statistical significance may be impossible, as noted by Ronán).
• There are corrections for ties that can be used with the common rank-based tests, but they work best when ties are relatively few.
For these reasons, I think the use of rank-based (so-called) non-parametric tests with Likert-type outcome variables is misguided. I don't think that Mann, Whitney, Wilcoxon, Friedman et al. had Likert-type outcome variables in mind when they devised their tests.
Finally, the standard parametric tests may be robust for use with Likert-type items (as noted in the Norman article on "laws" of statistics), but I expect they are most robust when the distributions are unimodal and not too skewed (or at least with all distributions skewed in the same direction, and to about the same degree). As David said in his post, ordinal logistic regression is a nice way to analyze (a small number of) ordered categories.* And I believe that all of the major stats packages now have procedures for estimating such models.
* We could have another discussion, though, on just how important the parallel lines (or proportional odds) assumption is for such models. I have not used these types of models a lot, but it seems to me that the standard test of that assumption may become over-powered as the sample size increases. Perhaps David or someone else more experienced than I am with these models could comment on that.
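For what it's worth, one rough way to probe the assumption without relying on a single canned test is an approximate likelihood-ratio comparison of the proportional-odds fit against an unconstrained multinomial logit. The two models are not strictly nested, so this is only a heuristic, and the data below are simulated:

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.discrete.discrete_model import MNLogit
from statsmodels.miscmodels.ordinal_model import OrderedModel

rng = np.random.default_rng(4)
# Simulated data: one predictor, a 4-category ordinal outcome
x = rng.normal(size=500)
latent = 0.8 * x + rng.logistic(size=500)
y = pd.cut(latent, bins=[-np.inf, -1, 0, 1, np.inf], labels=False)

po = OrderedModel(y, x[:, None], distr="logit").fit(method="bfgs", disp=False)
ml = MNLogit(y, np.column_stack([np.ones_like(x), x])).fit(disp=False)

# Extra slope parameters in the multinomial model: (J - 2) * n_predictors
lr, df = 2 * (ml.llf - po.llf), (4 - 2) * 1
print(f"LR = {lr:.2f}, df = {df}, p = {stats.chi2.sf(lr, df):.3f}")
# A tiny p-value hints at a proportional-odds violation -- but note that,
# like the standard tests, this check rejects very easily at large n.
```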