Will the results of an ordinal logit model be different from OLS regression, with discrete dependent variables?

Scott Everett Robinson Popular answer

This is the subject of a great deal of debate within applied econometrics. The previous posts are correct that using OLS when the dependent variable is ordered violates the assumptions of OLS. One cannot assume, then, that the OLS estimator is the best linear unbiased estimator (BLUE).

However, there are many within econometrics who feel that any differences tend to be trivial. They argue that the practical effect of violating these assumptions is minor and that the simplicity of interpreting an OLS model outweighs the technical correctness of an ordered logit or probit model. You can find an argument like this in the book Mostly Harmless Econometrics.

In the many, many ordered and OLS models I have run on ordered dependent variables, I find that the utility of each model depends on the underlying distribution of the data. If there are more than 4-5 categories and the distribution looks quasi-normal, the OLS model gives me more or less the same results as the ordered models (same significance levels, same predicted outcomes). If there are only a few outcome categories, or the distribution of the outcomes is skewed (or polarized), the ordered models do much better. I generally run both to check for consistency. I "believe" the ordered models more, generally speaking. I sometimes present the OLS models if the results are similar and the audience needs the simplicity.

PS: In Stata, ordered models just require the ologit/oprobit command instead of regress. In R, ordered logit is a little harder to find than usual because statisticians are less fond of the approach than econometricians. You can find the "polr" function (in the MASS package) with some googling, though. It is important to note that the ordered models are NOT part of the generalized linear model family, so they are not available through the "glm" command in R. There is a nice set of tutorials at the UCLA statistics website for each of these programs (http://www.ats.ucla.edu/stat/dae/).

Patrick S Malone

Yes, doing an OLS regression would be less appropriate because it violates the assumption of independent, identically distributed errors. Specifically, observations whose combinations of x-values result in y-hats that fall between (or outside) the possible levels of y would have different error distributions than observations whose y-hats fall at possible levels of y.

That's even assuming that the discrete values of Y fall on an interval measurement scale.
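
The point about error distributions can be illustrated with a small sketch. It assumes a five-level outcome coded 1 to 5 (an assumption for illustration, not something stated in the question) and lists which residuals y - y-hat are even attainable for a given fitted value. The function name and the example fitted values are hypothetical.

```python
def possible_residuals(y_hat, levels=(1, 2, 3, 4, 5)):
    """Residuals y - y_hat that are attainable when y can only take the given levels."""
    # Rounding only tidies floating-point noise such as -0.40000000000000036.
    return [round(y - y_hat, 10) for y in levels]


on_level = possible_residuals(3.0)  # fitted value sits exactly on a possible level
between = possible_residuals(3.5)   # fitted value falls between two levels
outside = possible_residuals(5.4)   # fitted value falls above the top level

print("y_hat = 3.0:", on_level)
print("y_hat = 3.5:", between)
print("y_hat = 5.4:", outside)
```

When the fitted value sits on a level, a zero residual is possible and the attainable residuals are symmetric around zero. Between levels, a zero residual is impossible, and above the top level every attainable residual is negative, so the errors cannot share one common distribution across observations.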

Cumulative logistic regression is the most common approach to this problem (at least in my field) and should be easily implemented in any software that has logistic regression capacities.

ETA: As usual with this question, the extent to which the results will differ will depend on the extent to which the assumptions of OLS are badly violated.


Francois E Steffens

I do not understand what your response variable Y is. If it is categorical (e.g. 1=single, 2=married, 3=divorced, 4=widowed, 5=cohabiting), then OLS will be totally inappropriate. Multinomial logistic regression will predict the probabilities of a case being in each category. If Y is categorical but ordinal (1=very bad, 2=bad, 3=so-so, 4=good, 5=excellent), then ordinal logistic regression will do the same. The point is that a probability must be between 0 and 1, which the logit model guarantees, while a straight line cannot be constrained to stay between 0 and 1.
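
As a rough illustration of that last point, here is a minimal sketch of how a cumulative (ordinal) logit model turns a linear predictor into category probabilities, set next to a straight line used as a "probability". The cutpoints, the slope, and the straight-line coefficients are all made-up values for illustration, not estimates from any data, and the function names are hypothetical.

```python
import math


def logistic(z):
    """Inverse logit: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(eta, cutpoints):
    """P(Y = k), k = 1..K, under a cumulative logit model.

    Uses P(Y <= k) = logistic(c_k - eta) with increasing cutpoints c_k,
    then differences the cumulative probabilities.
    """
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs, previous = [], 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs


def linear_prob(x):
    """A straight line treated as a probability (hypothetical coefficients)."""
    return 0.5 + 0.1 * x


cutpoints = [-2.0, -0.5, 0.5, 2.0]  # hypothetical: 4 cutpoints give 5 categories
slope = 0.8                         # hypothetical coefficient on x

for x in (-10, 0, 10):
    probs = ordinal_probs(slope * x, cutpoints)
    print(x, [round(p, 3) for p in probs], "straight line:", linear_prob(x))
```

For extreme values of x the straight line leaves the 0 to 1 range, while the five category probabilities from the ordinal model each stay within it and sum to 1.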

A totally different scenario holds if your response Y represents counts. In that case you may have to fit a generalized linear model where you have to choose a discrete distribution (e.g. Poisson or Binomial) for Y.

The techniques mentioned above are all available in SPSS.
