What is logistic regression

Logistic regression is a single parameter discrete response regression model where the response is either binary (0,1), or is partitioned into a numerator (y) and denominator (m), with the denominator being the number of observations having the same pattern of covariates and the numerator being a count of the number of observations where y==1 for each covariate pattern. The response variable is then a combined y/m. The latter parameterization of logistic regression is typically referred to as grouped logistic regression, and was the first way that the logistic model was used.
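To make the grouped format concrete, here is a minimal Python sketch of collapsing binary-response records into one (y, m) pair per covariate pattern. The data rows and the function name group_by_pattern are hypothetical, chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical binary-format data: (covariate pattern, y), with y in {0, 1}.
rows = [
    ((1, 0), 1), ((1, 0), 0), ((1, 0), 1),
    ((0, 1), 0), ((0, 1), 0),
    ((1, 1), 1),
]

def group_by_pattern(rows):
    """Collapse binary records into {pattern: (y, m)}.

    y is the count of records with y==1 for the pattern (numerator),
    m is the number of records sharing the pattern (denominator).
    """
    counts = defaultdict(lambda: [0, 0])
    for pattern, y in rows:
        counts[pattern][0] += y   # numerator: successes
        counts[pattern][1] += 1   # denominator: observations
    return {pattern: (y, m) for pattern, (y, m) in counts.items()}

grouped = group_by_pattern(rows)
for pattern, (y, m) in sorted(grouped.items()):
    print(pattern, "y =", y, "m =", m)
```

Each grouped row carries the same information as the binary records it replaces, so a grouped model fit to (y, m) should give the same coefficients as a binary model fit to the original rows.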

As Thomas indicated, the fitted or predicted value for a binary response logistic model is a probability; i.e., the probability that y==1 (some software algorithms use y==0 rather than y==1), where 1 is regarded as the binary "success" and 0 as non-success or "failure". For example, if the binary response or dependent variable y is "patient died while in hospital", then y==1 could be regarded as "yes" (they did die while in the hospital) and y==0 as "no" (patient did not die while in hospital). The predicted or fitted variable is the probability of y==1. When a logistic model is estimated using GLM or Generalized Linear Models (SAS=Proc GENMOD, Stata=glm, R=glm(), SPSS=GENLIN), the predicted value is usually referred to as mu (the Greek letter). The predicted values may be obtained by applying the GLM inverse link function to the linear predictor, which is SUM(Beta*X), or the sum of the products of the model coefficients and predictor values. We can refer to the linear predictor as eta or XB, where B (Beta) is the vector of coefficients and X is a vector of predictor values. Hence XB = B0 + B1X1 + B2X2 + ... + BnXn. The inverse link is 1/(1+exp(-XB)) or, equivalently, exp(XB)/(1+exp(XB)); therefore, mu = 1/(1+exp(-XB)) for each observation in the model.
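The calculation of a predicted probability from the linear predictor can be sketched in a few lines of Python. The coefficient and predictor values below are made up for illustration, and the function names are not taken from any particular package.

```python
import math

def linear_predictor(beta, x):
    """Return XB = B0 + B1*X1 + ... + Bn*Xn; beta[0] is the intercept."""
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

def inverse_logit(eta):
    """Inverse logit link: mu = 1 / (1 + exp(-XB))."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients (B0, B1, B2) and one observation (X1, X2).
beta = [-1.0, 0.5, 2.0]
x = [2.0, 0.0]

eta = linear_predictor(beta, x)   # -1.0 + 0.5*2.0 + 2.0*0.0
mu = inverse_logit(eta)           # predicted probability that y==1

# The two ways of writing the inverse link agree.
mu_alt = math.exp(eta) / (1.0 + math.exp(eta))
print(eta, mu, mu_alt)
```

Here the linear predictor works out to zero, so the predicted probability is one half; positive values of XB give probabilities above one half and negative values give probabilities below it.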

The logistic link function is the basis for understanding, and calculating, the above inverse link function. It is easy to calculate: XB=ln(mu/(1-mu)). Since mu is the predicted probability of y==1, it has also been symbolized as pi or simply as p. This is standard when the model is being estimated using a full maximum likelihood algorithm. GLM uses an iteratively reweighted least squares algorithm, which is a special case of maximum likelihood estimation for models that are members of the single parameter exponential family of distributions, such as the logit or logistic model. Anyhow, we may recognize that mu/(1-mu) or p/(1-p) is the expression for odds. The natural log of the odds, or ln(mu/(1-mu)) or ln(p/(1-p)), is called the log-odds, or logit -- how logistic or logit regression got its name.
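For readers who want to see how the link function enters the estimation, here is a rough Python sketch of iteratively reweighted least squares for a logistic model with an intercept and one predictor. It is a teaching sketch under simplifying assumptions (a fixed number of iterations, no convergence checks, made-up data), not the algorithm as implemented in any GLM software; the names fit_irls, logit, and inverse_logit are chosen here for illustration.

```python
import math

def logit(mu):
    """Link function: log-odds, ln(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Inverse link: mu = 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_irls(x, y, iterations=25):
    """Fit y ~ b0 + b1*x by IRLS; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Accumulate the weighted normal equations [[a, b], [b, d]] * beta = [p, q].
        a = b = d = p = q = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi
            mu = inverse_logit(eta)
            w = mu * (1.0 - mu)           # binomial variance function
            z = eta + (yi - mu) / w       # working response
            a += w
            b += w * xi
            d += w * xi * xi
            p += w * z
            q += w * xi * z
        det = a * d - b * b
        b0 = (d * p - b * q) / det        # solve the 2x2 system directly
        b1 = (a * q - b * p) / det
    return b0, b1

# Hypothetical data: 2 of 10 events when x==0, 5 of 10 events when x==1.
x = [0] * 10 + [1] * 10
y = [1] * 2 + [0] * 8 + [1] * 5 + [0] * 5

intercept, slope = fit_irls(x, y)
print(intercept, slope)
```

With a single binary predictor the maximum likelihood estimates have a closed form: the intercept is the logit of the observed proportion when x==0, and the slope is the difference between the two observed logits. That gives a simple check on the sketch.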

Another very nice feature of logistic regression, which relates to its logit link function, is the fact that the exponentiation of a model coefficient gives the odds ratio for the predictor. The ratio is the odds of X==1 compared to X==0 for a binary predictor, or X=level of interest compared to X=reference level for categorical predictors, and X=x+1 compared to X=x for continuous predictors. For example, suppose that we have a binary logistic model with y as "died" and a single predictor, "gender", where 1=female and 0=male. Also suppose that the exponentiated coefficient on gender is 2.0. If gender significantly contributes to the model (e.g., its p-value is below alpha = 0.05), and the model is well fitted, then we may assert that the odds of a patient dying in the hospital are twice as great for females as for males.
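The link between a coefficient and an odds ratio can be checked with a small Python calculation. The counts below are invented so that the odds ratio comes out at 2.0, matching the example above; they are not from any real data set.

```python
import math

def odds(p):
    """Odds corresponding to probability p: p / (1 - p)."""
    return p / (1.0 - p)

# Hypothetical counts: 20 of 60 females died, 10 of 50 males died.
p_female = 20 / 60
p_male = 10 / 50

odds_ratio = odds(p_female) / odds(p_male)   # odds for X==1 relative to X==0

# In a model with gender as the only predictor, the coefficient on gender
# is the log of this odds ratio, so exponentiating it recovers the ratio.
coef_gender = math.log(odds_ratio)
print(odds(p_female), odds(p_male), odds_ratio, math.exp(coef_gender))
```

Note that an odds ratio of 2.0 does not mean the probability of death is doubled: in this made-up table the probabilities are about 0.33 and 0.20.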

Logistic regression models are likely the most used regression models in research after the basic normal or Gaussian model -- linear regression. I believe that all researchers should have a solid foundation in logistic-based modeling, which extends to proportional odds models and multinomial regression, both of which are extensions of the base model. There are also logistic fixed, random, and mixed effects models, logistic GEE models, Bayesian logistic models, exact logistic regression, and so forth. Logistic regression is not only an important model in its own right, but also in the way it has been extended to the development of a number of other important models.

I know that this may seem self serving, but I recommend Hilbe, Joseph M (2009), "Logistic Regression Models", Chapman & Hall/CRC, a 656 page text on the near full range of logistic-based models. I give extensive examples for all varieties of logistic models, providing full code in Stata and R for constructing and evaluating them. Users of SAS Proc GENMOD, Proc LOGISTIC, and other related procedures will be able to easily follow along where SAS supports the model. Joseph Hilbe ([email protected])

Joseph Hilbe

You may be interested in knowing about www.statprob.com, which is an online encyclopedia of statistics developed under the authorization of all the major international statistical associations. Each association has members designated to review submissions. Together they approve entries to the Encyclopedia, which is free to access. The American Statistical Association, Canadian Statistical Association, International Statistical Institute (world assoc of statisticians), Royal Statistical Society, Chinese Statistical Association, and others are some of the associations sponsoring the Encyclopedia, which is aimed at providing the statistical and research community with accurate and authoritative short articles on various statistical subjects.

The initial articles that started the StatProb encyclopedia came from the International Encyclopedia of Statistical Sciences (2010, Springer) [IESS]. Aside from logistic regression, there are a host of articles on a substantial range of statistical subjects. StatProb is only a couple of years old, but is continually expanding, and will someday have entries on every statistical subject, including biographies of past statisticians.

It is also possible to have on-line access to the IESS, but there may be a charge involved. I'm not sure. Anyhow, I recommend checking StatProb for information about statistical areas and topics. It's a good first resource.

Thomas P A Debray

Logistic regression (LOGR) is used for modeling a dichotomous outcome, whereas linear regression (LINR) is used for modeling a continuous outcome. Predictions of LOGR represent predicted probabilities of having the outcome (e.g. "death" or "alive"), whereas predictions in LINR represent predicted outcome values (e.g. systolic blood pressure 135). There are plenty of resources available on the internet offering more information about regression analysis. I have some R code available on my website: http://www.netstorm.be/home/lrm

Asrat Atsedeweyn Andargie

The above two contributors have briefly explained what logistic regression is. To add some more points on GLM in general and logistic regression in particular, please read the attached file. Asrat ([email protected])

Sidharta Chatterjee

Very well explained basic concepts...thanks.

Rafiu Olayinka Akano

The main feature that distinguishes logistic regression from the usual multiple regression is the nature of the dependent variable. In logistic regression, the dependent variable is categorical and dichotomous, that is, a variable that can only take the values zero (0) or one (1). For instance, using sex as the dependent variable in a regression analysis will normally mean coding it as 0 or 1, depending on what the analyst has in mind.

Puchong Praekhaow

I would like to see an example of the analysis.

Kenneth Osborn

My understanding of logistic regression is that it is similar to probit analysis, where the independent variable is converted into a probability of occurrence.
