Logistic regression (LOGR) is used for modeling a dichotomous outcome, whereas linear regression (LINR) is used for modeling a continuous outcome. Predictions from LOGR are predicted probabilities of having the outcome (e.g. "died" vs. "alive"), whereas predictions from LINR are predicted outcome values (e.g. a systolic blood pressure of 135). There are plenty of resources available on the internet with more information about regression analysis. I have some R code available on my website: http://www.netstorm.be/home/lrm
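To make the contrast concrete, here is a minimal R sketch on simulated data (the variable names age, sbp, and died are invented for illustration): predict() on the linear model returns outcome values, while type = "response" on the logistic model returns probabilities.

    set.seed(1)
    dat <- data.frame(age = rnorm(100, 60, 10))
    dat$sbp  <- 100 + 0.5 * dat$age + rnorm(100, sd = 10)    # continuous outcome
    dat$died <- rbinom(100, 1, plogis(-5 + 0.07 * dat$age))  # binary outcome
    linr <- lm(sbp ~ age, data = dat)
    logr <- glm(died ~ age, family = binomial, data = dat)
    head(predict(linr))                     # predicted outcome values, around 130
    head(predict(logr, type = "response"))  # predicted probabilities of death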
Logistic regression is a single-parameter discrete response regression model where the response is either binary (0,1), or is partitioned into a numerator (y) and denominator (m), with the denominator being the number of observations having the same pattern of covariates and the numerator being a count of the observations where y==1 for each covariate pattern. The response is then the combined y/m. The latter parameterization is typically referred to as grouped logistic regression, and was the first way the logistic model was used.
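As an illustration, grouped logistic regression can be fit in R by giving glm() a two-column response of successes and failures per covariate pattern; the numbers below are invented purely for the sketch.

    grp <- data.frame(x = c(0, 1),
                      y = c(12, 30),    # number of observations with y==1
                      m = c(60, 60))    # observations per covariate pattern
    fit <- glm(cbind(y, m - y) ~ x, family = binomial, data = grp)
    summary(fit)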
As Thomas indicated above, the fitted or predicted value for a binary response logistic model is a probability, i.e. the probability that y==1 (some software algorithms model y==0 rather than y==1), where 1 is regarded as the binary "success" and 0 as non-success or "failure". For example, if the binary response or dependent variable y is "patient died while in hospital", then y==1 could be regarded as "yes" (the patient did die while in the hospital) and y==0 as "no" (the patient did not). The predicted or fitted value is the probability that y==1. When a logistic model is estimated as a GLM or Generalized Linear Model (SAS = Proc GENMOD, Stata = glm, R = glm(), SPSS = GENLIN), the predicted value is usually referred to as mu (the Greek letter). The predicted values are obtained by applying the inverse link function of the GLM to the linear predictor, i.e. the sum of the products of the model coefficients and predictor values. We can refer to the linear predictor as eta or XB, where B (beta) is a vector of coefficients and X a vector of predictor values; hence XB = B0 + B1*X1 + B2*X2 + ... + Bn*Xn. The inverse link is 1/(1+exp(-XB)), equivalently exp(XB)/(1+exp(XB)); therefore mu = 1/(1+exp(-XB)) for each observation in the model.
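To verify this relationship numerically, here is a sketch using R's built-in mtcars data (the choice of vs and mpg is arbitrary, just to have a binary response): mu computed by hand from the inverse link matches the software's fitted values.

    fit <- glm(vs ~ mpg, family = binomial, data = mtcars)
    X   <- model.matrix(fit)            # design matrix (intercept and mpg)
    eta <- as.vector(X %*% coef(fit))   # linear predictor XB
    mu  <- 1 / (1 + exp(-eta))          # inverse logit link
    all.equal(mu, unname(predict(fit, type = "response")))  # TRUE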
The logistic link function is the basis for understanding, and calculating, the inverse link function above. It is easy to calculate: XB = ln(mu/(1-mu)). Since mu is the predicted probability that y==1, it has also been symbolized as pi or simply as p; this is standard when the model is estimated using a full maximum likelihood algorithm. GLM software uses an iteratively reweighted least squares algorithm, a special case of maximum likelihood for models belonging to the single-parameter exponential family of distributions, such as the logit or logistic model. In any case, we may recognize that mu/(1-mu), or p/(1-p), is the expression for the odds. The natural log of the odds, ln(mu/(1-mu)) or ln(p/(1-p)), is called the log-odds, or logit -- which is how logistic or logit regression got its name.
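A quick numerical check of the link and its inverse (R's qlogis() and plogis() are exactly ln(p/(1-p)) and 1/(1+exp(-x))): a probability of 0.75 corresponds to odds of 3 and a logit of ln(3).

    p <- 0.75
    p / (1 - p)         # odds = 3
    qlogis(p)           # logit = ln(3), about 1.0986
    plogis(qlogis(p))   # back to 0.75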
Another very nice feature of logistic regression, which relates to its logit link function, is that exponentiating a model coefficient gives the odds ratio for the predictor. The ratio compares the odds at X==1 to X==0 for a binary predictor, at X = level of interest to X = reference level for a categorical predictor, and at X = x+1 to X = x for a continuous predictor. For example, suppose we have a binary logistic model with y as "died" and a single predictor, "gender", where 1=female and 0=male. Also suppose that the exponentiated coefficient on gender is 2.0. If gender contributes significantly to the model (e.g., its p-value is below alpha = 0.05) and the model is well fitted, then we may assert that the odds of a patient dying in the hospital are twice as great for females as for males.
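Here is a small simulation of that gender example with a true odds ratio of 2 built in (the intercept -1 and coefficient log(2) are arbitrary choices for the sketch); exponentiating the fitted coefficient recovers the odds ratio.

    set.seed(2)
    gender <- rbinom(5000, 1, 0.5)                           # 1 = female, 0 = male
    died   <- rbinom(5000, 1, plogis(-1 + log(2) * gender))  # true OR = 2
    fit    <- glm(died ~ gender, family = binomial)
    exp(coef(fit))["gender"]                                 # estimated odds ratio, near 2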
Logistic regression models are likely the most used regression models in research after the basic normal or Gaussian model -- linear regression. I believe that all researchers should have a solid foundation in logistic-based modeling, which extends to proportional odds models and multinomial regression, both extensions of the base model. There are also logistic fixed-, random-, and mixed-effects models, logistic GEE models, Bayesian logistic models, exact logistic regression, and so forth. Logistic regression is important not only in its own right, but also in the way it has been extended to develop a number of other important models.
I know that this may seem self-serving, but I recommend Hilbe, Joseph M. (2009), "Logistic Regression Models", Chapman & Hall/CRC, a 656-page text on the near-full range of logistic-based models. I give extensive examples for all varieties of logistic models, providing full code in Stata and R for constructing and evaluating them. Users of SAS Proc GENMOD, Proc LOGISTIC, and other related procedures will be able to follow along easily where SAS supports the model. Joseph Hilbe ([email protected])
The two contributors above have briefly explained what logistic regression is. To add some more points on GLMs in general and logistic regression in particular, please read the attached file. Asrat ([email protected])
The main feature that distinguishes logistic regression from the usual multiple regression is the nature of the dependent variable. In logistic regression, the dependent variable is categorical and dichotomous, that is, a variable that can only take the values zero (0) or one (1). For instance, having sex as the dependent variable in a regression analysis will normally mean coding the value as 0 or 1, depending on what the analyst has in mind.
My understanding of logistic regression is that it is similar to probit analysis: in both, a linear combination of the independent variables is converted into a probability of occurrence, with the two models differing only in the link function used.
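A minimal sketch of that similarity, again on R's built-in mtcars data: the two fits differ only in the link, and their fitted probabilities are typically very close.

    lgt <- glm(vs ~ mpg, family = binomial(link = "logit"),  data = mtcars)
    prb <- glm(vs ~ mpg, family = binomial(link = "probit"), data = mtcars)
    cor(predict(lgt, type = "response"), predict(prb, type = "response"))  # near 1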