The general linear model requires that the response variable follows the normal distribution whilst the generalized linear model is an extension of the general linear model that allows the specification of models whose response variable follows different distributions.
For example, logistic regression (where the dependent variable is categorical) and Poisson regression (where the dependent variable is a count) are both generalized linear models.
In addition, the response variable is related to the linear model through a link function. In the case of the linear model, that is the identity (the "=" part of the equation).
For the generalized linear model, different link functions can be used that denote a different relationship between the linear model and the response variable (e.g. inverse, logit, log, etc.). For example, for Poisson regression the link function is the log. (Loosely, you can think of the link function as a transformation applied to the mean of the response variable.)
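To make the link-function idea concrete, here is a minimal sketch in Python (assuming numpy and statsmodels are available; the data are simulated, not from any real study): the same linear predictor is used once with an identity link (ordinary linear model) and once with a log link (Poisson regression).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(0, 2, size=500)
X = sm.add_constant(x)                      # columns: intercept, x
eta = 0.5 + 1.2 * x                         # the linear predictor

# Identity link: E(Y) = eta, with normal errors
y_gauss = eta + rng.normal(scale=0.3, size=500)
lm = sm.GLM(y_gauss, X, family=sm.families.Gaussian()).fit()

# Log link: E(Y) = exp(eta), with Poisson counts
y_pois = rng.poisson(np.exp(eta))
pois = sm.GLM(y_pois, X, family=sm.families.Poisson()).fit()

print(lm.params)    # close to [0.5, 1.2] on the identity scale
print(pois.params)  # close to [0.5, 1.2] on the log scale
```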
I suggest a small modification to what George said. I.e., the general linear model assumes that the *errors* are normally distributed, or equivalently that the response variable is normally distributed *conditional* on the linear combination of explanatory variables.
If you look at textbooks or articles on the generalized linear model, the authors will almost certainly talk about the distinction in terms of the link function and error distribution. E.g., OLS linear regression is a generalized linear model with an identity link function and normally distributed errors. Binary logistic regression, on the other hand, is a generalized linear model with a logit link function and a binomial error distribution (because the outcome variable has only two possible values).
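A quick way to see that first point is to fit the same data as OLS and as a Gaussian GLM with the identity link; a rough sketch in Python (statsmodels assumed, simulated data):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
X = sm.add_constant(rng.normal(size=(200, 2)))
y = X @ np.array([1.0, 2.0, -0.5]) + rng.normal(size=200)

ols_fit = sm.OLS(y, X).fit()
glm_fit = sm.GLM(y, X, family=sm.families.Gaussian()).fit()  # identity link by default

print(np.allclose(ols_fit.params, glm_fit.params))  # True: same model, same estimates
```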
@Bruce: Indeed, it is fundamental to point out that it is *errors* (i.e., the difference between the actual score and the "true" score) that are normally distributed *conditional* on the levels of the explanatory variable (aka the independent variable), or, more generally, as you put it, conditional on the linear combination of the levels of the explanatory variables.
@Lulu: Sorry for the self-promotion that follows, but I have found that this condition of the general linear model is often not understood correctly, so in case of doubt please read my contribution to this thread:
Besides the aforementioned aspects of the link function and the distribution of residuals, there is a third property of the GLzM: the variance structure, which ties the variance to the predicted value.
Model    | Link function | Residuals | Variance structure
---------|---------------|-----------|-------------------
Gaussian | identity      | normal    | constant
Poisson  | log           | Poisson   | var = mu
Logistic | logit         | binomial  | p(1 - p)
The link function establishes linearity in that it maps the predicted values onto the interval (-Inf, Inf). It is not the response that is transformed, but the parameter space.
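A tiny numerical illustration of that last point (plain Python/numpy, my own example rather than part of the answer above): the logit link maps probabilities in (0, 1) onto the whole real line, and its inverse maps any linear-predictor value back into (0, 1).

```python
import numpy as np

def logit(p):
    return np.log(p / (1 - p))          # (0, 1) -> (-Inf, Inf)

def inv_logit(eta):
    return 1 / (1 + np.exp(-eta))       # (-Inf, Inf) -> (0, 1)

p = np.array([0.01, 0.25, 0.5, 0.75, 0.99])
print(logit(p))                         # roughly [-4.6, -1.1, 0, 1.1, 4.6]
print(inv_logit(logit(p)))              # recovers p exactly
```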
An easy way to think about GLMs is as models that generalize the error distribution to a family of distributions, called the exponential family; it includes the Poisson, binomial, etc., as well as the normal distribution. You need a link function g to make the parameters linear on another scale, so g is applied to the mean: for example, log(mean of a Poisson variable) = Xβ; logit(mean of a Bernoulli variable) = Xβ. These models still assume that observations are independent.
In a generalized linear model (GLM), there is a g function (the link) but no f functions; the predictors enter only through scalar multiplication. So the model is of the form:
g(E(Y)) = β0 + β1 x1 + β2 x2 + … + βn xn
In a general linear model (also abbreviated GLM, hence the confusion), there is additionally no g function (equivalently, g is the identity). So the model is of the form:
Y = β0 + β1 x1 + β2 x2 + … + βn xn + ε, where ε is the normally distributed error.
Generalized linear models vs. general linear models:
For general linear models, the distribution of residuals is assumed to be Gaussian. If that is not the case, the relationship between Y and the model parameters is no longer linear. But if the response distribution is one from the exponential family, such as the binomial, Poisson, negative binomial, or gamma distributions, there exists some function of the mean of Y that has a linear relationship with the model parameters. This function is called the link function.
For example, a binomial response can use a logit or a probit link function, and a Poisson response typically uses a log link function.
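For illustration, a small sketch in Python (statsmodels assumed, simulated data) fitting the same binary response once with a logit link and once with a probit link; these are equivalent to binomial GLMs with the corresponding links.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
x = rng.normal(size=300)
X = sm.add_constant(x)
p = 1 / (1 + np.exp(-(0.3 + 1.5 * x)))        # true probabilities (logit scale)
y = rng.binomial(1, p)

logit_fit = sm.Logit(y, X).fit(disp=False)    # binary response, logit link
probit_fit = sm.Probit(y, X).fit(disp=False)  # binary response, probit link
print(logit_fit.params, probit_fit.params)    # similar fits, different coefficient scales
```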
The basic difference between generalized linear model & general linear model can be summarized as follows:
                      | General linear model       | Generalized linear model
----------------------|----------------------------|-----------------------------------------------------
Response distribution | normal (Gaussian)          | exponential family (normal, binomial, Poisson, ...)
Link function         | identity                   | identity, logit, log, inverse, ...
Typical example       | multiple linear regression | logistic regression, Poisson regression
Examples:
Multiple linear regression (general linear model):
Example_1:
House price = β0 + β1 * number of rooms + β2 * size of house + β3 * covered parking available (yes/no) + … + βn * average income in the respective neighborhood according to census data + white noise
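A hedged sketch of how such a model might be fitted in Python (pandas and statsmodels assumed; the data and coefficients below are made up for illustration, not real housing data):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 400
df = pd.DataFrame({
    "rooms": rng.integers(1, 6, size=n),
    "size_m2": rng.uniform(40, 250, size=n),
    "parking": rng.integers(0, 2, size=n),              # yes/no coded 0/1
    "neigh_income": rng.normal(50_000, 10_000, size=n),
})
df["price"] = (20_000 + 15_000 * df["rooms"] + 1_200 * df["size_m2"]
               + 10_000 * df["parking"] + 0.5 * df["neigh_income"]
               + rng.normal(0, 20_000, size=n))          # the white-noise term

fit = smf.ols("price ~ rooms + size_m2 + parking + neigh_income", data=df).fit()
print(fit.params)
```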
Logistic Regression (Generalized linear model):
Example_1:
log(P(Donald Trump wins the US presidential election) / P(he loses))
= β0 + β1 * amount of money spent on his campaign + β2 * amount of time spent campaigning negatively + β3 * his popularity index + … + βn * other factors
Here, log(P(win) / P(lose)) is called the logit, which can be interpreted as the log odds. Interpreting the regression coefficients is different from interpreting those of a linear regression.
Example_2:
log(P(student is admitted to graduate school) / P(student is not admitted))
= β0 + β1 * Graduate Record Exam score + β2 * grade point average + β3 * prestige of the undergraduate institution + … + βn * other factors
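A rough sketch of Example_2 in Python (statsmodels assumed; the admissions data below are simulated, and the variable names gre, gpa, and prestige are just placeholders), showing that exponentiated coefficients are read as multiplicative effects on the odds:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 500
df = pd.DataFrame({
    "gre": rng.normal(580, 100, size=n),
    "gpa": rng.uniform(2.0, 4.0, size=n),
    "prestige": rng.integers(1, 5, size=n),
})
eta = -10 + 0.01 * df["gre"] + 1.0 * df["gpa"] - 0.5 * df["prestige"]
df["admit"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))   # simulated admit/reject

fit = smf.glm("admit ~ gre + gpa + prestige",
              data=df, family=sm.families.Binomial()).fit()
print(fit.params)           # coefficients on the log-odds (logit) scale
print(np.exp(fit.params))   # exponentiated: multiplicative effect on the odds
```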
Generalized additive model:
log(P(customer makes a purchase in the festive sale) / P(customer does not make a purchase))
= β0 + f1(number of accounts) + f2(active account types) + f3(credit limits) + … + βn * (age of the individual)
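A very rough stand-in for the additive model in Python (statsmodels and patsy assumed; a proper GAM would use penalized smoothers such as mgcv in R or pygam, and the data and column names here are simulated): unpenalized B-spline expansions via bs() play the role of the f1, f2 terms, while age enters linearly.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 600
df = pd.DataFrame({
    "n_accounts": rng.integers(1, 10, size=n),
    "credit_limit": rng.uniform(500, 20_000, size=n),
    "age": rng.integers(18, 80, size=n),
})
eta = (-1.0 + 0.3 * np.sqrt(df["n_accounts"])            # non-linear effects by construction
       + 0.0001 * df["credit_limit"] + 0.01 * df["age"])
df["purchase"] = rng.binomial(1, 1 / (1 + np.exp(-eta)))

# Smooth-ish terms via B-splines, linear term for age, logit link via Binomial family
fit = smf.glm("purchase ~ bs(n_accounts, df=4) + bs(credit_limit, df=4) + age",
              data=df, family=sm.families.Binomial()).fit()
print(fit.summary())
```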
If I am doing an ordinal logistic regression, is there a difference between going through Regression -> Ordinal versus going through Generalized Linear Models and then specifying ordinal logistic? I find it much easier to get the output I need when I go through the generalized linear model option, but I am not sure whether it gives different results from the regression option.
The generalized linear model allows the specification of models whose response variable does not follow the normal distribution. The response distribution belongs to the exponential family, which includes the normal, Poisson, binomial, ordinal, and multinomial distributions, connected to the linear predictor through link functions.
What about the APA notation for the generalized linear mixed model? Is it possible to calculate the F-statistic? I learned that Matlab does not provide an easy way to get the F-statistic. This might be for a good reason, but I don't understand the reason behind it and I don't know what the best alternative notation would be instead.
No, F-statistics only apply to the Gaussian GzLM. However, that does not mean you can't test hypotheses. There are a few tests that return p-values (if you really, really need them), but using information criteria (AIC) is preferred and more flexible.
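For example, a minimal sketch in Python (statsmodels and scipy assumed, simulated data) of comparing two nested Poisson GLMs by AIC and by a likelihood-ratio test:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(6)
x1 = rng.normal(size=400)
x2 = rng.normal(size=400)
y = rng.poisson(np.exp(0.2 + 0.6 * x1))          # x2 is irrelevant by construction

X_full = sm.add_constant(np.column_stack([x1, x2]))
X_red = sm.add_constant(x1)

full = sm.GLM(y, X_full, family=sm.families.Poisson()).fit()
red = sm.GLM(y, X_red, family=sm.families.Poisson()).fit()

print(full.aic, red.aic)                          # lower AIC is preferred
lr = 2 * (full.llf - red.llf)                     # likelihood-ratio statistic
print(chi2.sf(lr, df=1))                          # p-value for the extra parameter
```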
I found the description on this site easy to follow: http://www.statsoft.com/textbook/generalized-linear-models
There are many relationships that cannot adequately be summarized by a simple linear equation (the general linear model), for two major reasons: the distribution of the dependent variable and the link function. The computational approach also differs, since generalized linear models are fitted by maximum likelihood rather than ordinary least squares.
To summarize the basic ideas, the generalized linear model differs from the general linear model (of which, for example, multiple regression is a special case) in two major respects: First, the distribution of the dependent or response variable can be (explicitly) non-normal, and does not have to be continuous, i.e., it can be binomial, multinomial, or ordinal multinomial (i.e., contain information on ranks only); second, the dependent variable values are predicted from a linear combination of predictor variables, which are "connected" to the dependent variable via a link function.