I would like to know about logistic regression; I have to use it in research data analysis.
In logistic regression your dependent variable is ordinal or nominal (categorical), whereas in linear regression it is scale (numerical). Thus logistic regression can be ordinal logistic or multinomial logistic. Lastly, if the dependent variable has only two categories, the model is called binary logistic. I use such regression analysis in my article: Public Perceptions and Willingness to Pay for Renewable Ener...
Hello Babitha,
Stamitas Ntanos is correct in his explanation. To that, I would add the following:
1. Logistic regression (LR) does not require assumptions of normality.
2. LR can handle discrete or continuous independent variables.
3. Multiple linear regression (MLR) quantifies model accuracy via the standard error or variance error of estimate (or, for many people, the magnitude of the R-squared), whereas LR quantifies model accuracy via the (-2)LLR (lower is better) and the classification accuracy. While "pseudo-R-squared" statistics are often reported in LR software, they are not measures of variance accounted for.
4. The MLR solution is computed via a one-step equation (unless IVs have some linear dependency). LR uses an iterative, maximum-likelihood solution process to derive estimates of regression coefficients.
5. LR requires that you shift your thinking about a regression coefficient from "if X changes by one unit, then Y is expected to change by B units, holding all other IVs constant," to "if X changes by one unit, then the log-odds of the target category of Y being observed increase by B units (or, the odds of the target category of Y are multiplied by exp(B))."
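Point 5 can be checked numerically. The following is only a minimal sketch: the intercept (-1.0) and coefficient B (0.7) are hypothetical values chosen for illustration, not estimates from any data. It shows that a one-unit increase in X adds B to the log-odds and multiplies the odds by exp(B), while the change in probability depends on where X starts.

```python
import math

def probability(x, intercept=-1.0, b=0.7):
    """P(Y = 1 | x) under a logit model with hypothetical coefficients."""
    return 1.0 / (1.0 + math.exp(-(intercept + b * x)))

def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

b = 0.7  # hypothetical regression coefficient for X
for x in (0.0, 1.0, 2.0):
    p_now = probability(x, b=b)
    p_next = probability(x + 1.0, b=b)
    change_in_probability = p_next - p_now                               # differs at each x
    change_in_log_odds = math.log(odds(p_next)) - math.log(odds(p_now))  # always b
    odds_ratio = odds(p_next) / odds(p_now)                              # always exp(b)
    print(x, round(change_in_probability, 4),
          round(change_in_log_odds, 4), round(odds_ratio, 4))
```

The probability column changes from row to row, which is why the coefficient is read on the log-odds or odds scale and not as a fixed change in probability.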
Good luck with your work!
You may find these papers helpful:
https://statisticsbyjim.com/regression/choosing-regression-analysis/
https://maitra.public.iastate.edu/stat501/lectures/MultivariateRegression.pdf
Please read carefully; there is more info here than you asked about. Best, David Booth
Hi Babitha Ek,
All the responses here have been great. I particularly find Jim Frost's article, attached by David Eugene Booth, to be a good explanation. This explanation from Stack Overflow is also a good summary: https://stackoverflow.com/questions/12146914/what-is-the-difference-between-linear-regression-and-logistic-regression
Good luck!
Nadia H. Al-Noor
I am sorry but all your first three points are wrong, or at least problematic.
The logistic model is not only used for binary outcomes, as it is an important model for analyzing proportions (with a potentially varying denominator), that is, closed ratios bounded by 0 and 1.
And the explanatory variables can be correlated, but not too much - same assumptions as OLS continuous regression. That is why you have a logit model with several predictors.
And there is an underlying linear assumption, but now on the logit scale. This assumption can be evaluated and relaxed in Generalized Additive Models. Article “Moving Out of the Linear Rut: The Possibilities of Generali...
Article Generalized Additive Models, Graphical Diagnostics, and Logi...
The web source you point to is in error, or confused on all these three points. Much better to consult a good peer reviewed book from good publishers. Or a high end statistics training site from a reputable university, or a specialist software supplier for statistical analysis.
Hi, I would like to take this topic to continue the discussion about linear regression
In which cases can I use linear regression with non-normally distributed data?
I have been reading some validation studies which used linear regression even with a non-normal distribution. However, they did not justify why. Please, could someone explain it to me?
If the underlying probability of a yes (or indeed no) outcome lies between 0.2 and 0.8, then a linear model would be fine; within this range the relationship between the logit and the probability is essentially linear. My source is Sir David Cox, when replying to questions at Nuffield College in the early 1990s.
https://www.jstor.org/stable/2983890?seq=1#page_scan_tab_contents
This post also represents an informed commentary on the issues involved
https://statisticalhorizons.com/linear-vs-logistic
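The claim about the 0.2 to 0.8 range can be explored numerically. This is only an illustrative sketch: it fits a straight line to the logit over a grid of probabilities and reports how much of the logit's variation the line captures. The comparison range (0.01 to 0.99) and the grid size are arbitrary choices made here, not part of the original argument.

```python
import numpy as np

def logit(p):
    """Log-odds of a probability p."""
    return np.log(p / (1.0 - p))

def r_squared_of_line(p_low, p_high, n=501):
    """R-squared of the best straight line through logit(p) on [p_low, p_high]."""
    p = np.linspace(p_low, p_high, n)
    y = logit(p)
    slope, intercept = np.polyfit(p, y, 1)   # least-squares straight line
    residuals = y - (slope * p + intercept)
    return 1.0 - residuals.var() / y.var()

r2_mid = r_squared_of_line(0.2, 0.8)     # the range named in the post
r2_wide = r_squared_of_line(0.01, 0.99)  # arbitrary wider range, for contrast

print("R-squared on [0.2, 0.8]:  ", round(r2_mid, 4))
print("R-squared on [0.01, 0.99]:", round(r2_wide, 4))
```

A line fits the logit very closely in the middle range and noticeably less well once probabilities near 0 or 1 are included.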
Linear and logistic regression are the most basic forms of regression in common use. The essential difference between the two is that logistic regression is used when the dependent variable is binary in nature. In contrast, linear regression is used when the dependent variable is continuous and the nature of the regression line is linear. Therefore it depends on the dependent variable you are going to use.
Israel Bekele Molla As per my earlier answer, logistic regression can also be used when the response is continuous, and is composed of a closed ratio where the numerator is some subset of the denominator. It is not just for binary outcomes. Indeed the binary is just a special case where the denominator is a 1 (the so-called number of trials) and the outcome can only be a 1 or 0.
Linear regression is used when the dependent variable is continuous and nature of the regression line is linear. In contrast, Logistic regression is used when the dependent variable is binary in nature.
Why do researchgate responders (e.g., Israel Bekele Molla and Hom Nath Chalise ) continue to give responses that Kelvyn Jones has already corrected? Maybe these responders (not Kelvyn) can explain.
Dear Babitha Ek
Linear regression needs a linear relationship between the dependent and independent variables, while logistic regression does not need a linear relationship between the dependent and independent variables.
Linear regression aims at finding the best-fitting straight line, which is also called a regression line. Linear regression requires the dependent variable to be continuous, i.e. numeric values (no categories or groups), while binary logistic regression requires the dependent variable to be binary - two categories only (0/1). Multinomial or ordinal logistic regression can have a dependent variable with more than two categories.
Linear regression is based on least-squares estimation, which says regression coefficients should be chosen in such a way that they minimize the sum of the squared distances of each observed response to its fitted value. Logistic regression is based on maximum likelihood estimation, which says coefficients should be chosen in such a way that they maximize the probability of Y given X (the likelihood). With ML, the computer uses different "iterations" in which it tries different solutions until it gets the maximum likelihood estimates.
*Sample size: Linear regression requires 5 cases per independent variable in the analysis, while logistic regression needs at least 10 events per independent variable.
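The estimation contrast described above (a one-step least-squares solution versus iterative maximum likelihood) can be sketched as follows. Everything here is an assumption made for illustration: the data are simulated, the "true" coefficients are arbitrary, and Newton-Raphson is used as one common way of maximizing the likelihood, not necessarily what any particular package does.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])   # design matrix with an intercept column

# Linear regression: the least-squares estimate comes from one linear solve.
y_cont = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)   # simulated continuous response
beta_ols = np.linalg.solve(X.T @ X, X.T @ y_cont)

# Logistic regression: no closed form, so the likelihood is maximized iteratively.
true_beta = np.array([-0.5, 1.5])                        # arbitrary simulation values
p_true = 1.0 / (1.0 + np.exp(-(X @ true_beta)))
y_bin = rng.binomial(1, p_true)                          # simulated 0/1 response

beta = np.zeros(2)                                       # starting values
for iteration in range(1, 26):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))                # current fitted probabilities
    gradient = X.T @ (y_bin - p)                         # score of the log-likelihood
    hessian = X.T @ (X * (p * (1.0 - p))[:, None])       # observed information
    step = np.linalg.solve(hessian, gradient)
    beta = beta + step
    if np.max(np.abs(step)) < 1e-8:                      # stop once the update is negligible
        break

print("OLS estimate (one step):", beta_ols)
print("Logit estimate:", beta, "after", iteration, "iterations")
```

The logistic fit needs several passes before the coefficients stop changing, which is the "iterations" behaviour mentioned in the post.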
No apologies for being pedantic, as it is important that researchers understand that the binomial logit model could be useful for analysing data such as percentages, which are continuous (as could cloglog and probit models).
Neil Wrigley (1973) The Use of Percentages in Geographical Research Area Vol. 5, No. 3, pp. 183-186
https://www.jstor.org/stable/pdf/20000750.pdf?seq=1#page_scan_tab_contents
Moreover, in the multinomial case the same issue applies, as you could have a set of proportions that sum to 1, forming a closed ratio, which usually produces an inbuilt negative correlation that has to be taken into account in the modelling.
A key feature of the logit (in both its Bernoulli and binomial form) is that the logit transformation is applied to the predicted response from the previous iteration, so that it is not the observed data that is transformed. To demonstrate this, take the 1 and 0 (out of 1) of the observed binary outcome and perform the logit transformation: you will see that you get plus and minus infinity, and you can go no further!
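The demonstration suggested above takes only a few lines. The second half of this sketch assumes the model is fitted by iteratively reweighted least squares and uses its usual "working response"; that algorithm and the starting fitted value of 0.5 are assumptions made for illustration, not something stated in the post.

```python
import numpy as np

def logit(p):
    """Log-odds; written as a difference of logs so that p = 0 and p = 1 give -inf and +inf."""
    with np.errstate(divide="ignore"):
        return np.log(p) - np.log1p(-p)

observed = np.array([0.0, 1.0])          # a binary outcome: 0 out of 1 and 1 out of 1

# Transforming the observed data directly gives infinities.
print(logit(observed))

# Transforming the fitted values from the previous iteration stays finite.
fitted = np.array([0.5, 0.5])            # assumed starting fitted probabilities
eta = logit(fitted)                      # linear predictor on the logit scale
working = eta + (observed - fitted) / (fitted * (1.0 - fitted))
print(working)
```

The observed 0 and 1 enter only through the difference between observed and fitted values, so the calculation never has to take the logit of 0 or 1.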
See my (longish) answer to this question:
https://www.researchgate.net/post/What_is_the_difference_between_binary_logistic_regression_and_binomial_logistic_regression
I am also deeply skeptical about rules such as 10 observations per variable, as this does not take account of the possible collinearity between the predictors, nor does it take account of the potentially differing sizes of the denominators. You need to do a proper power analysis.
See my answer to this question:
https://www.researchgate.net/post/When_can_a_researcher_relax_the_rule_of_ten_events_per_variable_in_Logistic_Regression
Finally on probits and ClogLog see
https://www.researchgate.net/post/Probit_and_logit_model2
https://techdifferences.com/difference-between-linear-and-logistic-regression.html
Just open the link you can get your answer.
Kaushik Kumar Panigrahi, here we go again: this website is incorrect, as I pointed out in earlier posts.
Logistic regression is used for assessing the effects of explanatory factors on the relative risk of outcomes. The logistic transformation can be interpreted as the logarithm of the odds of success vs failure.
You can read the book "Hosmer, D. & Lemeshow, S. Applied Logistic Regression. New York: John Wiley & Sons, Inc."
The following tutorials from UCLA will also help you
https://stats.idre.ucla.edu/stata/dae/logistic-regression/
To put it very simply:
In logistic regression analysis the dependent variable (that one that has to be explained) is binary and coded = 1 or 0,
in linear regression analysis the dependent variable is a continuous variable
It may be a simple answer , but it is wrong! Look at earlier postings.
Linear and logistic regression are the most basic forms of regression in common use. The essential difference between the two is that logistic regression is used when the dependent variable is binary in nature. In contrast, linear regression is used when the dependent variable is continuous and the nature of the regression line is linear.
Regression is a technique used to predict the value of a response (dependent) variable from one or more predictor (independent) variables, where the variables are numeric. There are various forms of regression, such as linear, multiple, logistic, polynomial, non-parametric, etc.
I give up!
Logit models are not just for binary response ; they are for a situation where the numerator is some subset of the denominator. A special case of this is where the denominator is the value 1, and the numerator is either a 1 or 0 - the binary outcome is just a special case of the general model.
Linear regression is used when the dependent(output/outcome) variable is continuous.
Whereas, Logistics regression is used when the dependent variable is categorical(binary).
Laxman Singh Bisht , did you read the answer right above yours by Kelvyn Jones ? Why are you (and others ) trying to mislead people? I assume you are doing this on purpose. I don't understand why many use researchgate for this. This undermines the trust people can have in researchgate responses.
Daniel Wright, I have read it, but my intention is not to mislead. This information is not exhaustive. It is in the simplest form.
I agree with the answers given by Kelvyn Jones.
Regards
Laxman Singh Bisht, I am confused about how you are agreeing with him. He says that binary logistic is only a special case of logistic regression (and does so above also). You (and others above) feel, presumably, that his answer is not complete or is not accurate, and so feel compelled to write "Whereas, Logistics regression is used when the dependent variable is categorical(binary)." How is this agreement?
For those who still doubt, have a look at mine and @Bruce Weaver's re-analysis of 1013 binary observations (Bernoulli Logit) as 19 proportions (Binomial Logit) - the same results and confidence intervals:
https://www.researchgate.net/post/How_can_I_fix_high_odds_ratio_and_confidence_interval_in_logistic_regression
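The equivalence of the Bernoulli and binomial forms can be illustrated with a small sketch. This does not reproduce the re-analysis linked above: the data are simulated, the five predictor levels, group sizes and coefficients are hypothetical, and the Newton-Raphson fitter is written out by hand only so that the example is self-contained.

```python
import numpy as np

def fit_logit(X, successes, trials, n_iter=50, tol=1e-10):
    """Newton-Raphson fit of a binomial logit; trials = 1 gives the Bernoulli case."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (successes - trials * p)
        hessian = X.T @ (X * (trials * p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information matrix at the final estimate.
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    information = X.T @ (X * (trials * p * (1.0 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, se

rng = np.random.default_rng(1)
levels = np.arange(5.0)                            # hypothetical predictor values
trials = np.array([30, 40, 50, 40, 30])            # hypothetical denominators
p_sim = 1.0 / (1.0 + np.exp(-(-1.5 + 0.8 * levels)))
successes = rng.binomial(trials, p_sim)            # numerators: a subset of the denominators

# Binomial form: one row per proportion.
X_grouped = np.column_stack([np.ones(len(levels)), levels])
beta_grouped, se_grouped = fit_logit(X_grouped, successes, trials)

# Bernoulli form: the same data expanded to one row per 0/1 observation.
x_binary = np.repeat(levels, trials)
y_binary = np.concatenate([np.r_[np.ones(s), np.zeros(t - s)]
                           for s, t in zip(successes, trials)])
X_binary = np.column_stack([np.ones(len(x_binary)), x_binary])
beta_binary, se_binary = fit_logit(X_binary, y_binary, np.ones(len(y_binary)))

print("grouped:", beta_grouped, se_grouped)
print("binary: ", beta_binary, se_binary)
```

Both fits give the same coefficients and standard errors, because the binary outcome is the binomial model with every denominator set to 1.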
Key Differences Between Linear and Logistic Regression
Raaed Fadhil , I am sorry but all three of your points are incorrect or at least partial. Please look at previous postings.
Alireza Arabameri
I am sorry but I disagree; logistic is not about classification. The main difference is that questions about logistic regression cause Kelvyn Jones to try to correct people's misconceptions.
To use linear regression we need a linear relationship between the dependent and independent variables. On the other hand, to use logistic regression we do not need a linear relationship between the dependent and independent variables.
Linear and Logistic regression are the most basic form of regression which are commonly used. The essential difference between these two is that Logistic regression is used when the dependent variable is binary in nature. In contrast, Linear regression is used when the dependent variable is continuous and nature of the regression line is linear.
It may be seen in the following book
Business Statistics by examples fifth edition by Terry Sincich.
The essential difference between these two is that Logistic regression is used when the dependent variable is binary in nature. In contrast, Linear regression is used when the dependent variable is continuous and nature of the regression line is linear.
Logistic regression is for a dependent variable that is binary, whereas linear regression is for a continuous type of variable.
Isam Alkhalifawi and Ismail Maakip , were you unable to read the previous comments by Kelvyn Jones ?
Another way to put it is that logistic models are a proper subset of linear statistical models such that the DV is as Kelvyn Jones defined it. Daniel Wright has an interesting take on things. Right on, Daniel Wright.
The reason @Kelvyn Jones defined things the way he did is that he knows that both binomial and multinomial logistic regression exist, as does ordinal logistic regression, which is a bit different from the first two because it deals with ordinal dependent variables but still fits @Kelvyn Jones's definition. Second paragraph added on 5/10/20
Logistic regression is a special case of linear regression. We employ logistic regression when the dependent variable is categorical in nature, while linear regression is applied if the dependent variable is continuous in nature.
Logistic regression was prepared for survival analysis, where the values of the response variable are only 0 or 1 depending on survival. Thus the logistic model ensures that the estimated response lies between 0 and 1, whereas linear regression was prepared for the modeling of a linear relationship between dependent and independent variables.
Logistic analysis can be used for proper survival analysis, but then the data structure has to be of a particular type called person-period, where a 1 occurs if a person 'dies' in that period and a 0 if not. See https://www.bristol.ac.uk/media-library/sites/cmm/migrated/documents/discrete-time-eha-july2013-combined.pdf
As far as I know this form of event history analysis was invented by Paul Allison in the mid 1980s.
The origins of the original logit approach are much earlier and are set out in The origins and development of the logit model, J.S. Cramer, August 2003 ( https://pdfs.semanticscholar.org/7218/daab6499b46759f0a16d173d01d348bed906.pdf )
It has this to say: "The paper describes the origins of the logistic function and its history up to the adoption of the logit in bio-assay and the beginning of its wider acceptance in statistics. Its roots spread back to the 19th century, when the function was invented to describe population growth and given its name by the Belgian mathematician Verhulst. Subsequent events have been determined decisively by the individual actions and personal histories of a few scholars: the rediscovery of the growth function is due to Pearl and Reed, the survival of the term logistic to Yule, and the introduction of the function in bio-assay (and hence in statistics in general) to Berkson."
Cramer goes on to say , page 12 " The earliest developments [ the ascent of the logit] took place in the late 1950’s and the 1960’s in statistics and epidemiology. In statistics, the analytical advantages of the logit transformation as a means of dealing with discrete binary outcomes were soon recognized. Cox was among the first to explore (and exploit) these possibilities; he wrote a series of papers between around 1960, and followed these up with an influential textbook in 1969."
From the beginning logit/logistic modelling was not exclusively about binary data but it could be used for that purpose, and that came later than the analysis of proportions and growth curves. To re-iterate the binary is a special case of the model for proportions in which the denominator (the number of trials ) is 1.
Logistic regression is a special case of linear regression. Usually, we utilize logistic regression when the dependent variable is categorical in nature. Linear regression, by contrast, is applied if the dependent variable is continuous in nature.
Hamid Mohsin Jadah, sorry but you are wrong: logistic should be a candidate for the analysis of continuous proportions; it is not just for binary. See my many earlier posts.
One of the major differences is the nature of the statistical data type. Linear regression is suitable for interval and ratio scale data types, while logit regression is suitable for nominal and ordinal scales.
It should be noted that linear regression works best with probability sampling.
Adetayo Olaniyi Adeniran, I disagree. While Stevens' scale of measurement is given in many textbooks and can be useful in structuring the choice of appropriate tests, it is more problematic for modelling; others find it misleading and out of date:
Paul F. Velleman & Leland Wilkinson, Nominal, Ordinal, Interval, and Ratio Typologies Are Misleading
Thus, for example, it does not include counts as a distinct type of data, which often require specific models such as log-Poisson or log-NBD.
But back to the question at hand: crucially, Stevens does not distinguish between closed and open ratios; the former can occur when the numerator is some subset of the denominator and is usually bounded by 0 and 1. Thus a proportion is a closed ratio. It is continuous, but many would not use standard linear regression for it, and would instead use the binomial logistic, which respects the boundedness and the inbuilt heterogeneity that follows from varying denominators. So I disagree that ratio data necessarily requires linear regression.
Neil Wrigley (1973) The Use of Percentages in Geographical Research, Area, Vol. 5, No. 3, pp. 183-186
Book Percentages, Ratios and Inbuilt Relationships in Geographica...
https://www.jstor.org/stable/pdf/20000750.pdf?seq=1
@Kelvyn Jones You may have your points, but this is absolutely debatable. I may not totally agree with you. Linear regression quantifies the extent to which one variable is explained by another variable, so it is best used on continuous data. It is important to note that whatever is written down in a textbook is best known to the author; others may partially disagree. 6 to you may appear 9 to another.
The basis of whatever will determine a choice of statistical test is embedded in the type of data (data measurement). Thank you.
Adetayo Olaniyi Adeniran, I agree completely with your last sentence, but I think Stevens' scale is problematic and gave a reference to support that.
To be clear, yes standard OLS regression is suitable for continuous data but not every type of continuous data ; proportions and survival data for example are problematic for the standard model and that is why so much effort has been put in the last 60 years to develop the generalized linear model, and move beyond standard linear regression. Thanks for taking the time to respond.
Kelvyn Jones, good to inoculate ideas with you. Thank you.
Babitha Ek,
Answering your question in a simple way:
Linear regression looks for a linear relationship between a dependent variable and one or more independent variables, so that, for given value(s) of the independent variable(s), you can predict the value of the dependent variable.
Logistic regression relies on the logistic (or logit) model and can be used to predict the probability (or chance) of the occurrence of a certain event.
Concerning coronavirus, for example, multiple linear regression can be used to predict the number of deaths in relation to the number of infected persons, the size of the population, the age of the patients, etc.
Logistic regression, on the other hand, can be used to predict whether a patient with coronavirus and presenting some other characteristics will survive (value 1 for the dependent variable) or is more likely to die (value 0). In this example, the dependent variable is binary (1 or 0).
Linear regression and logistic regression are well explained in several references you can get through internet.
A linear regression is appropriate when the model-dependent variable is of the continuous type. Logistic regression is appropriate for other situations, but mainly when the dependent variable is of the categorical type (ordinal or dichotomous, for example).
It mainly depends on what type of outcome variable you have. When both the predictors and the outcome are continuous, linear regression will be used. If your outcome variable is categorical, binary or multinomial, then logistic regression models can be used.
There are several differences between logistic regression and linear regression, including:
First: The dependent variable in linear regression has a normal distribution. In logistic regression, it has a binomial distribution.
Second: The regression line in linear regression is a straight line, but in logistic regression the regression curve is nonlinear ....
Third: The values of the dependent variable are real numbers in linear regression. In logistic regression, the predicted values of the dependent variable are probabilities between zero and one ...
The essential difference between these two is that Logistic regression is used when the dependent variable is binary in nature. In contrast, Linear regression is used when the dependent variable is continuous and nature of the regression line is linear.
Linear regression is used to predict the continuous dependent variable using a given set of independent variables. Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables. The output for Linear Regression must be a continuous value, such as price, age, etc.
When the outcome variable of interest and the predictor variables are measurable, linear regression is used. When the outcome variable is categorical, either binary logistic regression or multinomial regression is used.
Dear Babitha Ek, logistic regression (LR) does not require any assumptions of normality. Multiple linear regression (MLR) quantifies model accuracy via the standard error or variance error of estimate (or, for many people, the magnitude of the R-squared), whereas LR quantifies model accuracy via the (-2)LLR (lower is better) and the classification accuracy. While "pseudo-R-squared" statistics are often reported in LR software, they are not measures of variance accounted for. Also, the MLR solution is computed via a one-step equation (unless IVs have some linear dependency), whereas LR uses an iterative, maximum-likelihood solution process to derive estimates of regression coefficients. Please also note that logistic regression requires that you shift your thinking about a regression coefficient from "if X changes by one unit, then Y is expected to change by B units, holding all other IVs constant," to "if X changes by one unit, then the log-odds of the target category of Y being observed increase by B units (or, the odds of the target category of Y are multiplied by exp(B))."
Linear and logistic regression fall under the supervised learning subset of machine learning. However, whereas linear regression may be termed a classical regression model that uses the Ordinary Least Squares approach to parameter estimation, logistic regression falls under the classification models and uses the Minimum Likelihood Estimate for parameters. Linear regression predicts a continuous numerical variable based on the input data, whereas logistic regression segments input data into labelled categories by computing the probabilities of occurrence of those output categories, which lie between 0 and 1.
Courage Ekoh it's maximum likelihood :)
Btw both methods belong to a class called generalized linear models
Depending on your outcome you might use a different link function
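To make the remark about link functions concrete, here is a small sketch of four common links: the identity link used in ordinary linear regression, the logit used in logistic regression, and the probit and complementary log-log links mentioned earlier in the thread. The dictionary name and the three example means are arbitrary choices for illustration; only the Python standard library is used.

```python
import math
from statistics import NormalDist

# Each link maps the mean of the response onto the scale of the linear predictor.
LINKS = {
    "identity": lambda mu: mu,                           # ordinary linear regression
    "logit": lambda mu: math.log(mu / (1.0 - mu)),       # logistic regression
    "probit": lambda mu: NormalDist().inv_cdf(mu),       # standard normal quantile
    "cloglog": lambda mu: math.log(-math.log(1.0 - mu)), # complementary log-log
}

for mu in (0.1, 0.5, 0.9):   # example means, chosen arbitrarily
    values = {name: round(link(mu), 4) for name, link in LINKS.items()}
    print(mu, values)
```

The logit and probit links are symmetric around a mean of 0.5, while the complementary log-log link is not, which is one reason the choice of link can matter for a given outcome.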
Linear regression is used to predict the continuous dependent variable using a given set of independent variables. Logistic Regression is used to predict the categorical dependent variable using a given set of independent variables. The output for Linear Regression must be a continuous value, such as price, age, etc.
Mahadi Hasan Miraz, you are wrong; see several of my previous replies. The logistic model is suitable for continuous proportions and not just a binary categorical response variable. And an 'output' from logistic models can be the continuous predicted values of the underlying latent variable of saying 'Yes', in the form of logits (and odds, probabilities and proportions when transformed).