What is the best type of survey questions that can be later used for linear regression analysis?

Fatma Yardibi, you contradict yourself. Likert data, even when converted to numerical values, are discrete and bounded. The normal distribution is continuous and unbounded. So a variable based on a numerical representation of Likert data cannot be normally distributed and therefore violates the regression assumption.
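
To illustrate the mismatch, here is a minimal Python sketch. The response values are invented, and fitting a normal distribution by matching the sample mean and standard deviation is only one assumed way to make the point: a normal distribution fitted to scores coded 1 to 5 always puts some probability on values that cannot occur.

```python
from statistics import NormalDist

# Hypothetical 5-point Likert responses coded 1..5 (invented data).
scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
LOW, HIGH = 1, 5

# Fit a normal distribution by matching the sample mean and standard deviation.
fitted = NormalDist.from_samples(scores)

# Probability that the fitted normal assigns to impossible values (< 1 or > 5).
mass_outside = fitted.cdf(LOW) + (1.0 - fitted.cdf(HIGH))

print(f"mean={fitted.mean:.2f}, sd={fitted.stdev:.2f}, "
      f"mass outside [1, 5]={mass_outside:.3f}")
```

Whether this amount of misplaced probability matters in practice is exactly the separate question of relevance.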

Whether the violation is relevant is a different question. There is no general answer to it (in my opinion it is inadmissible to give one, but many people do so anyway). An important aspect is whether the resulting numeric scale is reasonably additive, that is: will a treatment or trigger change the value by the same amount, independent of the actual value? This must be checked, and the scale has to be validated for additivity. If additivity holds, these numbers may be used in a regression analysis. If additivity is not validated, or is even implausible, then any quantitative analysis is likely to produce highly misleading results.
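
What a failure of additivity can look like is sketched below in Python. Everything in it is a hypothetical assumption made for illustration: a continuous latent trait, four cut points that turn it into a 5-point score, and a treatment that shifts the latent trait by a fixed amount.

```python
from bisect import bisect_right

# Hypothetical cut points that split a latent trait into 5 ordered categories.
CUTPOINTS = [-1.5, -0.5, 0.5, 1.5]

def likert_score(latent: float) -> int:
    """Map a latent value to a score from 1 to 5 using the cut points."""
    return bisect_right(CUTPOINTS, latent) + 1

def observed_change(baseline: float, effect: float) -> int:
    """Change in the observed score when the latent trait shifts by `effect`."""
    return likert_score(baseline + effect) - likert_score(baseline)

# The same latent effect, applied at different baseline values.
for baseline in (0.0, 1.0, 2.0):
    print(baseline, observed_change(baseline, effect=1.0))
```

In this toy setup the same shift moves the score by one point in the middle of the scale and by nothing at the top, so the observed change depends on the starting value.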

Fatma Yardibi

I sincerely concur with Jochen Wilhelm. But the sum of scores for each factor or the average score could be used to generate a scale on which multiple linear regression can easily be conducted. It has been a common practice for social sciences.

I don't know the hypothesis, but I answered the question according to common usage. My personal preference is a categorical data analysis, for example with generalized linear models.

Jochen Wilhelm

Fatma Yardibi, thank you for your comments. Of course you are correct in saying that "the sum of scores for each factor or the average score could be used to generate a scale on which multiple linear regression can easily be conducted. It has been a common practice for social sciences." [emphases mine].

Summing and averaging require numerical data. The original Likert data are not numerical, so some (arbitrary!) choice must be made about which number to assign to which category. This is why the scores, averaged or not, are arbitrary, and a difference in scores of x units may not correspond to a well-defined effect. The scale may not be additive (and transformed scales, such as log or logit, may not be additive either). Yes, as soon as the categories are mapped (somehow!) to numerical values, one can technically do anything that is possible with numbers. But not everything is meaningful or sensible. To be sensible, one must demonstrate that the constructed scale is valid (externally and internally) and that it is additive (a similar external change leads to a similar change in the score, independent of the actual score value). Otherwise the whole analysis is simply not interpretable.
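
A small Python sketch of why the choice of numbers matters. The two groups and both codings are invented for illustration; the only assumption is that both codings respect the order of the categories.

```python
from statistics import mean

CATEGORIES = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]

# Two order-preserving but otherwise arbitrary codings (both hypothetical).
equal_spacing = dict(zip(CATEGORIES, [1, 2, 3, 4, 5]))
stretched_top = dict(zip(CATEGORIES, [1, 2, 3, 4, 10]))

# Invented responses for two groups of five respondents.
group_a = ["neutral", "agree", "agree", "agree", "agree"]
group_b = ["disagree", "disagree", "neutral", "strongly agree", "strongly agree"]

def mean_score(responses, coding):
    """Average numeric score of the responses under a given coding."""
    return mean(coding[r] for r in responses)

def score_difference(coding):
    """Mean score of group A minus mean score of group B."""
    return mean_score(group_a, coding) - mean_score(group_b, coding)

print("equal spacing:", score_difference(equal_spacing))
print("stretched top:", score_difference(stretched_top))
```

With these made-up data, group A has the higher mean under equal spacing and group B has the higher mean under the stretched coding, although the responses themselves are unchanged.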

I have the impression that most (if not almost all) such scales used or published in social science are not checked for additivity (and often, it seems to me, not even for validity). So you are right: this "has been a common practice for social sciences", but it is also a bad practice!

Only if the scale is constructed from many items may one hope that averaging also averages out the non-additivity. But this is only a vague hope until one demonstrates it.

The problem (with non-additivity) may not be relevant at all if the scores are all "far away" from the boundaries. This is a possible rescue for the common bad practice. However, most interesting things happen at the extremes!

Fatma Yardibi

Jochen Wilhelm, I absolutely agree with you. I wrote that it could be done because it is used in practice. However, my preference is a categorical data analysis. Thanks for your comment. Best regards.

Tanvir Ahammed

I understand that one of your variables is on a Likert scale. But which one? If your dependent variable is on a Likert scale, you cannot use linear regression. You can use ordinal logistic regression instead.

If you want to use linear regression, you can ask the participants to rate the experience on a scale of 1 to 10 or 1 to 100. By doing this you will get a dependent variable that can be treated as approximately continuous.

In short, if your dependent variable is continuous, use linear regression.

If your dependent variable is categorical and not ordered (e.g., a yes/no response), use logistic regression.

If your dependent variable is categorical and the categories can be ordered (e.g., 14-16 years old, 16-25 years old, 25+ years old), use ordinal logistic regression.
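
The three rules of thumb above can be written down as a simple lookup. This is only a sketch of this post's suggestions; the key names and the function `suggest_model` are hypothetical, and it ignores the additivity concerns raised earlier in the thread.

```python
# Rules of thumb from the post above, keyed by the type of dependent variable.
# The key names are hypothetical labels chosen for this sketch.
MODEL_BY_DV_TYPE = {
    "continuous": "linear regression",
    "categorical_unordered": "logistic regression",
    "categorical_ordered": "ordinal logistic regression",
}

def suggest_model(dv_type: str) -> str:
    """Return the model suggested for the given dependent-variable type."""
    try:
        return MODEL_BY_DV_TYPE[dv_type]
    except KeyError:
        known = ", ".join(sorted(MODEL_BY_DV_TYPE))
        raise ValueError(f"unknown type {dv_type!r}; expected one of: {known}") from None

# A Likert-type dependent variable is an ordered categorical variable.
print(suggest_model("categorical_ordered"))
```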

James R Knaub

Mariia -

You asked "What is the best type of survey questions that can be later used for linear regression analysis?" The answer is: questions that call for continuous data responses. A continuous variable, such as kilowatt-hours of electricity, revenue, or crop production volume, could be the subject of a survey question for which linear regression might apply. Government statistical agencies may collect such data to produce official statistics. One may use linear or nonlinear regression to relate such data. For official statistics, one may have an occasional census survey and more frequent sample surveys, where the data not sampled are 'predicted' by linear regression, using data from the previous census for the same data elements as predictor data. This can be very helpful. (See this invited presentation for mathematical statisticians at the US Energy Information Administration:

https://www.researchgate.net/publication/319914742_Quasi-Cutoff_Sampling_and_the_Classical_Ratio_Estimator_-_Application_to_Establishment_Surveys_for_Official_Statistics_at_the_US_Energy_Information_Administration_-_Historical_Development, using prediction.)
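
A minimal Python sketch of this kind of prediction, using the textbook form of the classical ratio estimator (a regression through the origin). It is not necessarily the exact procedure in the linked presentation, and all numbers are invented: `x` stands for previous census values and `y` for current values, which are observed only for the sampled units.

```python
def ratio_slope(x_sample, y_sample):
    """Classical ratio estimate of the slope: sum of y over sum of x."""
    return sum(y_sample) / sum(x_sample)

def predict_total(x_sample, y_sample, x_unsampled):
    """Observed y for sampled units plus predicted y for unsampled units."""
    b = ratio_slope(x_sample, y_sample)
    return sum(y_sample) + b * sum(x_unsampled)

# Hypothetical establishments: census values (x) and current sample values (y).
x_sample = [100.0, 200.0, 300.0]
y_sample = [110.0, 220.0, 330.0]
x_unsampled = [50.0, 150.0]  # in the census, but not in the current sample

b = ratio_slope(x_sample, y_sample)
total = predict_total(x_sample, y_sample, x_unsampled)
print(f"slope={b:.3f}, predicted total={total:.1f}")
```

This only works because both `x` and `y` are continuous quantities, which is the point about the type of survey question.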

Likert data are not as informative. If you used them as categories for independent variables, and had a way to associate a continuous dependent variable with them, even then you might want to use ANOVA. (See https://www.statsimprove.com/en/what-is-the-difference-between-anova-and-regression-and-which-one-to-choose/.) But the way your question is worded, there seems to be no way to justify using Likert data with linear regression, as you may have gathered from the previous discussion.

Cheers - Jim
