Hello,
We're conducting quantitative research as part of this semester, and the analysis method of choice is multivariate linear regression. Right now we're checking whether the assumptions are met; however, the scatter plots used to check linearity are a bit off. We use the number of patents as our dependent variable (DV), but the values differ too much between some of the countries. So the scatter plot looks like this. Could you please recommend how to go about optimizing the dataset without removing too many observations? The most extreme outlier at the top is China, which we have already removed; however, that does not fully solve the problem. I was thinking of creating groups within the DV with specific intervals and coding them 1, 2, 3, 4, etc.
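To make the grouping idea concrete, here is a minimal sketch of what I have in mind. The patent counts and the interval boundaries are made up for illustration (they are not our real data), and `code_patents` is just a hypothetical helper name; the actual cut points would still have to be chosen.

```python
from bisect import bisect_right

# Hypothetical patent counts per country, purely illustrative.
patents = [3, 45, 120, 980, 15000]

# Hypothetical interval boundaries (assumption): [0, 10), [10, 100), [100, 1000), [1000, inf).
cut_points = [10, 100, 1000]


def code_patents(count, edges=cut_points):
    """Return 1 for counts below edges[0], 2 for the next interval, and so on."""
    # bisect_right counts how many boundaries are <= count,
    # so a count equal to a boundary falls into the higher group.
    return bisect_right(edges, count) + 1


coded = [code_patents(p) for p in patents]
print(coded)
```

With these made-up boundaries, every country ends up in one of four groups, so no observation is dropped. Note that the result is an ordinal code rather than a count.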