Is it a regression line with a beta coefficient? I have added a line of best fit to my scatter plot for two subgroups, but a diagonal line of identity is required. I am using Stata for my analysis, and this is what my graph currently looks like.
It looks like you have y on the y-axis and predicted y on the x-axis, so if you were predicting all the data together, the estimated slope would be one (or at least asymptotically so). The vertical distances of the points from that (at least approximate) "identity line," as you called it, would be the estimated residuals.
However ... that would be true if you only looked at one type of respondent. You could do a separate scatterplot for each gender. In each case, the regression y_i(gender) = y*_i(gender) + e_i(gender), where y* is predicted y, would have an expected slope of 1. In all such cases the intercept should be 0, or approximately so.
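As a rough illustration of the slope-of-1, intercept-of-0 point, here is a minimal Python sketch (not Stata, and not the poster's data). The simulated data, the helper `ols`, and all variable names are assumptions made purely for illustration. It fits a simple least-squares line for one subgroup, computes the predicted values, and then regresses observed y on predicted y.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


random.seed(1)

# Hypothetical data for ONE subgroup: y is linear in x plus noise.
x = [random.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.7 * xi + random.gauss(0, 1) for xi in x]

# Step 1: fit the model for this subgroup.
a, b = ols(x, y)

# Step 2: predicted y for every observation.
y_hat = [a + b * xi for xi in x]

# Step 3: regress observed y on predicted y.
a2, b2 = ols(y_hat, y)

print(f"intercept = {a2:.6f}, slope = {b2:.6f}")
```

With an ordinary least-squares fit that includes an intercept, the second regression should return an intercept of about 0 and a slope of about 1, up to floating-point error, so the fitted line through the observed-versus-predicted points sits on the identity line.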
The work would be in finding predicted y values, each set of which would come from a regression equation of some type.
However, it looks like you may have first run a single regression for the two genders together, then identified the females and males after the fact on the predictions from that pooled fit, and then plotted regression lines through the gender-specific points to show that they differ. Is that what you have done? (I'd expect those lines to also look like they pass close to the origin, but I'm not too clear on this.)
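To make that guess concrete, here is a small Python sketch of the scenario, under loudly stated assumptions: the data are simulated, the two groups are given different true slopes on a single hypothetical predictor `x`, and the helper `ols` and every variable name are invented for illustration. It fits one pooled model that ignores gender, then fits a line of observed on predicted for all points together and for each group separately.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


random.seed(2)

# Hypothetical data: the two groups follow different true slopes.
x_f = [random.uniform(0, 10) for _ in range(200)]
x_m = [random.uniform(0, 10) for _ in range(200)]
y_f = [1.0 + 1.0 * xi + random.gauss(0, 1) for xi in x_f]
y_m = [1.0 + 0.5 * xi + random.gauss(0, 1) for xi in x_m]

# One pooled model that ignores gender.
a, b = ols(x_f + x_m, y_f + y_m)
pred_f = [a + b * xi for xi in x_f]
pred_m = [a + b * xi for xi in x_m]

# Observed on predicted: all points together, then each group on its own.
a_all, b_all = ols(pred_f + pred_m, y_f + y_m)
a_f, b_f = ols(pred_f, y_f)
a_m, b_m = ols(pred_m, y_m)

print(f"all:     intercept = {a_all:.3f}, slope = {b_all:.3f}")
print(f"females: intercept = {a_f:.3f}, slope = {b_f:.3f}")
print(f"males:   intercept = {a_m:.3f}, slope = {b_m:.3f}")
```

In this made-up setup the line through all points coincides with the identity line, while the two subgroup lines tilt away from it in opposite directions. Drawing the identity line on the plot would then give a common reference against which to read how each group's trend departs from the pooled prediction.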
That's my guess anyway.
You might want to see if anything in the following may give you a better idea of what you might want to do:
Thank you James. I am going to need a while to get through that paper though.
This is not so much an assessment of a predictive model as it is a demonstration line of observed vs attained for both genders. If I did not subgroup my data, a single regression line would perhaps represent the model. If I were to rephrase my question, what would a single line of identity contribute to my understanding of the differences in trend lines in grouped data?