Well, I think there are in fact three things discussed here, hence the confusion around « type II » regression:
1) When X and Y are interchangeable and you do not know which one to choose as the regressor and which one as the dependent variable. I also think, as Jochen Wilhelm does, that « regression » should not be employed in that case. When you want the best line for the (X, Y) cloud of points, PCA indeed gives this: the first eigenvector, associated with the largest part of the variance, gives the direction of highest variability, and it can be shown that this line is also the one with minimum orthogonal distance to each point --- this can be seen by geometric arguments; see for instance the Wikipedia articles.
2) When Y is regressed on X and « slopes change with the individual », as Jan Boehnke states. This corresponds to a mixed-effects linear model, and if I remember correctly it was called « Model II » [well, at least when all effects were random] as opposed to the fixed-effects model called « Model I », so it may be the origin of the « type II regression » term?
3) When Y is regressed on X with no random effect, but X is itself random and the error on X cannot be neglected (as is usually done in regression). I am not at all familiar with this approach, which is called « errors-in-variables models » according to Wikipedia.
Despite X being random, I do not think this enters the class of random _effects_ models, since the model specification seems quite different.
So maybe the first thing for Shishir Adhikari is to decide which of these three cases corresponds to his problem? The last one seems quite complicated. Reading his last comments, it looks like the first case instead. The link given by Jan also deals with that case; PCA is not mentioned, so I think I will read the papers more deeply --- note that PCA on reduced (standardized) variables and on native variables leads to different lines, which seems similar to the discussion in the link about the relation between the scaled and unscaled major axis of the ellipse, which is also given in a way by the PCA...
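To make case 1 concrete, here is a minimal Python sketch (hypothetical simulated data, not from the thread) comparing the ordinary least-squares slope, the major-axis slope obtained from PCA on the native variables, and the standardized (reduced) major-axis slope; the three generally differ, which is the point about PCA on reduced vs. native variables:

```python
import math
import random

random.seed(0)
# Hypothetical data: both coordinates observed with noise, no natural "response"
xs = [i + random.gauss(0, 1) for i in range(30)]
ys = [2 * x + random.gauss(0, 3) for x in xs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
sxx = sum((x - mx) ** 2 for x in xs) / n
syy = sum((y - my) ** 2 for y in ys) / n
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

# Ordinary least squares of Y on X (minimizes vertical distances)
b_ols = sxy / sxx

# Major axis (first PCA eigenvector on native variables: minimizes
# orthogonal distances), closed form for the 2-D case
b_ma = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)

# Standardized major axis (PCA on scaled variables): a different line again
b_sma = math.copysign(math.sqrt(syy / sxx), sxy)

print(b_ols, b_ma, b_sma)
```

For positively correlated data the OLS slope is the smallest, the major-axis slope lies between it and the reciprocal of the X-on-Y slope, and the standardized slope is |r| times larger than OLS.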
Do you mean type II _sums of squares_ and type I _sums of squares_? If so, they correspond to different kinds of hypotheses tested on the coefficients. The details are a little complex, so I think you should first confirm that this is the question.
If it is really « regression »... I have never heard of that.
Thanks Emmanuel, I was talking about type II regression. I searched and found that if the dependent and independent variables are interchangeable, or both have uncertainties, then you include an error term in both variables; i.e. linear regression (type I) minimizes the vertical distance to fit the line, but here (type II) you should minimize the perpendicular distance. Anyway, I have come this far and am trying to explore more.
OK. I had never heard the appellation « type II regression » for that, but I see what you mean.
In regression (« type I » for you), Y is random and assumed to depend on X, which can be random or fixed. Usually, you use least squares to find the parameters (the line equation, for instance) that minimize the distance between the observed Y and the Y predicted from the X value. Y and X play asymmetric roles.
When there is no reason to assume Y and X play asymmetric roles (for instance, both are observed and random, and there is no reason to explain Y from X rather than the other way round), « type I » regression is not suited.
One way is to minimize the distance between the observed (X, Y) points and the theoretical (X, Y) curve in the plane: this is what you call « type II » regression.
When the model is a straight line, you can obtain its equation by principal component analysis; on standardized variables, this also corresponds to the « least rectangles » approach. For other cases, I do not know, but it is numerically tractable.
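As a sketch of this point (Python rather than R, with made-up data): the first eigenvector of the 2x2 covariance matrix gives the line through the centroid that minimizes the sum of squared orthogonal distances, so any other direction through the centroid, such as the OLS line, does at least as badly:

```python
import math

# Hypothetical small data set: both coordinates observed, no natural "response"
pts = [(1.0, 1.2), (2.0, 1.9), (3.0, 3.4), (4.0, 3.8), (5.0, 5.3)]

n = len(pts)
mx = sum(p[0] for p in pts) / n
my = sum(p[1] for p in pts) / n
sxx = sum((x - mx) ** 2 for x, _ in pts) / n
syy = sum((y - my) ** 2 for _, y in pts) / n
sxy = sum((x - mx) * (y - my) for x, y in pts) / n

# Largest eigenvalue of the 2x2 covariance matrix [[sxx, sxy], [sxy, syy]]
lam = ((sxx + syy) + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / 2
# Corresponding eigenvector (direction of highest variability), normalized
vx, vy = sxy, lam - sxx
norm = math.hypot(vx, vy)
vx, vy = vx / norm, vy / norm

def orth_ss(ux, uy):
    """Sum of squared orthogonal distances from the points to the line
    through the centroid with (unit) direction (ux, uy)."""
    return sum(((x - mx) * uy - (y - my) * ux) ** 2 for x, y in pts)

# Direction of the OLS line, for comparison
b_ols = sxy / sxx
ox, oy = 1 / math.hypot(1, b_ols), b_ols / math.hypot(1, b_ols)

print(orth_ss(vx, vy) <= orth_ss(ox, oy))  # True: PCA axis is optimal
```

This works because the orthogonal sum of squares equals the total sum of squares minus the projected sum of squares, and the first eigenvector maximizes the latter.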
I am not familiar with such problems, so I do not know how to build confidence intervals, tests, and so on in such contexts. Maybe the previous keywords can help you extend your search.
I wonder what a regression analysis is for when you do not define which variable is the predictor and which is the response. PCA is not regression; correlation is an undirected estimate of association. It would be nice if someone could help me...
This is where I found a little about the subject. It is a user guide for the R language (maybe not the proper reference), but it mentions type II regression.
Interchangeable in the sense that both variables have uncertainties and you don't know whether to regress X on Y or Y on X. These two can produce different results, different slopes, or neither of them may be correct.
Standard linear regression assumes that the predictors are fixed (fixed-X), which means X is measured without error. This can be the case when X is controlled by the researcher. For example, you can create an artificial gradient of oxygen in an experiment, so you know precisely the different levels of O2 for each measured response (Y). However, when making observations or measurements in an uncontrolled environment, you cannot control X (nor Y). Hence, there is measurement error in both variables. Model-II regression is designed to deal with such measurement error. In fact, it is very often the case that we cannot measure X precisely, and model-II regression should be used more often. However, as it is not taught in basic statistics courses, few are aware of it. The vignette of the lmodel2 package gives a short intro, but you will find more in the Sokal and Rohlf book on biometry.
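As an illustration of one common model-II estimator, the reduced major axis (covered in Sokal and Rohlf), here is a Python sketch on made-up measurements; the RMA slope is the geometric mean of the Y-on-X slope and the reciprocal of the X-on-Y slope, and the line passes through the centroid:

```python
import math

# Hypothetical measurements with error in both variables
x = [2.1, 3.0, 4.2, 5.1, 6.3, 7.0]
y = [3.9, 6.2, 8.0, 10.5, 12.1, 14.3]

n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

b_yx = sxy / sxx          # OLS slope of Y on X
b_xy = sxy / syy          # OLS slope of X on Y
# Reduced major axis slope: geometric mean of b_yx and 1 / b_xy,
# which simplifies to sign(r) * sd(y) / sd(x)
b_rma = math.copysign(math.sqrt(b_yx / b_xy), sxy)
a_rma = my - b_rma * mx   # intercept: the line passes through the centroid

print(b_rma, a_rma)
```

The lmodel2 package reports this estimator (alongside OLS and the major axis) with permutation-based inference; the sketch above only reproduces the point estimate.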
R. R. Sokal and F. J. Rohlf. Biometry: The principles and practice of statistics in biological research. W. H. Freeman, 3rd edition, 1995.
Edit: Another application is allometry; see the smatr package:
https://cran.r-project.org/web/packages/smatr/
and the paper by Warton et al. 2012 in MEE:
Warton, D.I., R.A. Duursma, D.S. Falster and S. Taskinen (2012). smatr 3 - an R package for estimation and inference about allometric lines. Methods in Ecology and Evolution. 3, 257-259.