In ecology, linear regression is often used. However, the results of these regressions are often used to build classification metrics (don't ask me why), and the regression model itself is not used.

Consider a linear model y~x, where y indicates ecosystem quality expressed as a continuous value between 0 and 1 and x is a continuous factor affecting ecosystem quality (e.g. total phosphorus). Once the linear y~x model is fitted, the continuous value y is divided into five classes (yc1, ..., yc5) of equal width (0.2). These classes are used by management to indicate the ecological quality states bad, poor, moderate, good and high. Understandably, we no longer have a linear regression and are now dealing with a classification problem. While the R-squared can appear high, the number of classes determines how much they overlap. A high number of classes with large overlap decreases the accuracy and increases the probability of misclassifying these states. I understand that the classification accuracy depends on the model, and I know why this happens. But a simple equation might highlight this very general issue. Is it possible to approximate the overlap or accuracy for any possible number of classes from the R-squared published in the literature?
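To make the question concrete, here is a rough Monte Carlo sketch of the kind of approximation I have in mind. It is not a general answer, because R-squared alone does not fix the accuracy: the result also depends on how the fitted values are distributed. The sketch assumes (my assumptions, not taken from any paper) that the fitted values are uniform on [0, 1], that the residuals are Gaussian with a variance chosen to match the nominal R-squared, and that observed values are clipped to [0, 1], which shifts the realized R-squared slightly. The function name `approx_class_accuracy` is made up for illustration.

```python
import numpy as np


def approx_class_accuracy(r2, n_classes, n=200_000, seed=0):
    """Estimate how often observed and fitted y fall in the same class.

    Assumptions (illustrative only): fitted values are uniform on [0, 1],
    residuals are Gaussian, and classes have equal width on [0, 1].
    """
    if not 0.0 < r2 <= 1.0:
        raise ValueError("r2 must be in (0, 1]")
    if n_classes < 2:
        raise ValueError("n_classes must be at least 2")

    rng = np.random.default_rng(seed)

    # Fitted values from the regression, assumed uniform over the scale.
    y_hat = rng.uniform(0.0, 1.0, size=n)

    # R^2 = var(y_hat) / (var(y_hat) + sigma^2), solved for sigma.
    sigma = np.sqrt(np.var(y_hat) * (1.0 - r2) / r2)

    # Observed values: fitted value plus noise, clipped to the valid range.
    y_obs = np.clip(y_hat + rng.normal(0.0, sigma, size=n), 0.0, 1.0)

    # Interior class boundaries, e.g. 0.2, 0.4, 0.6, 0.8 for five classes.
    edges = np.linspace(0.0, 1.0, n_classes + 1)[1:-1]

    # Share of cases where both values land in the same class.
    return float(np.mean(np.digitize(y_hat, edges) == np.digitize(y_obs, edges)))


if __name__ == "__main__":
    for k in (3, 5, 10):
        print(k, round(approx_class_accuracy(0.8, k), 3))
```

Under these assumptions the agreement drops as the number of classes grows for a fixed R-squared, which is the pattern I am describing. What I am looking for is whether a closed-form expression exists for this relationship.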
