There is no sensible statistical procedure to guide the selection of a model. Either you want it to describe a pattern you see, in which case any arbitrary curve will do; or you really want to model, in which case you should justify the model on theoretical grounds, or at least be very careful not to over-emphasize patterns that may be special only to your set of data.
Longer:
A line can be drawn through a cloud of points to guide the eye, to indicate a pattern, or to estimate interpolations. This refers to the data at hand. It is not really a "model" with any real explanatory power (only more or less variance is "explained", which is not what I mean here). It doesn't really matter by which algorithm such a line is drawn. One may take hand, eye, and pencil. One may also find a "best" line, best with respect to some arbitrary criterion and constraints. This could be a least-squares fit, a maximum-likelihood fit, a spline, or some other scatterplot smoother.
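To make that distinction concrete, here is a minimal Python sketch (on invented data, purely for illustration) contrasting a "best" least-squares line with a spline scatterplot smoother, both drawn through the same cloud of points:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 10, 50))
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 50)   # noisy linear trend

# "Best" straight line w.r.t. the least-squares criterion
slope, intercept = np.polyfit(x, y, deg=1)

# A scatterplot smoother: same data, a different (arbitrary) criterion
smoother = UnivariateSpline(x, y, s=len(x))  # s controls smoothness

line_values = intercept + slope * x
smooth_values = smoother(x)
```

Both curves "describe" the same points; neither choice of algorithm by itself gives the line any explanatory power.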
A line can represent a model, so that the parameter values for that model are estimated from the observed data. The model itself (a straight line, a parabola, an exponential function, a logistic function, ...) should be reasonable, sensible, and useful. It should actually be based on theoretical considerations and not on the shape of the observed data, and the parameters of the model should have some physical meaning. I know that this is often not possible, so many regression models shown in the literature are only statistical models, not really representing biologically meaningful relationships. They just describe a pattern and ought to be interpreted more in the way I wrote in the first paragraph.
So the choice of a particular model should be guided by a theoretical understanding of the relevant relations and processes. If this is not available, you can just roughly describe a pattern. Then it is important not to over-interpret each little fluctuation that might be recognizable in the observed data, because these special features may be specific to your data and won't generalize to other data from similar experiments. How close the model (fit) should be to the data depends on your needs for generalization. If you want to give an impression of the diversity-altitude relationship in particularly the region that you have observed, a higher-order polynomial, a spline, or a lowess curve is fine and clearly much better than a simple straight line. In Ehsan's example above, it would follow the points showing a local maximum around 500 m and a local minimum around 700 m. However, the altitude may not be the only factor influencing the diversity. There might be other geographical features, soil and irradiation differ at different altitudes, and so on. So there will likely be a lot of confounders. If you knew them, you could consider them in a model, actually estimating an adjusted altitude-diversity relationship. If you don't know these confounders, then you have to be careful in interpreting such local effects, since they may well be determined by these unknown confounders and not by altitude. For instance, given Ehsan's plot above, I would strongly doubt that the decrease of diversity between 600 m and 700 m is a general feature that should be expected in such mountainous regions. So as a general model, a curve following these minima and maxima might not be useful (and how would you justify it theoretically?). It would then be safer to say that the diversity slightly increases with altitude (again interpreting Ehsan's plot), and that the *average* increase (= the slope of the straight line) is not very pronounced. This might be theoretically justified as well. I'd further note that there might be local influences on the diversity with a considerably stronger impact than the altitude has.
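A small simulated illustration of this over-fitting point, assuming made-up "altitude" and "diversity" numbers rather than any real survey data: a high-order polynomial traces sample-specific wiggles that a second sample from the same process does not reproduce, while the straight line's slope stays roughly stable.

```python
import numpy as np

rng = np.random.default_rng(1)
altitude = np.sort(rng.uniform(300, 900, 30))
z = (altitude - altitude.mean()) / altitude.std()   # scaled for stability
diversity_a = 10 + 0.01 * altitude + rng.normal(0, 2, 30)

# Straight line: describes only the average trend
slope_a = np.polyfit(z, diversity_a, deg=1)[0]

# 6th-order polynomial: happily traces sample-specific bumps
wiggly_a = np.polyfit(z, diversity_a, deg=6)

# A second sample from the same process: the polynomial's wiggles move,
# while the straight line's slope stays roughly the same
diversity_b = 10 + 0.01 * altitude + rng.normal(0, 2, 30)
slope_b = np.polyfit(z, diversity_b, deg=1)[0]
wiggly_b = np.polyfit(z, diversity_b, deg=6)
```

Comparing `wiggly_a` with `wiggly_b` (and `slope_a` with `slope_b`) across seeds typically shows the polynomial coefficients shifting far more than the slope, which is the sense in which the local minima and maxima do not generalize.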
To explain it simply: the p-value says whether your regression coefficient is different from zero (if it is, you get a low p-value, below 0.05). R-squared, in contrast, says how much of the variation of the dependent variable is explained by the independent one. For instance, an R-squared of 0.12 says that only 12% of the variation in your dependent variable is explained by your independent variable.
But be aware that the intensity of the effect of your IV is given by the regression coefficient (beta). Even with a low R-squared the effect intensity can be very high; the p-value by itself says nothing about this intensity.
I think the p-value measures how significant an independent variable is for the response variable. R-squared measures the proportion of the response variable's variance that can be explained by your model. Generally speaking, a lower p-value and a higher R-squared are better.
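As a quick numerical illustration of how these two quantities separate, here is a sketch using scipy on simulated data (the specific numbers are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
x = rng.uniform(0, 100, 200)
y = 5 + 0.02 * x + rng.normal(0, 3, 200)   # weak but real effect

res = stats.linregress(x, y)
print(f"slope (effect intensity): {res.slope:.4f}")
print(f"p-value (is the slope zero?): {res.pvalue:.4g}")
print(f"R-squared (variance explained): {res.rvalue**2:.3f}")
# With enough data a small slope can still get a small p-value while
# R-squared stays low: the two numbers answer different questions.
```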
Salvador, this fitted plot is the same as what you've seen. I fitted a linear regression between altitude and number of taxa in a forest, and it gives me a model p-value of 0.0306 and an R-squared of only 0.08. The regression coefficient is 0.007, so it is very low but significant.
So I fitted a curve with a cubic function. The p-value was 0.001 and the R-squared was 0.45. So I guess I can take the relation as meaningful, is what I mean.
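For comparison along these lines, here is a hedged sketch of fitting both a straight line and a cubic to the same simulated data, reporting R-squared and an overall F-test p-value for each (these are stand-in numbers, not the actual altitude/taxa data):

```python
import numpy as np
from scipy import stats

def poly_fit_stats(x, y, degree):
    """Least-squares polynomial fit; return R^2 and overall F-test p-value."""
    z = (x - x.mean()) / x.std()               # scale x for stability
    coeffs = np.polyfit(z, y, degree)
    resid = y - np.polyval(coeffs, z)
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    n, k = len(y), degree
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return r2, stats.f.sf(f_stat, k, n - k - 1)

rng = np.random.default_rng(3)
altitude = np.sort(rng.uniform(300, 900, 40))
taxa = 20 + 0.007 * altitude + rng.normal(0, 4, 40)

for degree, label in [(1, "linear"), (3, "cubic")]:
    r2, p = poly_fit_stats(altitude, taxa, degree)
    print(f"{label}: R^2 = {r2:.3f}, p = {p:.4f}")
# The cubic usually raises R^2 somewhat, but whether its extra terms
# generalize beyond this one sample is exactly the question above.
```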
Ok, Salvador, everything is clear with R-squared less than 0.1, as Ehsan wrote, but the interpretation of values around 0.4-0.6 is much more difficult. In your case, the proportion of variance explained by the function is 0.45. Is that enough to discuss the effect of the predictor variable? Big question. Imagine you say that 45% of my salary depends on my education and use education as an important predictor of my future income. Well, as for me, I should be very interested to ask what happened to the other 55% of my money! I really don't want to miss it!! :-)
I guess it is worthwhile to use a variable as a good predictor if it explains not less than, say, 70-80% (R-squared > 0.7); then the rest could be treated as "white noise". But I doubt we can treat 55% of the variance as white noise.
Salvador, in my opinion we should not use functions of more than second order in regression modelling, because they can give very large errors in prediction, especially when the R-squared is lower than 0.85 to 0.9. So my advice is to use nonlinear regression as much as possible, not third- or fourth-order polynomial regression.
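As an illustration of that suggestion, here is a sketch of a genuinely nonlinear fit with scipy's curve_fit, using a logistic function (one of the theoretically motivated forms mentioned earlier, with interpretable parameters); the data are simulated:

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, top, x0, rate):
    """Logistic curve: parameters with meaning, unlike polynomial terms."""
    return top / (1.0 + np.exp(-rate * (x - x0)))

rng = np.random.default_rng(6)
x = np.linspace(0, 10, 60)
y = logistic(x, top=8.0, x0=5.0, rate=1.2) + rng.normal(0, 0.4, 60)

# p0 gives rough starting values; nonlinear fits need sensible guesses
params, cov = curve_fit(logistic, x, y, p0=[max(y), np.median(x), 1.0])
top, x0, rate = params
print(f"plateau = {top:.2f}, midpoint = {x0:.2f}, rate = {rate:.2f}")
```

Here each estimated parameter (plateau, midpoint, rate) can be discussed on its own terms, which a fourth-order polynomial coefficient cannot.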
Curve fitting is the process of constructing a curve, or mathematical function, that has the best fit to a series of data points, possibly subject to constraints. Curve fitting can involve either interpolation, where an exact fit to the data is required, or smoothing, in which a "smooth" function is constructed that approximately fits the data. A related topic is regression analysis, which focuses more on questions of statistical inference, such as how much uncertainty is present in a curve that is fit to data observed with random errors. Fitted curves can be used as an aid for data visualization, to infer values of a function where no data are available, and to summarize the relationships among two or more variables. The objective of curve fitting is to theoretically describe experimental data with a model (function or equation) and to find the parameters associated with this model.
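The interpolation-versus-smoothing distinction can be shown in a few lines of scipy; the data here are invented for illustration:

```python
import numpy as np
from scipy.interpolate import CubicSpline, UnivariateSpline

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 12)
y = np.sin(x) + rng.normal(0, 0.2, 12)

# Interpolation: the curve is forced through every data point exactly
exact = CubicSpline(x, y)

# Smoothing: the curve only approximates the points; larger s trades
# fidelity to the data for smoothness
smooth = UnivariateSpline(x, y, s=0.5)

grid = np.linspace(0, 10, 200)
exact_curve = exact(grid)     # passes through all 12 observations
smooth_curve = smooth(grid)   # averages out the noise instead
```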
We know the coefficients estimate the trends, while R-squared represents the scatter around the regression line. Moreover, the significant variables can be interpreted the same way in both high and low R-squared models. However, low R-squared values are problematic when you need precise predictions. So, if you have significant predictors but a low R-squared value, what you need to do is add more variables to the model.
But to answer your question: you can still indicate a real relationship between the significant predictors and the response variable even when R-squared is low, provided the p-values are low.
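A small sketch of that last point, assuming statsmodels is available and using simulated data: a predictor can be clearly significant while R-squared stays low, and adding a second relevant variable raises R-squared without changing the first predictor's reality.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 300
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)     # a second driver, ignored in the first model
y = 1.0 + 0.5 * x1 + 1.5 * x2 + rng.normal(0, 1.0, n)

fit1 = sm.OLS(y, sm.add_constant(x1)).fit()
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(f"x1 only : p(x1) = {fit1.pvalues[1]:.4g}, R^2 = {fit1.rsquared:.3f}")
print(f"x1 + x2 : p(x1) = {fit2.pvalues[1]:.4g}, R^2 = {fit2.rsquared:.3f}")
# x1's effect is "real" (low p) in both fits, but predictions from the
# first model are imprecise because most of the variance sits elsewhere.
```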