I have data on biological parameters and physico-chemical features that show a strongly non-linear relationship. How should I treat this type of data? Is a generalized linear model a suitable statistical approach for this?
It is not clear what you are asking, but I am going to guess you want to know which modeling approach is best suited to modelling the non-linear relationship?
Assume that we are talking about a non-linear relationship between a continuous outcome and a continuous covariate. The easiest approach is to first plot out the two variables in a scatter plot and view the relationship across the spectrum of scores. That may give you some sense of the relationship. You can then try to fit the data using various polynomials or splines. Depending on the data, these two methods may provide similar fits or the splines may work better.
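A minimal sketch of the polynomial part of this suggestion, in Python with NumPy. The data are simulated and purely hypothetical, and the helper name `rss_for_degree` is made up for illustration. Splines are not covered here. The idea is only to compare how the residual sum of squares changes as the polynomial degree increases:

```python
import numpy as np

# Hypothetical data: a curved relationship plus noise (not from the thread).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 2.0 + 0.5 * x**2 - 0.04 * x**3 + rng.normal(0.0, 0.5, x.size)

def rss_for_degree(x, y, degree):
    """Fit a polynomial of the given degree by least squares and return
    the residual sum of squares on the same data."""
    coeffs = np.polyfit(x, y, degree)
    fitted = np.polyval(coeffs, x)
    return float(np.sum((y - fitted) ** 2))

# Compare a straight line with quadratic and cubic fits.
rss = {d: rss_for_degree(x, y, d) for d in (1, 2, 3)}
for d, value in rss.items():
    print(f"degree {d}: RSS = {value:.1f}")
```

A lower RSS on the fitting data does not by itself mean a better model, since higher degrees always fit at least as well; plotting the fitted curve over the scatter plot is still the first check.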
I hope this addresses your question. If not, please provide more details
You can also use ROC curves for a continuous predictor and a dichotomous outcome (criterion, dependent variable). You can also use different types of regression that cover non-linear relationships between variables (based on, e.g., exponential growth models, logistic models, exponential decay models, etc.).
Ummer, if your data are strongly non-linear, use non-linear methods to model the relations among your working variables. I consider that in this case the problem requires numerical methods and fitting tests from mathematics, not from statistics.
If you estimate the distribution of each variable, try to work on each one with non-parametric methods. In this case I use the Laplace premise that assigns frequency 1/N to each measured Xi value, and build Lorenz curves for each variable. I suppose that your datasets are "representative" enough; that is, the sample gives X values not too far from the averages of each ordered value with interval frequency 1/N, which produces a good estimated mean U and acceptable shapes of the distribution curves.
I have deep doubts about linear correlations derived from matrix analysis among variables.
@Ariel. Your guess is right. I got what you wanted to convey. Thank you, Dr Emilio & Selmon, for your views. I will read more about non-linear modelling, following your suggestions.
In general, it is better to leave the variables in their original (untransformed) state, so that the interpretation is straightforward. My original suggestion to use polynomials or splines was intended so that the scale does not change. These are also generalizable to any regression model, so that the user is not limited to a particular approach.
Let me elaborate on my statement "so that the interpretation is straightforward" with a hypothetical example:
Assume we are testing the relationship between patients' age and their blood pressure, measured in mmHg. Now we find a non-linear relationship between the two. We have two choices: we transform one or both variables so that the model fits the data better, or we generate polynomials/splines of age.
If you are the doctor looking at the results, in the first instance (where the data are transformed), you'd need some way of transforming the data back to the original scale so that you can interpret and then act on the results. Of course, untransforming the data leads to other issues, but that we can discuss later.
Using the second approach (leaving the data in the original scale), the doctor has no problems interpreting the results.
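One of the issues with untransforming may be the following (this is an assumption about what is meant above, not something stated in the thread): averaging on the log scale and exponentiating gives the geometric mean, which is lower than the arithmetic mean on the original scale. A small Python sketch with simulated, hypothetical blood pressure values:

```python
import numpy as np

# Hypothetical right-skewed blood pressure values in mmHg (simulated).
rng = np.random.default_rng(1)
bp = rng.lognormal(mean=np.log(120.0), sigma=0.2, size=1000)

# Mean computed directly on the original scale.
arithmetic_mean = float(bp.mean())

# Mean computed on the log scale, then naively back-transformed.
# This is the geometric mean, not the arithmetic mean.
back_transformed = float(np.exp(np.log(bp).mean()))

print(f"mean on original scale:       {arithmetic_mean:.1f} mmHg")
print(f"back-transformed log-scale mean: {back_transformed:.1f} mmHg")
```

The two numbers answer different questions, so a reader who expects the ordinary mean would be misled by the back-transformed value unless this is stated.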
I have no problem agreeing with you, Theo, when the data analysis is strictly about model fit. However, when the results are intended for practical use, then we should strive to find the model that leaves the data easiest to interpret.
I agree with Ariel. Also, transformation of data might create some distortions. Practical interpretation becomes difficult after one applies log, square, square root, or their combinations to the data.
As to non-linear fit, there are two types of non-linearity: 1) from the data and 2) from the model. I am more interested in the second type, because you can build a strongly non-linear model to fit the data. In such a case, modifications to either the variables or the distribution of the residuals are needed.
I am not saying no transformation at all, but I try to avoid it as much as I can unless it is absolutely necessary, for example first-differencing and taking logs in time series analysis.
In your statistical analysis, try to use nonparametric statistical methods as much as possible, even for transformed data. For example, avoid using Pearson's correlation coefficient for measuring the strength of dependence.
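To illustrate why Pearson's coefficient can understate a strong but non-linear dependence, here is a minimal Python sketch using NumPy only. The data are hypothetical, and the rank-based helper `spearman` is a simplified version that assumes there are no tied values:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Ranks come from a double argsort, which is valid only without ties."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return pearson(rank_a, rank_b)

# Hypothetical data: y increases monotonically but very non-linearly with x.
x = np.linspace(0.1, 10.0, 100)
y = np.exp(x)

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because y is a strictly increasing function of x, the ranks agree exactly and the rank correlation is at its maximum, while the Pearson coefficient is clearly lower.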
I absolutely agree with Theo that the modelling process must have both components for applied/practical scenarios - (a) the model fits the data well (with or without transformation) and (b) the resulting model must make sense to the ultimate end-user.
I would even go so far as to say that if the researcher modeling the data cannot come up with a good solution (i.e., a good-fitting model), then it may be better to drop the idea altogether, rather than try to force a solution onto the problem.
I absolutely enjoy these cross-discipline discussions. All readers benefit from them.
What you have mentioned in your question is that “it is non-linear”. There are different types of non-linear relationships; check that first. As Jihen mentioned, look for outliers in the given dataset. You can also try a transformation.
To start: nothing against transforming variables; you just have to know what the transformation does to your data and how it affects, for example, your model estimates and, finally, your conclusions. If you go for multivariate analyses, your transformation can affect the relative importance of explanatory variables in your dataset; e.g. the measurement scale can affect the outcome of a PCA.
Second, I guess that by non-linear you mean that a straight line in a scatterplot does not describe the relationship between two variables well? In that case there are plenty of options to model such a non-linear pattern (in R e.g. nls, GAMs, glm). GLMs are often applied because the measurement scale of your response variable and the related error structure cannot be adequately described with a normal distribution (e.g. count or binary data), not necessarily because the data are "non-linear" (as defined above). So the choice of a specific GLM results from the kind of response variable you have.
Third, whether to transform or not also depends, in my opinion, on the question you would like to analyse and how you would like to proceed. With a non-linear relationship in which one variable steadily increases with another, a Spearman correlation would be sufficient to show the monotonic increase, yet if you want to use your model to predict, derive confidence limits, etc., you need to get a bit more sophisticated.
This brings me to my last point: I would strongly recommend first thinking about what the relationship between your variables should be, based "simply" on the biological understanding of your system. Once you know that, you can try to apply a model related to it. You can also think of this as using your model to test certain hypotheses, as is often done in ecology with a number of alternative models and their comparison using information-theoretic approaches, ideally with hypotheses defined before the data are collected.
Also, if you have a high number of measurements, you risk fitting several different models that all come out "significant" (simply because the high N gives you many df, not necessarily because the model describes a causal relationship). Never skip exploring whether the model adequately describes your data and whether the assumptions are met. OK, these are some thoughts that I hope help. I look forward to reading more answers to this interesting question!
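The comparison of alternative models with an information-theoretic criterion can be sketched as follows. This is only an illustration in Python with NumPy, assuming least-squares polynomial fits with Gaussian errors, for which AIC can be written (up to an additive constant) as n·ln(RSS/n) + 2k. The data are simulated and hypothetical, and `gaussian_aic` is a made-up helper name:

```python
import numpy as np

# Hypothetical data with a gently curved relationship (simulated).
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 150)
y = 1.0 + 0.8 * x - 0.1 * x**2 + rng.normal(0.0, 0.3, x.size)

def gaussian_aic(x, y, degree):
    """AIC (up to an additive constant) for a least-squares polynomial fit
    with Gaussian errors: n * ln(RSS / n) + 2 * k."""
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2))
    n = x.size
    k = degree + 2  # polynomial coefficients plus the error variance
    return n * np.log(rss / n) + 2 * k

# Candidate models: straight line, quadratic, and a more flexible polynomial.
aic = {d: gaussian_aic(x, y, d) for d in (1, 2, 5)}
best = min(aic, key=aic.get)
for d, value in aic.items():
    print(f"degree {d}: AIC = {value:.1f}")
print(f"lowest AIC: degree {best}")
```

A lower AIC only ranks the candidates against each other; it does not replace checking residuals and assumptions for the chosen model.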