Which multivariate analysis is better: Classification and Regression Tree (CART) or Generalized Linear Model (GLM)?

Hi, Choki,

Firstly, for environmental data you could use Redundancy Analysis (RDA). RDA is a direct extension of multiple regression: it models the effect of an explanatory matrix X (n x p) on a response matrix Y (n x m). This is done by performing an ordination of Y to obtain ordination axes that are linear combinations of the variables in X. In RDA, the ordination axes are calculated from a PCA of a matrix Yfit, which is computed by fitting the Y variables to X by multivariate linear regression. Note that the explanatory variables in X can be quantitative, qualitative or binary. Prior to RDA, the response variables in Y must be centered, standardized (if they are not dimensionally homogeneous, i.e. measured in different units), transformed (to limit their skew) or normalized (to linearize relationships), following the same principles as in PCA. Collinearity among the X variables should also be reduced before RDA.
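The RDA steps described above (center Y, regress Y on X, then run a PCA on Yfit) can be sketched in Python with NumPy. This is a minimal illustration, not a full implementation: it assumes X is numeric and of full column rank (qualitative variables would first need dummy coding), the function name `rda` and its return values are hypothetical, and it leaves out the permutation tests and scaling options offered by dedicated tools such as R's vegan package.

```python
import numpy as np


def rda(Y, X, standardize=False):
    """Minimal redundancy analysis (RDA) sketch.

    Y: (n x m) response matrix; X: (n x p) explanatory matrix (numeric,
    full column rank assumed). Returns the canonical eigenvalues, the
    fitted site scores on the canonical axes, and the RDA R^2.
    """
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]

    # Center Y; optionally standardize it when variables are in different units.
    Yc = Y - Y.mean(axis=0)
    if standardize:
        Yc = Yc / Yc.std(axis=0, ddof=1)
    # Centering X removes the need for an intercept column.
    Xc = X - X.mean(axis=0)

    # Multivariate linear regression: one least-squares fit per column of Y.
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Yfit = Xc @ B

    # PCA of Yfit (computed via SVD) gives the canonical ordination axes.
    _, s, Vt = np.linalg.svd(Yfit, full_matrices=False)
    k = min(X.shape[1], Y.shape[1], n - 1)  # maximum number of canonical axes
    eigenvalues = s[:k] ** 2 / (n - 1)
    site_scores = Yfit @ Vt[:k].T

    # Proportion of the total variance of Y explained by X.
    r2 = (Yfit ** 2).sum() / (Yc ** 2).sum()
    return eigenvalues, site_scores, r2


# Usage on simulated data: 30 sites, 4 response variables, 2 explanatory variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
Y = X @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(30, 4))
eigenvalues, site_scores, r2 = rda(Y, X)
print(eigenvalues, r2)
```

The sum of the canonical eigenvalues, divided by the total variance of Y, equals the R^2 returned here, which is the quantity RDA partitions among the canonical axes.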

To obtain the best RDA model, explanatory variables can be selected by forward, backward or stepwise selection (as in a GLM), which removes non-significant explanatory variables.
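Forward selection can be sketched as a greedy loop: at each step, add the candidate variable that most increases the adjusted R^2 of the RDA, and stop when no candidate improves it. This is a simplified sketch under stated assumptions: it uses only the adjusted R^2 (Ezekiel's formula) as the stopping rule, whereas published procedures usually also test each candidate with a permutation test; the helper names `rda_r2`, `adjusted_r2` and `forward_select` are hypothetical.

```python
import numpy as np


def rda_r2(Y, X):
    """R^2 of an RDA: share of the centered variance of Y explained by X."""
    Yc = Y - Y.mean(axis=0)
    Xc = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Yc, rcond=None)
    Yfit = Xc @ B
    return (Yfit ** 2).sum() / (Yc ** 2).sum()


def adjusted_r2(r2, n, p):
    """Ezekiel's adjustment of R^2 for n sites and p explanatory variables."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def forward_select(Y, X, names):
    """Greedy forward selection on adjusted R^2 (no permutation tests)."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    n = Y.shape[0]
    selected, best_adj = [], -np.inf
    remaining = list(range(X.shape[1]))
    # Keep p <= n - 2 so that n - p - 1 stays positive in the adjustment.
    while remaining and len(selected) < n - 2:
        scores = []
        for j in remaining:
            cols = selected + [j]
            r2 = rda_r2(Y, X[:, cols])
            scores.append((adjusted_r2(r2, n, len(cols)), j))
        adj, j = max(scores)
        if adj <= best_adj:
            break  # no candidate improves the adjusted R^2: stop
        selected.append(j)
        remaining.remove(j)
        best_adj = adj
    return [names[j] for j in selected], best_adj


# Usage: Y depends on columns "a" and "c" only; "b" and "d" are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 4))
coef = np.array([[2.0, 1.0, 0.5], [1.0, 2.0, 1.5]])
Y = X[:, [0, 2]] @ coef + 0.5 * rng.normal(size=(40, 3))
chosen, adj = forward_select(Y, X, ["a", "b", "c", "d"])
print(chosen, adj)
```

Backward selection would run the same comparison in reverse, starting from all variables and dropping the one whose removal costs the least adjusted R^2.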

Then you could use a Multivariate Regression Tree (MRT), which is a constrained clustering technique. An MRT partitions a quantitative response matrix using a matrix of explanatory variables that constrain (guide) where the data of the response matrix are divided. RDA and MRT are both regression techniques: the former explains the global structure of relationships through a linear model, while the latter better highlights local structures and interactions among variables by producing a tree model.
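The partitioning idea behind an MRT can be sketched as follows: for every explanatory variable and every candidate threshold, split the sites in two and keep the split that minimizes the pooled within-group sum of squares of the whole response matrix, then repeat inside each group. This is a minimal sketch, assuming numeric explanatory variables and the sum-of-squares criterion; unlike real implementations it does not prune the tree by cross-validation, and the function names `within_ss`, `best_split` and `grow_mrt` are hypothetical.

```python
import numpy as np


def within_ss(Y):
    """Sum of squared Euclidean distances of the rows of Y to their centroid."""
    return float(((Y - Y.mean(axis=0)) ** 2).sum()) if len(Y) else 0.0


def best_split(Y, X, min_size=2):
    """Return (ss, j, t) for the split X[:, j] <= t with the lowest pooled SS."""
    best = None
    for j in range(X.shape[1]):
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between observed values
            left = X[:, j] <= t
            if left.sum() < min_size or (~left).sum() < min_size:
                continue  # each group must keep at least min_size sites
            ss = within_ss(Y[left]) + within_ss(Y[~left])
            if best is None or ss < best[0]:
                best = (ss, j, t)
    return best


def grow_mrt(Y, X, depth=0, max_depth=2, min_size=2):
    """Recursively grow a multivariate regression tree (no pruning)."""
    split = best_split(Y, X, min_size) if depth < max_depth else None
    if split is None or split[0] >= within_ss(Y):
        return {"leaf": True, "n": len(Y), "centroid": Y.mean(axis=0)}
    _, j, t = split
    left = X[:, j] <= t
    return {
        "leaf": False, "var": j, "threshold": t,
        "left": grow_mrt(Y[left], X[left], depth + 1, max_depth, min_size),
        "right": grow_mrt(Y[~left], X[~left], depth + 1, max_depth, min_size),
    }


# Usage: abundances of 2 species at 6 sites; column 0 of X separates two communities,
# column 1 is noise.
Y = np.array([[5, 0], [6, 0], [5, 1], [0, 5], [0, 6], [1, 5]], dtype=float)
X = np.array([[1, 3], [2, 1], [3, 2], [10, 2], [11, 3], [12, 1]], dtype=float)
tree = grow_mrt(Y, X)
print(tree["var"], tree["threshold"])
```

Each leaf centroid describes the typical species composition of one group of sites, which is what makes the tree output readable as a constrained clustering.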

More detailed yet more manageable output can be generated with the MRT() wrapper function from the MVPARTwrap package in R. This function also allows the identification of discriminant species.

All Best,

Hein Van Gils

We tested both algorithms; for our findings see attached.

Menaa has already given you good advice on the stepwise reduction of the number of variables.

Attached article: Transferability of species distribution models: The case of ...

Newton Pimentel de Ulhôa Barbosa

It's preferable to use both algorithms. If you are planning to make predictions, be careful with Gaussian GLMs: they can produce predictions beyond realistic values. In that case, non-linear models such as CART are better. You can also try nonlinear regression.
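One way to see the risk with Gaussian GLMs is to predict a bounded response outside the sampled range. The sketch below uses hypothetical presence/absence data along a hypothetical temperature gradient: a Gaussian GLM with identity link (equivalent to an ordinary least-squares line) extrapolates below 0 and above 1, while a one-split regression tree can only return the mean response of one of its leaves. The split threshold of 12 is picked by hand for illustration rather than estimated, and the data are invented, not taken from the attached study.

```python
import numpy as np

# Hypothetical presence (1) / absence (0) of a species along a temperature gradient.
temp = np.array([5, 7, 9, 11, 13, 15, 17, 19, 21, 23], dtype=float)
presence = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)

# Gaussian GLM with identity link == ordinary least-squares straight line.
slope, intercept = np.polyfit(temp, presence, 1)


def glm_predict(x):
    """Linear prediction: unbounded, so it can leave the [0, 1] range."""
    return intercept + slope * x


def stump_predict(x, threshold=12.0):
    """One-split regression tree: predict the mean response on each side."""
    left_mean = presence[temp <= threshold].mean()
    right_mean = presence[temp > threshold].mean()
    return np.where(x <= threshold, left_mean, right_mean)


new_temp = np.array([0.0, 30.0])  # both outside the sampled range
print(glm_predict(new_temp))    # falls below 0 and above 1 here
print(stump_predict(new_temp))  # stays within the observed range of the response
```

The flip side is that a tree cannot extrapolate a trend at all, so fitting both model types and comparing their predictions is a reasonable check.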
