The choice of ordination method depends on 1) the type of data you have, 2) the (dis)similarity measure you want to or can use, and 3) what you want to say. All of these ordination methods are based on a (dis)similarity matrix constructed from your data, using different measures (such as Euclidean, Bray-Curtis (equivalent to Sørensen for presence/absence data), Jaccard etc.) to calculate the distance between samples. However, the different measures will not give the same results. Different ordination methods accept different (dis)similarity matrices, and this choice can strongly affect the results. For example, PCA uses only Euclidean distance, while nMDS or PCoA can use any (dis)similarity measure you want.
So, how to choose a method?
- If you have a dataset that includes null values, i.e. zeros (e.g. most datasets from genotyping using fingerprinting methods include zeros, for example when a bacterial OTU is present in some samples and not in others), I would advise you to use a Bray-Curtis similarity matrix and nMDS ordination. Bray-Curtis is chosen because, unlike Euclidean distance, it is not affected by the zeros shared between samples (double zeros), and nMDS is chosen because, unlike PCA, it accepts any similarity matrix.
- If you have a dataset that does not include null values (e.g. environmental variables), you can use Euclidean distance with either PCA or nMDS, and you will see that in this case they give very similar results.
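To make the double-zero argument concrete, here is a minimal Python sketch. The three toy samples and the function names are illustrative assumptions, not data from this thread. Samples A and C contain the same two species at different abundances, while sample B shares no species with either of them.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    total = sum(x) + sum(y)
    if total == 0:
        raise ValueError("Bray-Curtis is undefined for two empty samples")
    return sum(abs(a - b) for a, b in zip(x, y)) / total

# Hypothetical abundances of three species (columns) in three samples.
sample_a = [0, 1, 1]
sample_b = [1, 0, 0]  # shares no species with A or C
sample_c = [0, 4, 4]  # same species as A, higher abundances

for name, dist in (("Euclidean", euclidean), ("Bray-Curtis", bray_curtis)):
    print(name,
          "A-B:", round(dist(sample_a, sample_b), 3),
          "A-C:", round(dist(sample_a, sample_c), 3))
```

With these toy numbers, Euclidean distance places A closer to B (no species in common) than to C (same species), whereas Bray-Curtis gives A-B the maximum dissimilarity of 1 and A-C a lower value. This is the behaviour that makes Euclidean distance, and therefore PCA, risky for zero-rich community data.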
Many ordination methods exist, such as the ones you mentioned, but also RDA (redundancy analysis), CAP (canonical analysis of principal coordinates), dbRDA (distance-based redundancy analysis), and others. Some methods will be better than others at showing a complex community or a specific effect of a factor on your data. For example, CAP is good at showing the effect of the interaction between factors on your community. So sometimes it is worth trying different methods if you are not happy with the results, but keep in mind that these methods are "only" ordinations, and you still need to test for significant differences between groups (e.g. ANOSIM, ADONIS, PERMANOVA, MRPP…).
Different ordination software often offers different features that you may find interesting, such as overlay vectors or extra variables, % explained by each axis, 3D plots… However, all these details are more related to the software than to the ordination methods themselves.
You can find more information about ordination methods and also test for significant differences between groups in this review:
A. Ramette (2007) Multivariate analyses in microbial ecology, FEMS Microbiology Ecology, 62, 142-160.
How do I usually select a method? I first perform DCA on the sample-by-species dataset. If the axis length is greater than or equal to 2.5, I prefer to use CCA; otherwise I stick to linear methods such as RDA or PCA (CA, like CCA, assumes a unimodal response). But CCA must be run with reasonably justifiable environmental variable(s).
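The rule of thumb above can be written as a tiny helper. This is only a sketch: the function name is hypothetical, the 2.5 cut-off is the one quoted in this answer, and the axis length is assumed to be the DCA gradient length (in SD units) reported by your ordination software.

```python
def suggest_constrained_method(gradient_length, threshold=2.5):
    """Suggest a unimodal (CCA) or linear (RDA) constrained ordination.

    gradient_length: DCA axis length in SD units (assumed to come from
    a DCA run elsewhere); threshold: cut-off quoted in this thread.
    """
    if gradient_length < 0:
        raise ValueError("gradient length cannot be negative")
    return "CCA" if gradient_length >= threshold else "RDA"

print(suggest_constrained_method(3.1))  # long gradient
print(suggest_constrained_method(1.4))  # short gradient
```

Treat the output as a starting point rather than a verdict; the choice should still be checked against the ecological question and the available environmental variables.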
It mainly depends on what you want to say (you can view your data from different meaningful angles, but what you want to explain will determine the analysis you need) and on your type of data. Try different types of analysis and see which one depicts what you intend to explain.
It depends on the aims and scope of your research. Once you have selected the technique to use, and before the analysis, your data may need specific adjustments, depending on your objectives and the techniques you want to use. Ecological data are usually highly heterogeneous, including lots of zeros (null values, absences of species, resulting from a large number of rare species in most ecological community samples), and several techniques perform badly with this kind of data.
Here are some questions that may guide you to select a proper tool of analysis:
1) Are you comparing existing groups? Consider DA, MRPP, perMANOVA, ISA.
2) Are you looking for groups? Consider cluster analysis: Flexible Beta (great with any distance measure, and you can control its space-distorting properties) or Ward (Euclidean only). Avoid Twinspan (performs poorly with more than one important gradient).
4) Focus on Direct vs. Indirect gradient analysis?
Beals (1984) makes two strong statements about the advantages of Indirect gradient analyses (=sociological ordination) over direct (=environmental ordination) gradient analysis:
i) "Species differences between two samples do reflect their environmental differences, but in a highly integrated fashion, which includes differences in biotic interactions and historical events. The environmental differences are automatically scaled according to overall species response. Therefore the ordination with the clearest species patterns reflects the environmental space the way biotic communities interpret it."
ii) "The disadvantage of environmental ordination is that one must prejudge which are the important environmental factors to the vegetation or the fauna. An environmental ordination may omit important variables; it is often biased toward those factors most easily measured; measured variables may be scaled wrong; and biotic patterns imposed by competition, predation and other interactions are ignored."
5) Do you still want to focus on Direct gradient analysis? NPMR (for a single response variable), otherwise RDA (linear) or CCA (unimodal).
6) Do you want to focus on Indirect gradient analysis? NMS (powerful method in community ecology, valid for any distance measure and any number of dimensions); CA, RA or WA (for a single dimension=gradient only); Avoid using DCA (a heavily manipulated technique, except for its first dimension, equivalent to CA). Avoid using PCA (unless linear relationships in the main matrix are met).
For more details see:
McCune, B. & Grace, J. B. 2002. Analysis of Ecological Communities. Gleneden Beach, Oregon, USA.
This topic seems to interest a lot of people. An article was recently published in Molecular Ecology about the different multivariate methods in microbial ecology, but it is useful in many other fields of research. The article describes the different approaches: exploratory methods, interpretive methods and statistical tests. I think it is an almost complete (but not everything ;) overview of the tools we can use, and it seems really helpful when you need to know which methods to use and why.
So overall it is an excellent article to have, completing the article from Ramette 2007 in FEMS Microbiology Ecology.
Title and link to the article:
Application of multivariate statistical techniques in microbial ecology
Forsberg, Kevin J., Sanket Patel, Molly K. Gibson, Christian L. Lauber, Rob Knight, Noah Fierer, and Gautam Dantas. “Bacterial Phylogeny Structures Soil Resistomes across Habitats.” Nature 509, no. 7502 (May 21, 2014): 612–16. doi:10.1038/nature13377.
For your information, I have read some Nature papers that use a Bray-Curtis distance matrix for PCoA.
Actually, PCoA is not limited to Euclidean distance, and the same holds for NMDS. It can take any distance measure and builds its axes according to the dissimilarity measure you supply. If you use beta_diversity_through_plots.py in QIIME to generate beta diversity distance matrices for PCoA, you may choose different distance measures (-s).
PCoA and PCA are less computationally intensive than NMDS.
PCoA, CA and NMDS also handle the double-zero situation better than PCA (for PCoA and NMDS, provided a suitable dissimilarity measure is used).
Non-Euclidean measures should be chosen for data sets with many zeros.
My suggestion is that PCoA and NMDS can be considered equally informative with ecological data, but the choice of dissimilarity measure and data transformation is more important.
I have applied some biofertilizers to field soil to assess their impact on soil variables (enzymes, pH, EC) and plant growth variables (plant height, fresh and dry weight, branches etc.). What kind of ordination method can be used with this data set? For instance, PCA, PCoA, CCA, DCA, RDA etc. Also, how should I frame the data matrix? Should all the data be in a single Excel file? The arrangement of columns and rows is also important. Please suggest any link.
@Sara Patricia Luna: I think the publication mentioned below will be of help. The R code for the program is available as supplementary material (I believe). You can use the program with DGGE and other fingerprints as well.
Having unbalanced samples should not be a problem for analysing your T-RFLP data. You just need to follow the "normal" procedure. I would normalise and square-root transform the data and then perform an nMDS using a Bray-Curtis similarity matrix. Then you can run an ANOSIM to test for differences between sites; having unbalanced samples is not an issue with these analyses.
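The preparation steps described above can be sketched in a few lines of Python. Several things here are assumptions for illustration: "normalise" is taken to mean dividing each profile by its total signal, the T-RF abundances are invented, and the function names are hypothetical.

```python
import math

def standardise(sample):
    """Convert raw peak heights/areas to proportions of the sample total."""
    total = sum(sample)
    if total == 0:
        raise ValueError("cannot standardise an empty sample")
    return [v / total for v in sample]

def sqrt_transform(sample):
    """Square-root transform to down-weight dominant T-RFs."""
    return [math.sqrt(v) for v in sample]

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y)."""
    return sum(abs(a - b) for a, b in zip(x, y)) / (sum(x) + sum(y))

def bray_curtis_matrix(samples):
    """Standardise, square-root transform, then compare all sample pairs."""
    prepared = [sqrt_transform(standardise(s)) for s in samples]
    return [[bray_curtis(a, b) for b in prepared] for a in prepared]

# Hypothetical T-RF abundances: rows are samples, columns are T-RFs.
profiles = [
    [2, 0, 8],
    [4, 0, 16],  # same proportions as the first sample, twice the signal
    [5, 5, 0],
]
matrix = bray_curtis_matrix(profiles)
for row in matrix:
    print([round(v, 3) for v in row])
```

The first two profiles end up with a dissimilarity of 0 because standardising removes the difference in total signal. The resulting matrix is what you would pass to the nMDS and ANOSIM routines of your software (for example `metaMDS` and `anosim` in R's vegan package).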
Hi Ikram, I'm pretty sure you can't assign a % variation explained to each axis in NMDS. What you should consider for this type of analysis is the stress, i.e. how well the distances in your plot represent the (dis)similarities you used to generate the NMDS. Lower stress means that the distances in your plot do a better job of representing the calculated (dis)similarities.
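For readers who want to see what stress measures, here is a minimal sketch of Kruskal's stress-1 in Python. It is an illustration under stated assumptions, not the code of any particular package: ties among dissimilarities are ignored, the input values are invented, and the function names are hypothetical. Stress compares the ordination distances with their best monotone (non-decreasing) fit against the original dissimilarities.

```python
import math

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while a block mean exceeds the following block mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return fitted

def kruskal_stress(dissimilarities, ordination_distances):
    """Stress-1 for paired lists (one entry per pair of samples)."""
    order = sorted(range(len(dissimilarities)), key=lambda k: dissimilarities[k])
    d = [ordination_distances[k] for k in order]
    d_hat = isotonic_fit(d)  # disparities
    numerator = sum((a - b) ** 2 for a, b in zip(d, d_hat))
    denominator = sum(a ** 2 for a in d)
    return math.sqrt(numerator / denominator)

# Hypothetical values for three sample pairs.
original = [0.1, 0.2, 0.3]
perfect_plot = [1.0, 2.0, 3.0]    # rank order preserved
distorted_plot = [1.0, 3.0, 2.0]  # two pairs swapped in rank

print(kruskal_stress(original, perfect_plot))
print(round(kruskal_stress(original, distorted_plot), 4))
```

A stress of 0 means the rank order of the dissimilarities is reproduced perfectly in the plot; the more pairs are out of order, and the further out, the higher the stress.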
Alternatively, you could calculate which species or features correlate with various directions on your ordination. I think vegan's envfit function can accomplish this.
Hi Julian, thanks for your explanation. Indeed, I had already used the envfit function in the "vegan" package to relate community data to environmental data, and I obtained a stress value of 0.1564409. What do you think?
Sounds like you are on the right track. Check out this website for a guide to interpreting stress values (and NMDS in general) https://mb3is.megx.net/gustame/dissimilarity-based-methods/nmds
A useful document is "Multivariate analysis of ecological communities in R: vegan tutorial". It explains the differences between methods and gives a lot of related information. Hope it can help!
PCoA using *Euclidean* distances is basically PCA. The "advantage" of PCoA is that you can use *other* distance/(dis)similarity measures; see https://mb3is.megx.net/gustame/dissimilarity-based-methods/principal-coordinates-analysis.
Hence, PCoA with Gower distance is possible, or UniFrac distance, or Bray-Curtis dissimilarity, etc.
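Why PCoA on Euclidean distances reproduces PCA can be checked numerically. PCoA eigen-decomposes the double-centred matrix built from the squared distances; with Euclidean distances that matrix equals the cross-product matrix of the column-centred data, whose eigenvectors give the PCA sample scores. The sketch below verifies this identity in plain Python; the data and function names are made up for illustration.

```python
import math

def centre_columns(X):
    """Subtract each column mean (the centring step of PCA)."""
    n = len(X)
    means = [sum(col) / n for col in zip(*X)]
    return [[x - m for x, m in zip(row, means)] for row in X]

def gram_matrix(Xc):
    """Cross-product matrix Xc * Xc^T of the centred data."""
    return [[sum(a * b for a, b in zip(r1, r2)) for r2 in Xc] for r1 in Xc]

def euclidean_matrix(X):
    """All pairwise Euclidean distances between rows."""
    return [[math.dist(a, b) for b in X] for a in X]

def gower_centre(D):
    """Double-centre -0.5 * D^2, the matrix that PCoA eigen-decomposes."""
    n = len(D)
    A = [[-0.5 * d * d for d in row] for row in D]
    row_means = [sum(r) / n for r in A]
    col_means = [sum(A[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[A[i][j] - row_means[i] - col_means[j] + grand
             for j in range(n)] for i in range(n)]

# Hypothetical data: four samples (rows) by two variables (columns).
X = [[1.0, 2.0], [3.0, 0.0], [4.0, 5.0], [0.0, 1.0]]

B = gower_centre(euclidean_matrix(X))  # PCoA side
G = gram_matrix(centre_columns(X))     # PCA side

max_diff = max(abs(B[i][j] - G[i][j]) for i in range(4) for j in range(4))
print(max_diff)  # differs from 0 only by floating-point error
```

With any other dissimilarity (Bray-Curtis, UniFrac, Gower) the two matrices no longer coincide, which is exactly where PCoA goes beyond PCA.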
PCoA is not limited to Euclidean distance but works with any dissimilarity measure. Sorry for not updating my answer earlier and creating confusion. I have now updated it.
PCoA is now commonly used with Bray-Curtis and UniFrac distance (weighted or unweighted) as Cedric mentioned.
This is a really relevant discussion on an important topic for a lot of people working in community ecology. I believe Dr Blaud addressed the main questions. I would just like to add that CA has the arch effect problem and DCA is not a sufficient solution.
I would emphasize that the choice of the best ordination method should be guided by the ecological question and the available data set.
Here are a few more relevant/important sources for "community ecologists":
1. McCune, B. and J.B. Grace. 2002. Analysis of Ecological Communities. MJM Press (there are several good chapters).
2. Digby, P.G.N. and R.A. Kempton. 1987. Multivariate analysis of ecological communities. Chapman & Hall
3. Legendre, P. and Legendre, L. 2012. Chapter 7 – Ecological resemblance (Chapter 8 – Cluster analysis.). In: Legendre, P. and Legendre, L. 1998, Numerical ecology. Elsevier.
4. McCune, B. and Kent, M. 2012. Chap. 6 – Ordination methods. Pages 171–271.
5. Everitt, B. and T. Hothorn. Chaps 3–4. PCA and NMDS.
6. Borcard, Gillet and Legendre. Unconstrained Ordination (and Chap 6: Canonical Ordination).
More from whom I took multivariate analysis class (DW Roberts has also written some R packages like "labdsv". I use this package along with "vegan"):
7. Roberts, D.W. 1986. Ordination on the basis of fuzzy set theory. Vegetatio 66:123-131.
8. Roberts, D.W. 2008. Statistical analysis of multidimensional fuzzy set ordination. Ecology 89:1246-1260.
9. Roberts, D.W. 2015. Vegetation classification by two new iterative reallocation optimization algorithms. Plant Ecology 216(5):741–758.
I have often seen PCoA and NMDS used in papers. Just focus on high-ranking papers and pick one that is similar to your experiment. I think that is the simplest way to find your answer.
In my master's programme I had a nice course on multivariate analysis in ecology, and for your problem we had a very useful sheet, which I am sharing with you here. I hope it can help you.
You can choose the most appropriate ordination method by taking into account how well the axes separate your samples. So applying only one method would be wasteful. It is better to try various methods and choose the best one.
Dear Negin Katal, I really appreciate the table you shared. Another potential approach to include environmental drivers is to use them as factors in a PERMANOVA. Afterwards you can use an NMDS to show the pattern. An advantage is that it is possible to use the same distance matrix for both.
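As a rough illustration of what a one-way PERMANOVA computes from that distance matrix, here is a minimal Python sketch. It is a simplified teaching version under stated assumptions (one factor, free permutation of labels, invented data, hypothetical function names), not a replacement for dedicated routines in established statistical packages.

```python
import random

def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a square distance matrix and labels."""
    n = len(groups)
    labels = sorted(set(groups))
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    ss_within = 0.0
    for g in labels:
        members = [i for i in range(n) if groups[i] == g]
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(members)
                         for j in members[k + 1:]) / len(members)
    ss_among = ss_total - ss_within
    a = len(labels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so permutations equal to the observed grouping count.
        if pseudo_f(dist, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical example: four samples placed along a single gradient.
positions = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(a - b) for b in positions] for a in positions]
groups = ["control", "control", "treated", "treated"]

f_stat, p_value = permanova(dist, groups)
print(f_stat, p_value)
```

In this toy case the pseudo-F is 200, but with only four samples there are too few distinct permutations for a small p-value, which is a reminder that replication matters as much as the choice of test. The same `dist` matrix could then be passed to an NMDS routine to draw the pattern.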
While I agree that there are guidelines on the use of these methods, it is impossible to know which one is best. This is because ordination methods based on distance matrices are not model-based approaches. I would highly suggest using latent variable models and the R package gllvm. In general, have a look at the work of David Warton.
It depends on your input data. You should first know whether your data belong to the fixed mode or to the random mode. Not all methods are suitable for fixed mode data.