How to choose an ordination method, such as PCA, CA, PCoA, or NMDS?

25 April 2014

Ordination is a vital method for analysing community data, but I really don't know how to choose a suitable method or how these methods differ.

Aimeric Blaud

The choice of ordination method depends on 1) the type of data you have, 2) the (dis)similarity or distance measure you want or are able to use, and 3) what you want to show. All of these ordination methods are based, explicitly or implicitly, on a (dis)similarity or distance matrix computed from your data, using one of several measures (such as Euclidean, Bray-Curtis (= Sørensen), Jaccard, etc.) to calculate the distance between samples. Different measures do not give the same results, and because different ordination methods rely on different measures, the choice can significantly affect the results. For example, PCA uses only Euclidean distance, while nMDS and PCoA can use any (dis)similarity measure you want.
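To make the point about distance measures concrete, here is a minimal Python sketch (numpy only). The three-site abundance table is made up for illustration and is not from this thread; in R, vegan's vegdist() computes these and many other measures.

```python
import numpy as np

# Hypothetical abundance table: rows = samples (sites), columns = species/OTUs.
abund = np.array([
    [0, 1, 1],   # site 1
    [1, 0, 0],   # site 2: shares no species with site 1
    [0, 4, 8],   # site 3: same species as site 1, higher abundances
], dtype=float)

def euclidean(x, y):
    """Euclidean distance between two abundance vectors."""
    return float(np.sqrt(np.sum((x - y) ** 2)))

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x - y| / sum(x + y); joint absences add nothing."""
    return float(np.sum(np.abs(x - y)) / np.sum(x + y))

for name, dist in [("Euclidean", euclidean), ("Bray-Curtis", bray_curtis)]:
    d12 = dist(abund[0], abund[1])
    d13 = dist(abund[0], abund[2])
    print(f"{name:11s} d(1,2) = {d12:.3f}  d(1,3) = {d13:.3f}")
```

With this table, Euclidean distance puts site 1 closer to site 2 (no species in common, d ≈ 1.73) than to site 3 (same species, d ≈ 7.62), while Bray-Curtis ranks them the other way round (1.00 vs ≈ 0.71).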

So, how to choose a method?

- If you have a dataset that includes null values (e.g. most datasets from genotyping with fingerprinting methods include null values, for example when a bacterial OTU is present in some samples and not in others), I would advise you to use a Bray-Curtis similarity matrix and nMDS ordination. Bray-Curtis is chosen because it ignores joint absences (double zeros), whereas Euclidean distance can rate two samples with no species in common as closer than two samples that share species; nMDS is chosen because, unlike PCA, it accepts any similarity measure.

- If you have a dataset that does not include null values (e.g. environmental variables), you can use Euclidean distance with either PCA or nMDS, and you will see that in this case they give very similar results (standardise the variables first if they are measured in different units).
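For the environmental-variable case, here is a minimal PCA sketch in Python (numpy only), assuming a samples × variables table with no missing values; the data are simulated and the variable names in the comment are hypothetical. Standardising the variables gives a correlation-matrix PCA, and the eigenvalues give the % of variance explained by each axis. In R, prcomp() does the same job.

```python
import numpy as np

def pca(env):
    """PCA of a samples x variables table via SVD of the standardised matrix.

    Variables are centred and scaled to unit variance (a correlation-matrix PCA),
    which is advisable when they are measured in different units.
    Returns sample scores and the proportion of variance explained per axis.
    """
    z = (env - env.mean(axis=0)) / env.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                       # sample coordinates on the principal axes
    eig = s ** 2 / (env.shape[0] - 1)    # eigenvalues of the correlation matrix
    return scores, eig / eig.sum()

# Simulated environmental table: 5 samples x 3 variables (e.g. pH, EC, organic C).
rng = np.random.default_rng(1)
env = rng.normal(size=(5, 3)) * [0.5, 100.0, 2.0] + [7.0, 500.0, 10.0]

scores, explained = pca(env)
print(np.round(100 * explained, 1))  # % of variance explained by each axis
```

Using all axes, the Euclidean distances between sample scores equal those between the standardised samples, which is why PCA and a Euclidean-based ordination such as PCoA agree on this kind of data.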

Many ordination methods exist: the ones you mentioned, but also RDA (redundancy analysis), CAP (canonical analysis of principal coordinates), dbRDA (distance-based redundancy analysis), and others… Some methods are better than others at showing a complex community or the specific effect of a factor on your data. For example, CAP is good at showing the effect of the interaction between factors on your community. So it is sometimes worth trying different methods if you are not happy with the results, but keep in mind that these methods are “only” ordinations, and you need to perform tests for significant differences between groups (e.g. ANOSIM, ADONIS, PERMANOVA, MRPP…).
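As an illustration of the kind of test meant here, below is a minimal one-way PERMANOVA sketch in Python (numpy only), using Anderson's (2001) pseudo-F computed from a distance matrix and a permutation p-value. The community counts and group labels are simulated; for real analyses a tested implementation such as vegan's adonis2() in R is preferable.

```python
import numpy as np

def bray_curtis_matrix(x):
    """Pairwise Bray-Curtis dissimilarities between the rows (samples) of x."""
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / (x[i] + x[j]).sum()
    return d

def pseudo_f(d, groups):
    """One-way PERMANOVA pseudo-F computed directly from a distance matrix."""
    n = len(groups)
    labels = np.unique(groups)
    iu = np.triu_indices(n, k=1)
    ss_total = (d[iu] ** 2).sum() / n
    ss_within = 0.0
    for g in labels:
        idx = np.flatnonzero(groups == g)
        sub = d[np.ix_(idx, idx)]
        ss_within += (sub[np.triu_indices(len(idx), k=1)] ** 2).sum() / len(idx)
    a = len(labels)
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

def permanova(d, groups, n_perm=999, seed=0):
    """Pseudo-F plus a permutation p-value (group labels shuffled across samples)."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(d, groups)
    hits = sum(pseudo_f(d, rng.permutation(groups)) >= f_obs for _ in range(n_perm))
    return f_obs, (hits + 1) / (n_perm + 1)

# Simulated counts: 4 samples from each of two sites, 4 taxa.
rng = np.random.default_rng(42)
site_a = rng.poisson([20, 5, 1, 0], size=(4, 4))
site_b = rng.poisson([2, 6, 15, 10], size=(4, 4))
counts = np.vstack([site_a, site_b]).astype(float)
groups = np.array(["A"] * 4 + ["B"] * 4)

f, p = permanova(bray_curtis_matrix(counts), groups)
print(f"pseudo-F = {f:.2f}, p = {p:.3f}")
```

With Euclidean distances on a single variable, this pseudo-F reduces to the classical one-way ANOVA F, which is a handy sanity check.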

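Different ordination methods also come with different features that you may find interesting, such as overlaid vectors of extra variables, the % of variation explained by each axis, or 3D plots… However, most of these details are related more to the software than to the ordination methods themselves (the % explained per axis, though, only applies to eigenvalue-based methods such as PCA, CA and PCoA, not to nMDS).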

You can find more information about ordination methods, and about tests for significant differences between groups, in this review:

A. Ramette (2007) Multivariate analyses in microbial ecology, FEMS Microbiology Ecology, 62, 142-160.

Hope that helps

Aimeric

Chitra Bahadur Baniya

Hi,

Have you received an answer?

Here is what I do for method selection. I first run a DCA on the sample-by-species dataset. If the length of the first axis is greater than or equal to 2.5 (in standard-deviation units), I prefer to use CCA; otherwise I stick to linear methods such as RDA or PCA (CA, like CCA, is a unimodal method). But CCA requires reasonably justifiable environmental variables.

Thanks.

Chitra

Md. Masum Billah

Thanks a lot for your helpful suggestions, @Aimeric Blaud!

Adebola Lateef

It mainly depends on what you want to say (you can view your data from different meaningful angles, but what you want to explain will determine the analysis you need) and on your type of data. Try different types of analysis and see which one depicts what you intend to explain.

José Antonio Vázquez-García

It depends on the aims and scope of your research. Once you have selected a technique, your data may need specific adjustments before the analysis, depending on your objectives and the technique you want to use. Ecological data are usually highly heterogeneous, with lots of zeros (absences of species, resulting from the large number of rare species in most community samples), and several techniques perform badly with this kind of data.

Here are some questions that may guide you to select a proper tool of analysis:

1) Are you comparing existing groups? Consider DA (discriminant analysis), MRPP, PerMANOVA, ISA (indicator species analysis).

2) Are you looking for groups? Consider cluster analysis (flexible beta linkage works with any distance measure and lets you control its space-distorting properties) or Ward's method (Euclidean distance only); avoid TWINSPAN (it performs poorly when there is more than one important gradient).

4) Focus on Direct vs. Indirect gradient analysis?

Beals (1984) makes two strong statements about the advantages of indirect gradient analysis (= sociological ordination) over direct gradient analysis (= environmental ordination):

i) "Species differences between two samples do reflect their environmental differences, but in a highly integrated fashion, which includes differences in biotic interactions and historical events. The environmental differences are automatically scaled according to overall species response. Therefore the ordination with the clearest species patterns reflects the environmental space the way biotic communities interpret it."

ii) "The disadvantage of environmental ordination is that one must prejudge which are the important environmental factors to the vegetation or the fauna. An environmental ordination may omit important variables; it is often biased toward those factors most easily measured; measured variables may be scaled wrong; and biotic patterns imposed by competition, predation and other interactions are ignored."

5) Do you still want to focus on direct gradient analysis? Consider NPMR (for a single response variable), otherwise RDA (linear) or CCA (unimodal).

6) Do you want to focus on indirect gradient analysis? Consider NMS (a powerful method in community ecology, valid for any distance measure and any number of dimensions), or CA, RA or WA (for a single dimension, i.e. one gradient, only). Avoid DCA (a heavily manipulated technique, except for its first axis, which is essentially equivalent to CA). Avoid PCA unless the relationships in the main matrix are linear.

For more details see:

McCune, B. & Grace, J. B. 2002. Analysis of Ecological Communities. Gleneden Beach, Oregon, USA.

Aimeric Blaud

Hi all,

This topic seems to interest a lot of people. An article has recently been published in Molecular Ecology about the different multivariate methods used in microbial ecology, and it is useful in many other fields of research too. The article describes the different approaches: explanatory, interpretive and statistical testing methods. I think it is an almost complete (but not exhaustive ;) overview of the tools we can use, and it is really helpful when you need to know which methods to use and why.

So overall it is an excellent article to have, complementing the review by Ramette (2007) in FEMS Microbiology Ecology.

Title and link to the article:

Application of multivariate statistical techniques in microbial ecology

O. Paliy and V. Shankar

http://onlinelibrary.wiley.com/doi/10.1111/mec.13536/full

Yu Xia

Forsberg, Kevin J., Sanket Patel, Molly K. Gibson, Christian L. Lauber, Rob Knight, Noah Fierer, and Gautam Dantas. “Bacterial Phylogeny Structures Soil Resistomes across Habitats.” Nature 509, no. 7502 (May 21, 2014): 612–16. doi:10.1038/nature13377.

For your information, I have read some Nature papers that use a Bray-Curtis distance matrix for PCoA.

An Ni Zhang

Actually, PCoA is not limited to Euclidean distance; the same is true for NMDS. PCoA can take any distance measure and finds the axes that best represent that distance matrix in a few dimensions. If you use beta_diversity_through_plots.py in QIIME to generate beta diversity distance matrices for PCoA, you can choose different distance measures (-s).

PCoA and PCA are less computationally intensive than NMDS.

PCoA and NMDS (with a suitable dissimilarity measure) and CA also handle the double-zero problem better than PCA.

Non-Euclidean measures should be chosen for datasets with many zeros.

My suggestion is that PCoA and NMDS can be considered equally informative for ecological data; the choice of dissimilarity measure and data transformation matters more.
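One example of a transformation that makes Euclidean-based methods (PCA, RDA, or PCoA on Euclidean distances) better behaved with zero-rich species data is the Hellinger transformation (Legendre & Gallagher 2001). Below is a minimal numpy sketch using a made-up three-site table; it is one option among several, not the only correct choice.

```python
import numpy as np

def hellinger(x):
    """Hellinger transformation: square root of each sample's relative abundances."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x / x.sum(axis=1, keepdims=True))

def euclidean_matrix(x):
    """Pairwise Euclidean distances between rows (samples)."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Made-up table: site 2 shares no species with site 1; site 3 shares all of them.
abund = np.array([[0, 1, 1], [1, 0, 0], [0, 4, 8]], dtype=float)

raw = euclidean_matrix(abund)
hel = euclidean_matrix(hellinger(abund))
print(np.round(raw, 3))
print(np.round(hel, 3))
```

On the raw counts, sites 1 and 2 (no species in common) are closer than sites 1 and 3; after the Hellinger transformation the order is reversed, and sites with no species in common reach the maximum Hellinger distance of √2.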

Rizwan Ali Ansari

Dear All

I have applied some biofertilizers to field soil to determine their impact on soil variables (enzymes, pH, EC) and plant growth variables (plant height, fresh and dry weight, branches, etc.). What kind of ordination method can be used on this dataset: PCA, PCoA, CCA, DCA, RDA, etc.? Also, how should I arrange the data matrix? Should all the data be in a single Excel file, and how should the rows and columns be organised? Please suggest any useful links.

Regards

Rizwan


Benoit Vanhee

Hi Rizwan,

With quantitative data (no zero asymmetry), I recommend using Principal Component Analysis.

Sara Luna

Hello every one,

I am using T-RFLP to study AMF communities between sites; however, I have a different number of samples for each site. How do I analyse it?

Thank you

RAJAL DEBNATH

@Sara Patricia Luna: I think the publication below will help. The R code for the program is available as supplementary material (I believe). You can use the program with DGGE and other fingerprints as well.

Kropf, S., Heuer, H., Grüning, M., and Smalla, K. (2004). Significance test for comparing complex microbial community fingerprints using pairwise similarity measures. J. Microbiol. Methods 57, 187–195. doi: 10.1016/j.mimet.2004.01.002

Sara Luna

Thank you, RAJAL DEBNATH

Aimeric Blaud

Having unbalanced sample sizes should not be a problem for analysing your T-RFLP data. You just need to follow the "normal" procedure. I would normalise (e.g. convert each profile to relative abundances) and square-root transform the data, then perform an nMDS using a Bray-Curtis similarity matrix. Then you can run an ANOSIM to test for differences between sites; having unbalanced samples is not an issue for these analyses.
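Here is a minimal Python sketch (numpy only) of the workflow described above: normalise each profile, square-root transform, compute Bray-Curtis dissimilarities, then compute an ANOSIM R statistic with a permutation test. The peak-height data and the 3-vs-5 unbalanced design are simulated assumptions, and the helper names are hypothetical; in R, vegan's vegdist() and anosim() cover these steps, and the nMDS itself would come from metaMDS().

```python
import numpy as np

def prepare(profiles):
    """Normalise each profile to relative abundances, then square-root transform."""
    p = np.asarray(profiles, dtype=float)
    return np.sqrt(p / p.sum(axis=1, keepdims=True))

def bray_curtis_matrix(x):
    """Pairwise Bray-Curtis dissimilarities between the rows (samples) of x."""
    n = x.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = np.abs(x[i] - x[j]).sum() / (x[i] + x[j]).sum()
    return d

def average_ranks(v):
    """Rank values from 1..n, giving tied values their average rank."""
    order = np.argsort(v, kind="mergesort")
    sorted_v = v[order]
    ranks = np.empty(len(v))
    i = 0
    while i < len(v):
        j = i
        while j + 1 < len(v) and sorted_v[j + 1] == sorted_v[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def anosim_r(d, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    iu = np.triu_indices(len(groups), k=1)
    r = average_ranks(d[iu])
    within = groups[iu[0]] == groups[iu[1]]
    return (r[~within].mean() - r[within].mean()) / (len(r) / 2)

def anosim(d, groups, n_perm=999, seed=0):
    """R statistic plus a permutation p-value (group labels shuffled)."""
    rng = np.random.default_rng(seed)
    r_obs = anosim_r(d, groups)
    hits = sum(anosim_r(d, rng.permutation(groups)) >= r_obs for _ in range(n_perm))
    return r_obs, (hits + 1) / (n_perm + 1)

# Simulated T-RF peak heights, unbalanced design: 3 samples at site A, 5 at site B.
rng = np.random.default_rng(7)
site_a = rng.poisson([50, 30, 5, 1, 0], size=(3, 5))
site_b = rng.poisson([5, 10, 40, 30, 20], size=(5, 5))
profiles = np.vstack([site_a, site_b])
groups = np.array(["A"] * 3 + ["B"] * 5)

r, p = anosim(bray_curtis_matrix(prepare(profiles)), groups)
print(f"ANOSIM R = {r:.2f}, p = {p:.3f}")
```

The permutations keep the group sizes fixed, so the unbalanced design is handled naturally; note that with few samples the smallest attainable p-value is limited by the number of distinct relabellings.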

Ikram Dahmani

Hi Mr Blaud,

Thank you for this great information about how we can choose ordination method.

I have a question about the percentage of variation explained by each axis in an NMDS analysis: how can I calculate this?

Cordially,

Julian Trachsel

Hi Ikram, I'm pretty sure you can't assign a % of variation explained to each axis in NMDS. What you should consider for this type of analysis is the stress, i.e. how well the distances in your plot represent the (dis)similarities you used to generate the NMDS. Lower stress means that the distances in your plot do a better job of representing the calculated (dis)similarities.

Alternatively, you could calculate which species or features correlate with various directions on your ordination. I think vegan's envfit function can accomplish this.
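To see what stress measures, here is a minimal sketch of Kruskal's stress-1 (one common definition; software packages may use variants) for a given configuration: the configuration distances are fitted monotonically to the rank order of the dissimilarities (pool-adjacent-violators), and stress is the normalised misfit. It assumes a square dissimilarity matrix and a samples × axes coordinate array; ties in the dissimilarities are simply kept in input order, which is a simplification. It does not fit the NMDS itself (vegan's metaMDS() does that).

```python
import numpy as np

def isotonic_fit(y):
    """Pool-adjacent-violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # merge backwards while block means decrease
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])

def kruskal_stress1(dissim, coords):
    """Kruskal's stress-1 of an ordination configuration against dissimilarities."""
    iu = np.triu_indices(dissim.shape[0], k=1)
    delta = dissim[iu]
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))[iu]
    order = np.argsort(delta, kind="mergesort")   # pairs sorted by dissimilarity
    d_sorted = dist[order]
    d_hat = isotonic_fit(d_sorted)                # monotone regression fitted values
    return float(np.sqrt(((d_sorted - d_hat) ** 2).sum() / (d_sorted ** 2).sum()))

# Four hypothetical samples on one axis; the dissimilarities are a monotone
# transform of the configuration distances, so the rank order is fully preserved.
coords = np.array([[0.0], [1.0], [3.0], [6.0]])
dissim = np.abs(coords - coords.T) ** 2 / 36
print(kruskal_stress1(dissim, coords))  # prints 0.0
```

Because only the rank order of the dissimilarities matters, stress is zero here even though the dissimilarities are not proportional to the plotted distances; this is also why NMDS axes carry no "% explained".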

Francisco Calaça

Good explanations!

Ikram Dahmani

Hi Julian, thanks for your explanation. I had already used the envfit function in the "vegan" package to relate community data to environmental data, and I got a stress value of 0.1564409. What do you think?

Julian Trachsel

Sounds like you are on the right track. Check out this website for a guide to interpreting stress values (and NMDS in general) https://mb3is.megx.net/gustame/dissimilarity-based-methods/nmds

Ikram Dahmani

Hi Julian, thank you very much for your help.

Lan Liu

A useful document is "Multivariate analysis of ecological communities in R: vegan tutorial". It explains the differences between methods and gives much related information. Hope it helps!

Vijay Singh Meena

This thread is very helpful to me as well, thanks.

Elizabeth Larson

Great thread. Thanks for all the generous replies and suggestions!

Jorge Antonio Gomez-Diaz

Hi, these are the main characteristics of the ordination methods:

Principal component analysis (PCA):

Euclidean distance

Parametric (like ANOVA)

Based on eigenvectors

Raw and quantitative data.

Preserves the Euclidean distance between sites.

Correspondence analysis (CA):

χ² distance

Frequency or similar data, dimensionally homogeneous and non-negative.

Preserves the χ² distance between rows or columns.

Used in ecology to analyze tables of species data.

Principal coordinate analysis (PCoA):

Many distance measures can be used

Ordination of a distance matrix (Q mode) instead of a site-by-variable table.

Flexibility in the choice of association measures.

Non-metric multidimensional scaling (NMDS):

It is not a method based on eigenvectors.

It tries to represent the set of objects along a predetermined number of axes while preserving the ordering relationships between them.

Sources:

Legendre, P., & Legendre, L. F. (2012). Numerical ecology (Vol. 24). Elsevier.

Borcard, D., Gillet, F., & Legendre, P. (2018). Numerical ecology with R. Springer.
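A minimal numpy sketch of CA as summarised above, assuming a sites × species count table with no all-zero rows or columns: an SVD of the matrix of standardised residuals gives the axes, the eigenvalues sum to the total inertia (χ²/n), and the site principal coordinates reproduce the χ² distances between site profiles. The count table is made up; in R, vegan's cca() without constraints performs CA.

```python
import numpy as np

def correspondence_analysis(table):
    """CA of a sites x species count table via SVD.

    Returns site principal coordinates and the eigenvalues (inertia per axis).
    All rows and columns must have non-zero totals.
    """
    n = np.asarray(table, dtype=float)
    p = n / n.sum()
    r = p.sum(axis=1)                        # row (site) masses
    c = p.sum(axis=0)                        # column (species) masses
    s_mat = (p - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    u, sv, vt = np.linalg.svd(s_mat, full_matrices=False)
    k = min(n.shape) - 1                     # number of non-trivial axes
    eig = sv[:k] ** 2
    site_coords = (u[:, :k] * sv[:k]) / np.sqrt(r)[:, None]
    return site_coords, eig

# Made-up counts: 4 sites x 4 species.
counts = np.array([[10, 5, 0, 1],
                   [8, 6, 1, 0],
                   [1, 2, 9, 7],
                   [0, 1, 7, 12]])
coords, eig = correspondence_analysis(counts)
print(np.round(eig / eig.sum(), 3))  # share of total inertia per axis
```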

Francisco Calaça

Good, Jorge Antonio Gomez-Diaz! Thanks for the additional explanations!

FJSC.

Kishor Sharma

The above discussion greatly helped me to choose a better ordination method for my current research. Thank you.

The book Borcard, D., Gillet, F., & Legendre, P. (2018). Numerical ecology with R. Springer, mentioned by @Jorge Antonio Gomez-Diaz, is very useful.

Altaf Hussain

Is PCoA really limited to Euclidean distance? I have seen people use Gower distance and many others!

Cedric Laczny

PCoA using *Euclidean* distances is basically PCA. The "advantage" of PCoA is that you can use *other* distance/(dis)similarity measures; see also https://mb3is.megx.net/gustame/dissimilarity-based-methods/principal-coordinates-analysis.

Hence, PCoA with Gower distance is possible, or UniFrac distance, or Bray-Curtis dissimilarity, etc.

Best,

Cedric
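A minimal numpy sketch of classical PCoA (Gower's double-centring followed by an eigendecomposition), which accepts any distance matrix, plus a check of the point above that PCoA on Euclidean distances reproduces PCA scores up to the sign of each axis. The data are simulated; in R, cmdscale() (stats) performs classical PCoA.

```python
import numpy as np

def pcoa(d):
    """Classical principal coordinates analysis of a square distance matrix.

    Returns coordinates on the axes with positive eigenvalues, plus all eigenvalues.
    Non-Euclidean measures (e.g. Bray-Curtis) can give negative eigenvalues;
    they are reported but not used for coordinates.
    """
    n = d.shape[0]
    a = -0.5 * d ** 2
    j = np.eye(n) - np.ones((n, n)) / n      # centring matrix
    g = j @ a @ j                            # Gower's double-centred matrix
    eigval, eigvec = np.linalg.eigh(g)
    idx = np.argsort(eigval)[::-1]           # sort axes by decreasing eigenvalue
    eigval, eigvec = eigval[idx], eigvec[:, idx]
    keep = eigval > 1e-10 * eigval.max()
    coords = eigvec[:, keep] * np.sqrt(eigval[keep])
    return coords, eigval

# Simulated data: 6 samples x 3 variables.
rng = np.random.default_rng(3)
x = rng.normal(size=(6, 3))
d_euc = np.sqrt(((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1))
coords, _ = pcoa(d_euc)

# PCA scores of the centred data, for comparison.
xc = x - x.mean(axis=0)
u, s, vt = np.linalg.svd(xc, full_matrices=False)
pca_scores = u * s
print(np.allclose(np.abs(coords), np.abs(pca_scores[:, :coords.shape[1]])))
```

With non-Euclidean measures such as Bray-Curtis, some eigenvalues can be negative; this sketch simply keeps the axes with positive eigenvalues, whereas packages offer corrections (e.g. Lingoes or Cailliez) that are not shown here.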

Altaf Hussain

Exactly, thanks for the clarification

Aimeric Blaud

PCoA is not limited to Euclidean distance; it works with any dissimilarity measure. Sorry for not updating my answer earlier and creating confusion. I have now updated it.

PCoA is now commonly used with Bray-Curtis and UniFrac distances (weighted or unweighted), as Cedric mentioned.

Shbbir Raza Khan

Thank you, Aimeric Blaud; your explanations are always very helpful to me.

Thanks once again

Hudhaifa maan Al-Hamndi

Very good and interesting discussion

Carlos Freitas

This is a really relevant discussion on an important topic for a lot of people working in community ecology. I believe Dr Blaud addressed the main questions. I would just like to add that CA suffers from the arch effect, and DCA is not a sufficient solution.

I would also emphasise that the best ordination method should be chosen with the ecological question and the available dataset in mind.

Tluang Hmung Thang

Very nice discussion and explanation!

Bismark Ofosu-Bamfo

The discussion above has been very helpful.

Abhijit Mitra

Trying to get detailed information

Subodh Adhikari

Hello All,

Here are a few more relevant/important sources for "community ecologists":

1. McCune, B. and J.B. Grace. 2002. Analysis of Ecological Communities. MJM Press (there are several good chapters).

2. Digby, P.G.N. and R.A. Kempton. 1987. Multivariate analysis of ecological communities. Chapman & Hall

3. Legendre, P. and Legendre, L. 2012. Chapter 7 – Ecological resemblance (Chapter 8 – Cluster analysis.). In: Legendre, P. and Legendre, L. 1998, Numerical ecology. Elsevier.

4. McCune, B. and Kent, M. 2012. Chap. 6 – Ordination methods. Pages 171–271.

5. Everitt, B. and T. Hothorn. Chaps 3–4. PCA and NMDS.

6. Borcard, Gillet and Legendre. Unconstrained Ordination (and Chap 6: Canonical Ordination).

More from D.W. Roberts, with whom I took a multivariate analysis class (he has also written R packages such as "labdsv", which I use along with "vegan"):

7. Roberts, D.W. 1986. Ordination on the basis of fuzzy set theory. Vegetatio 66:123-131.

8. Roberts, D.W. 2008. Statistical analysis of multidimensional fuzzy set ordination. Ecology 89:1246-1260.

9. Roberts, D.W. 2015. Vegetation classification by two new iterative reallocation optimization algorithms. Plant Ecology 216(5):741–758.

10. Roberts, D.W. labdsv: https://cran.r-project.org/web/packages/labdsv/labdsv.pdf

And there are certainly hundreds more resources you could find.

Cheers!

Subodh

Chimi Djomo Cédric

Very nice technical question and detailed explanation.

Thanks to all.

Mehrdad Rabiei

In my experience, PCoA and NMDS are used very often in papers. Just focus on high-ranking papers and pick one that is similar to your experiment. I think this is the simplest way to find your answer.

Abhijit Mitra

Ordination Methods - an overview

Michael W. Palmer - may be helpful

Bulbul Ahmed

Such a nice read. Thanks all, especially Aimeric Blaud.

Ícaro Castro

Very nice discussion. Thanks all.

Mekdimu Mezemir Damtie

Thank you Dr Aimeric Blaud

Sangwook Scott Lee

Thank you for your helpful advice, especially Aimeric Blaud.

David J. Gibson

This annotated bibliography may be helpful: Ordination Analysis

DOI: 10.1093/OBO/9780199830060-0003

Paul Somerfield

And for a general overview see https://mb3is.megx.net/gustame

Negin Katal

In my master's program I had a very nice course on multivariate analysis in ecology, and for your problem we had a very useful sheet, which I share with you here. I hope it helps.

Serkan Özdemir

You can choose the most appropriate ordination method by considering how well the samples are separated along the axes. Applying only one method would therefore be wasteful; it is better to try several methods and choose the most suitable one.

Ibrahim Tavuç

https://www.jstor.org/stable/1934302?seq=2#metadata_info_tab_contents

Carlos Freitas

Dear Negin Katal, I really appreciate the table you shared. Another potential approach to include environmental drivers is to treat them as factors in a PERMANOVA. Afterwards, you can use an NMDS to show the pattern. An advantage is that both analyses can use the same distance matrix.

Adijailton Jose de Souza

Please have a look at GUSTA ME: https://mb3is.megx.net/gustame

Riccardo Soldan

While I agree that there are guidelines on the use of these methods, it is impossible to know which one is best, because ordination methods based on distance matrices are not model-based approaches. I would highly recommend using latent variable models and the package gllvm. In general, have a look at the work of David Warton.

Ashraf M. T. Elewa

It depends on your input data. You should first know whether your data belong to the fixed mode or to the random mode. Not all methods are suitable for fixed mode data.
