Should I rarefy the microbiome count data before calculating alpha diversity?

Khondoker Dastogeer

You might check this out:

Article Waste Not, Want Not: Why Rarefying Microbiome Data Is Inadmissible

Wisam Thamer Al-Mayah

I face the same problem.

regards

Andrew Paul McKenzie Pegman

You just have to make the effort to calculate the accurate diversity

Lukas Beule

Haitao Wang, don't use 'rarefy' (random subsampling without replacement). If you have read the paper that Khondoker Dastogeer shared, you will know that it is "statistically inadmissible". With rarefy, you get biased libraries because OTUs are picked at random. Many other tools are available for normalising microbiome data, which is crucial for alpha diversity analysis. We recently published an algorithm (SRS) that can be used as an alternative to rarefy and compared the two:

Article Improved normalization of species count data in ecology by s...

The algorithm is implemented in R and easy to use.

Feel free to send me a message or reply to this answer if you have questions or comments.

Much success with your data, Lukas
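To make "random subsampling without replacement" concrete, here is a minimal sketch in plain Python. The function name `rarefy`, the taxon labels and the counts are made up for illustration; this is not the code of any particular package, and real pipelines usually rely on an established implementation.

```python
import random


def rarefy(counts, depth, seed=None):
    """Draw `depth` reads at random, without replacement, from one sample.

    `counts` maps a taxon label to its read count (hypothetical data).
    Returns a dict with the same keys and the subsampled counts.
    """
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the sample's total read count")
    rng = random.Random(seed)
    # Expand the count vector into one entry per read, then draw reads.
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    rarefied = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        rarefied[taxon] += 1
    return rarefied


# Hypothetical sample with 1,500 reads, subsampled to 500 reads.
sample = {"OTU_1": 900, "OTU_2": 550, "OTU_3": 45, "OTU_4": 4, "OTU_5": 1}
sub = rarefy(sample, depth=500, seed=1)
print(sub, sum(sub.values()))
```

Because the draw is random, rare taxa such as `OTU_5` can drop to zero in one run and survive in another, which is the source of the variability discussed in this thread.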

Mathilde Borg Dahl

Hi Lukas Beule, I tried to apply your SRS method to my sequencing data but ran into a problem, which I think is the same one Haitao is struggling with: I have to scale everything down to my smallest sample, thereby throwing away A LOT of perfectly good data.

In my case I have sequence data ranging from 1,500 to 20,000 reads per sample. I would have to set Cmin to 1,500....

Do you have any comments or suggestions on that? Is it better to discard more samples to begin with, i.e. samples with a low total read count? But then you lose samples and potentially get an unbalanced sample representation from your study design/experiment.

Hope someone will comment on this.

Best, Mathilde
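The trade-off described above (a low Cmin keeps every sample but discards most reads, a higher Cmin keeps more reads per sample but drops samples) can be tabulated before choosing a cut-off. The sketch below is only an illustration: the helper `cmin_tradeoff` and the list of sequencing depths are hypothetical, chosen to span the 1,500 to 20,000 range mentioned in the post.

```python
def cmin_tradeoff(depths, cmin):
    """Summarise what normalising every sample to `cmin` reads would cost.

    Samples with fewer than `cmin` reads are dropped; the remaining
    samples each keep exactly `cmin` reads.
    """
    kept = [d for d in depths if d >= cmin]
    reads_before = sum(depths)
    reads_after = cmin * len(kept)
    return {
        "cmin": cmin,
        "samples_kept": len(kept),
        "samples_dropped": len(depths) - len(kept),
        "reads_retained": reads_after,
        "fraction_reads_retained": reads_after / reads_before,
    }


# Hypothetical per-sample read counts.
depths = [1500, 2300, 4800, 9000, 12000, 20000]
for cmin in (1500, 4800, 9000):
    print(cmin_tradeoff(depths, cmin))
```

A table like this does not say which Cmin is right; that depends on the study design. It only makes explicit how many samples and reads each choice would cost.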

Hi Mathilde Borg Dahl, this is a general problem rather than one specific to SRS or rarefy. Reducing the number of read counts will always be accompanied by a loss of information.

I can't give you any recommendations because I don't know your study design as well as the type of organisms and environment that you sequenced. Feel free to contact me by email for further discussion regarding your data.

Cheers, Lukas

Thank you for your fast reply. I will discuss this a bit with Haitao (we are in the same group); maybe I will write a follow-up.

As of right now, I think I will do SRS for the richness evaluations and use within-sample relative abundance for the rest of my community evaluation. The dataset is bacteria and fungi (amplicon seq).
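For reference, within-sample relative abundance just divides each taxon's count by that sample's total, so no reads are discarded. A minimal sketch with hypothetical counts (the function name and the two samples are illustrative only):

```python
def relative_abundance(counts):
    """Convert one sample's raw counts into proportions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: n / total for taxon, n in counts.items()}


# Two hypothetical samples with the same composition but different depths.
shallow = {"A": 750, "B": 600, "C": 150}       # 1,500 reads
deep = {"A": 10000, "B": 8000, "C": 2000}      # 20,000 reads
print(relative_abundance(shallow))
print(relative_abundance(deep))
```

Both samples give the same proportions despite the difference in depth. Note that proportions do not correct richness: a deeper sample can still show more taxa simply because more reads were sequenced.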

Sounds like a plan!

Just a short note:

SRS is now on CRAN: https://CRAN.R-project.org/package=SRS

Please use version 0.2.1 (not 0.2.0, which contains a small bug that we have since fixed).

Our package also features a Shiny app that you can use to explore different Cmin.

If you have any questions regarding SRS, feel free to contact me.
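For readers who want a feel for what scaling with ranked subsampling does to a single sample, here is a simplified Python sketch. It is an assumption-laden reading of the published description (scale counts to Cmin, keep the integer parts, hand out the remaining reads by descending fractional part, break ties by integer part and then at random) and is not the code of the R package, which remains the reference implementation. The name `srs_like` and the example counts are hypothetical.

```python
import random


def srs_like(counts, cmin, seed=None):
    """Normalise one sample to `cmin` reads by scaling plus ranked subsampling.

    Simplified illustration only; see the SRS package for the real thing.
    """
    total = sum(counts.values())
    if cmin > total:
        raise ValueError("cmin exceeds the sample's total read count")
    rng = random.Random(seed)
    # Scaled count is n * cmin / total. Integer arithmetic keeps it exact:
    # the quotient is the integer part, the remainder ranks the fraction.
    integer = {t: (n * cmin) // total for t, n in counts.items()}
    remainder = {t: (n * cmin) % total for t, n in counts.items()}
    missing = cmin - sum(integer.values())
    # Rank by fractional part, then by integer part, then randomly.
    order = sorted(
        counts,
        key=lambda t: (-remainder[t], -integer[t], rng.random()),
    )
    for taxon in order[:missing]:
        integer[taxon] += 1
    return integer


# Hypothetical sample with 1,500 reads, normalised to Cmin = 500.
sample = {"OTU_1": 900, "OTU_2": 550, "OTU_3": 45, "OTU_4": 4, "OTU_5": 1}
print(srs_like(sample, cmin=500))
```

Unlike random subsampling, the result here is deterministic except where both ranking criteria tie, which is why repeated runs give (almost) identical tables.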

So, I just looked at the richness in my raw data, an SRS-normalized version and a rarefied version, the latter two normalized to the smallest sample size (1,500 reads).

I attach the overview here.

SRS changed very little; only the richest samples have 'lost' one (max. two) 'taxa'.

Rarefying had a greater impact on richness: more samples have lost between one and seven taxa.

The richness in the 'low richness' samples is unchanged for both normalization methods.

I am not convinced that one method is far better than the other.... But maybe one should simply NOT report richness for this type of data (even though I would really like to), but instead look at some diversity indices - as Haitao Wang recommended to me just now :)

You achieved nice results with both methods. It also seems that your samples aren't that diverse, so normalising to 1,500 read counts isn't much of an issue in your case. In other words, 1,500 read counts/sample are apparently sufficient to detect almost all species in your samples. I agree with you regarding the reporting of species richness because in some cases, species richness is a function of the sequencing depth.

Much success with your analysis! :)
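The dependence of richness on sequencing depth can be checked without any random draws, using the classical rarefaction formula for the expected number of taxa in a subsample of n reads drawn without replacement: E[S_n] = sum over taxa of 1 - C(N - N_i, n) / C(N, n), where N is the total read count and N_i the count of taxon i. The sketch below uses hypothetical counts and an illustrative function name.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of observed taxa when `depth` reads are drawn
    without replacement from a sample with the given taxon counts."""
    total = sum(counts)
    if depth > total:
        raise ValueError("depth exceeds the sample's total read count")
    denom = comb(total, depth)
    # For each taxon: 1 minus the probability that none of its reads is drawn.
    return sum(1 - comb(total - n, depth) / denom for n in counts if n > 0)


# Hypothetical sample: two dominant taxa and three rare ones (1,500 reads).
counts = [900, 550, 45, 4, 1]
for depth in (100, 500, 1000, 1500):
    print(depth, round(expected_richness(counts, depth), 2))
```

If the curve has flattened well below the chosen Cmin, as appears to be the case for the samples discussed here, normalising to that depth costs little richness. If it is still rising, richness mostly reflects sequencing effort.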

Yao Xia

Haitao Wang, I ran into the same problem, that rarefaction throws away many rare taxa, so I proposed an "Average Rarefied Table". You can use it in QIIME2: https://library.qiime2.org/plugins/q2-repeat-rarefy/33/

It's better to do rarefaction; see the following paper: Preprint To rarefy or not to rarefy: Enhancing microbial community an...
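The general idea of averaging over repeated rarefactions can be sketched as follows. This is only an illustration of the concept under stated assumptions; how the q2-repeat-rarefy plugin actually works is not described in this thread, and the function names, the number of iterations and the counts below are hypothetical.

```python
import random


def rarefy_once(counts, depth, rng):
    """One random subsample of `depth` reads without replacement."""
    reads = [taxon for taxon, n in counts.items() for _ in range(n)]
    drawn = rng.sample(reads, depth)
    out = {taxon: 0 for taxon in counts}
    for taxon in drawn:
        out[taxon] += 1
    return out


def average_rarefied(counts, depth, n_iter=100, seed=0):
    """Average the counts over `n_iter` independent rarefactions."""
    rng = random.Random(seed)
    sums = {taxon: 0 for taxon in counts}
    for _ in range(n_iter):
        for taxon, n in rarefy_once(counts, depth, rng).items():
            sums[taxon] += n
    return {taxon: s / n_iter for taxon, s in sums.items()}


# Hypothetical sample with 1,500 reads, averaged at a depth of 500 reads.
sample = {"OTU_1": 900, "OTU_2": 550, "OTU_3": 45, "OTU_4": 4, "OTU_5": 1}
avg = average_rarefied(sample, depth=500, n_iter=100, seed=0)
print(avg)
```

Averaging smooths out the run-to-run randomness of a single rarefaction, so a rare taxon tends to receive a small non-zero average instead of being present in one run and absent in the next. The averaged values are generally not integers, which matters for downstream tools that expect counts.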