Has anyone applied biclustering to a 16S rRNA OTU table in order to correlate it with environmental data?

I think it should work. But before applying biclustering, you need to add the abundance of each OTU (the number of reads), which is the equivalent of the gene expression level, to your OTU table. Taxonomy information is not needed for the analysis. Nonetheless, you can rename your OTUs by adding an abbreviation of their taxonomy to the new OTU names.
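A minimal R sketch of the renaming step, assuming a hypothetical taxonomy table `tax` with an OTU ID column and a genus column (both column names, the IDs and the counts are made up for illustration). Here the first four letters of the genus serve as the abbreviation:

```r
# Hypothetical taxonomy table: OTU IDs with their genus assignment
tax <- data.frame(
  otu_id = c("OTU_1", "OTU_2", "OTU_3"),
  genus = c("Pseudomonas", "Bacillus", "Vibrio"),
  stringsAsFactors = FALSE
)

# Hypothetical count matrix: rows = OTUs, columns = sample locations
# (matrix() fills column by column: siteA = 10, 0, 5; siteB = 3, 8, 1)
counts <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 3,
                 dimnames = list(c("OTU_2", "OTU_1", "OTU_3"), c("siteA", "siteB")))

# New name = first four letters of the genus + "_" + original OTU ID;
# match() looks up each row's taxonomy, so the row order does not matter
idx <- match(rownames(counts), tax$otu_id)
rownames(counts) <- paste0(substr(tax$genus[idx], 1, 4), "_", tax$otu_id[idx])
counts
```

Keeping the original OTU ID in the new name guarantees the names stay unique even when several OTUs share a genus.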

Cheers,

Fang

Joana Séneca

I shall try that and see if it works! Thank you!

Joana Séneca

I renamed my OTUs by adding abbreviations as you suggested, and right now the matrix is set up as: rows = OTUs and columns = sample locations.

Where should I add the environmental data?

Fang Liu

You can add the environmental data as columns: column 1 should hold the abundance (read count) of each OTU, and columns 2..n the environmental data and location (a.k.a. metadata).

Thomas SB Schmidt

Dear Joana,

have you seen this very recent paper:

Zhu, Jiang & Chen; Constructing a Boolean implication network to study the interactions between environmental factors and OTUs; Quantitative Biology (2015)

Although I am not fully convinced by their approach, I think this study can give you a good starting point on how to deal with associations between OTUs and environmental factors.

Best,

Sebastian

http://link.springer.com/article/10.1007%2Fs40484-014-0037-3

Karel Sedlar

Hi Joana,

If you want to correlate an OTU table with environmental data, you first have to correlate the two tables and then apply biclustering to the resulting matrix of correlation coefficients.

So you have one matrix of OTUs, where each row corresponds to an OTU and each column to a location (the values are the abundance of each OTU at each location). The second matrix holds the environmental data, where each row corresponds to an environmental parameter and each column to a location (the values are the measured value of each parameter at each location). Is that accurate?
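Because the correlation pairs the two tables location by location, both matrices must contain the same locations in the same column order. A minimal R sketch of that check, using made-up toy values (all names and numbers are hypothetical):

```r
# Hypothetical toy tables: rows = OTUs / environmental parameters, columns = locations
otu <- matrix(c(12, 0, 7, 3,
                5, 9, 1, 4),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Pseu_OTU_1", "Baci_OTU_2"),
                              c("siteA", "siteB", "siteC", "siteD")))
envir <- matrix(c(7.9, 8.1, 7.2, 7.5,
                  14.0, 15.5, 12.1, 13.0),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("pH", "temperature"),
                                c("siteC", "siteA", "siteD", "siteB")))  # columns in a different order

# Keep only the locations present in both tables, in the same order
shared <- intersect(colnames(otu), colnames(envir))
otu <- otu[, shared, drop = FALSE]
envir <- envir[, shared, drop = FALSE]

# Fail loudly if the columns still do not line up
stopifnot(identical(colnames(otu), colnames(envir)))
envir
```

Indexing by column name rather than position protects against spreadsheets whose columns were sorted differently.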

Then you can correlate these tables using Spearman's rank correlation (Pearson's correlation is not appropriate here, because both tables contain heterogeneous values that are not normally distributed). The resulting table of correlation coefficients can be visualised as a heatmap, and a biclustering algorithm can be applied to it.
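A minimal R sketch of the correlation step with made-up toy matrices (names and values are hypothetical). `cor()` correlates the columns of its two arguments, so both tables are transposed so that locations become the paired observations. With `method = "spearman"`, R ranks each variable and computes Pearson's correlation on the ranks, which the last line checks for one pair:

```r
# Hypothetical toy data: rows = OTUs / environmental parameters, columns = locations
set.seed(42)
sites <- paste0("site", 1:10)
otu <- matrix(rpois(6 * 10, lambda = 30), nrow = 6,
              dimnames = list(paste0("OTU", 1:6), sites))
envir <- matrix(runif(3 * 10), nrow = 3,
                dimnames = list(c("pH", "temperature", "salinity"), sites))

# Transpose both tables: locations become rows, variables become columns
coef <- cor(t(otu), t(envir), method = "spearman")

dim(coef)        # 6 OTUs x 3 environmental parameters
round(coef, 2)

# Spearman's rho is Pearson's correlation computed on ranks (ties get average ranks)
all.equal(coef["OTU1", "pH"], cor(rank(otu["OTU1", ]), rank(envir["pH", ])))
```

The result is an OTU-by-parameter matrix that can go straight into a heatmap or biclustering function.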

We've used a similar technique several times, for example to correlate the microbiota composition of the carriage water of ornamental fish (OTU table) with antibiotic resistance genes (the expression of the genes at the different locations was the second table).

Matlab or R can handle these computations very efficiently.
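No specific biclustering algorithm is named here. As one illustration, the sketch below implements only the single-node-deletion step of the Cheng and Church algorithm (the method behind `BCCC()` in the biclust package) in base R and applies it to a correlation matrix built from made-up data. The threshold `delta` and the minimum size of 2 rows/columns are arbitrary assumptions; for real analyses a maintained package such as biclust is the better choice.

```r
# Mean squared residue H(I, J) of a (sub)matrix, as defined by Cheng and Church (2000):
# residue r_ij = a_ij - rowmean_i - colmean_j + overall mean
msr <- function(x) {
  r <- sweep(sweep(x, 1, rowMeans(x)), 2, colMeans(x)) + mean(x)
  mean(r^2)
}

# Single node deletion: repeatedly drop the row or column with the largest mean
# squared residue until H <= delta (or until only min_size rows or columns remain)
single_node_deletion <- function(x, delta, min_size = 2) {
  rows <- seq_len(nrow(x))
  cols <- seq_len(ncol(x))
  repeat {
    sub <- x[rows, cols, drop = FALSE]
    if (msr(sub) <= delta || length(rows) <= min_size || length(cols) <= min_size) break
    r2 <- (sweep(sweep(sub, 1, rowMeans(sub)), 2, colMeans(sub)) + mean(sub))^2
    d_row <- rowMeans(r2)
    d_col <- colMeans(r2)
    if (max(d_row) >= max(d_col)) {
      rows <- rows[-which.max(d_row)]
    } else {
      cols <- cols[-which.max(d_col)]
    }
  }
  list(rows = rows, cols = cols)
}

# Hypothetical toy data -> OTU x parameter Spearman correlation matrix
set.seed(7)
sites <- paste0("site", 1:12)
otu <- matrix(rpois(8 * 12, lambda = 25), nrow = 8,
              dimnames = list(paste0("OTU", 1:8), sites))
envir <- matrix(runif(5 * 12), nrow = 5,
                dimnames = list(paste0("param", 1:5), sites))
coef <- cor(t(otu), t(envir), method = "spearman")

bic <- single_node_deletion(coef, delta = 0.01)
coef[bic$rows, bic$cols, drop = FALSE]  # candidate bicluster: OTUs x parameters
```

The full Cheng and Church procedure also has multiple-node deletion and node addition phases and masks each found bicluster before searching for the next one; those are omitted here for brevity.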

Article Characterization of Microbiota Composition and Presence of S...

Joana Séneca

Thank you all for your answers!

Thomas, I didn't know about that paper since, as you said, it is very recent, but I've already downloaded it. The if-then premise seems appropriate in my case, and I still have to explore the subject further.

Karel, thank you for your input. Yes, the way you described the matrices is exactly how my data are organised! I'm not very familiar with Matlab and I have been using R for all my statistics (for biclustering I've used the packages biclust and eisa, the latter from Bioconductor). In the paper you sent me, did you use Matlab for all the figures?

Thanks again!

Karel Sedlar

Some figures were made in QIIME, some in Matlab, and some with other tools; in other papers we have also used R. In R, I would recommend the packages gdata and gplots, especially the function heatmap.2, which is the best function for clustered heatmaps I have found so far. An R script can look something like this:

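library("gdata")   # read.xls() reads Excel files (requires Perl)
library("gplots")  # heatmap.2()

# "*.xlsx" / "*.xls" stand for your own file names
otu <- read.xls("*.xlsx")    # OTUs in rows: a 'genus' column plus one column per location
envir <- read.xls("*.xls")   # parameters in rows: names in the first (unnamed) column 'X'

otu.name <- otu$genus
envir.name <- envir$X

# cor() needs numeric matrices, so drop the name columns (assumes every numeric column is a location)
otu.mat <- data.matrix(otu[, sapply(otu, is.numeric)])
envir.mat <- data.matrix(envir[, sapply(envir, is.numeric)])
rownames(otu.mat) <- otu.name
rownames(envir.mat) <- envir.name

coef <- cor(t(otu.mat), t(envir.mat), method = "spearman")  # Spearman correlation, OTUs x parameters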

my.dist
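The script above is cut off after `my.dist`, so the following is not the original code, only a sketch of the kind of step that typically comes next: a hypothetical distance function (1 minus the Pearson correlation between rows, here called `cor_dist`) used to cluster the rows and columns of the correlation matrix. It uses base R's `heatmap()` so it runs without extra packages; `heatmap.2()` from gplots also takes `distfun` and `hclustfun` arguments. The toy data and the choice of average linkage are assumptions.

```r
# Hypothetical toy correlation matrix (OTUs x environmental parameters)
set.seed(3)
sites <- paste0("site", 1:12)
otu <- matrix(rpois(8 * 12, lambda = 25), nrow = 8,
              dimnames = list(paste0("OTU", 1:8), sites))
envir <- matrix(runif(4 * 12), nrow = 4,
                dimnames = list(c("pH", "temperature", "salinity", "oxygen"), sites))
coef <- cor(t(otu), t(envir), method = "spearman")

# Hypothetical distance: rows with similar correlation profiles are close
cor_dist <- function(x) as.dist(1 - cor(t(x)))
avg_hclust <- function(d) hclust(d, method = "average")

# Dendrogram orders used to rearrange rows (OTUs) and columns (parameters)
row_order <- avg_hclust(cor_dist(coef))$order
col_order <- avg_hclust(cor_dist(t(coef)))$order
coef[row_order, col_order]

# Draw the clustered heatmap only in an interactive session
if (interactive()) {
  heatmap(coef, distfun = cor_dist, hclustfun = avg_hclust, scale = "none",
          col = hcl.colors(25, "Blue-Red"))
}
```

`scale = "none"` keeps the colours on the original correlation scale instead of standardising each row.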

Joana Séneca

My apologies for the late reply! Karel, thank you so much for your feedback! I adapted some of your code and got some nice results. In the end I used the Heatplus package (Bioconductor) because I needed to add annotations on the sides of the heatmap, but I agree with you about heatmap.2!

By the way, have you heard of this? http://bioconductor.org/packages/devel/bioc/vignettes/ComplexHeatmap/inst/doc/ComplexHeatmap.html?utm_campaign=Data_Elixir_30&utm_medium=email&utm_source=Data%2BElixir

For those who use heatmaps in their work, it could be a good option. I haven't tested it yet because it is only available for the next R version.

Laith Al-Ani

Hi Joana

I think this reference may be helpful:

https://www.biorxiv.org/content/biorxiv/early/2018/09/14/416073.full.pdf
