I am currently analyzing fungal data associated with Vanilla species, generated by Illumina sequencing.
I used QIIME and UCLUST for the bioinformatics processing, which produced a large number of sequences and OTUs.
My colleague suggested that I use the R package phyloseq to build a phyloseq object as the starting point for my analysis. I also used the R package decontam to detect contaminant sequences and remove them from the data.
I have tentatively identified 755 OTUs as fungal sequences. Have you encountered or used a method for filtering out the "noise" in this kind of data? For example, have you ever set an abundance threshold that removes OTUs with fewer than 10 or 100 reads?
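To make the threshold question concrete, here is a minimal sketch of a total-read filter. It assumes the OTU table has been exported as a mapping from OTU ID to per-sample read counts; the layout, the function name `filter_by_total_reads`, and the toy counts are hypothetical and only illustrate the idea of dropping OTUs whose summed reads fall below a cutoff such as 10 or 100.

```python
from typing import Dict

# Hypothetical layout: OTU ID -> {sample ID -> read count}.
OtuTable = Dict[str, Dict[str, int]]


def filter_by_total_reads(table: OtuTable, min_reads: int) -> OtuTable:
    """Keep only OTUs whose read count summed over all samples is >= min_reads."""
    return {
        otu: counts
        for otu, counts in table.items()
        if sum(counts.values()) >= min_reads
    }


if __name__ == "__main__":
    # Toy counts for illustration only, not real data.
    toy = {
        "OTU_1": {"S1": 120, "S2": 80},  # 200 reads in total
        "OTU_2": {"S1": 3, "S2": 2},     # 5 reads in total
        "OTU_3": {"S1": 0, "S2": 15},    # 15 reads in total
    }
    for threshold in (10, 100):
        kept = filter_by_total_reads(toy, threshold)
        print(threshold, sorted(kept))
```

With a cutoff of 10 the toy table keeps `OTU_1` and `OTU_3`; with 100 only `OTU_1` survives, which shows how strongly the choice of cutoff can change the OTU count. Inside phyloseq, the same kind of filter is commonly written with `prune_taxa()` combined with `taxa_sums()`.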
Have you ever had to judge, from the OTUs identified as fungal sequences, how confident you could be in using them to identify the orchid mycorrhizal fungi (OMF)?
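One way to narrow the 755 fungal OTUs down to OMF candidates is to subset them by assigned family. The sketch below assumes QIIME-style taxonomy strings with an `f__` rank prefix; the function names are hypothetical, and the family list (Tulasnellaceae, Ceratobasidiaceae, Serendipitaceae) is an illustrative assumption drawn from lineages often reported as orchid mycorrhizal, not a validated list for Vanilla. A family-level match only flags a candidate; it does not confirm a mycorrhizal role.

```python
from typing import Dict, Iterable, List

# Illustrative assumption: families to treat as candidate OMF lineages.
CANDIDATE_OMF_FAMILIES = ("Tulasnellaceae", "Ceratobasidiaceae", "Serendipitaceae")


def family_of(taxonomy: str) -> str:
    """Return the family name from a QIIME-style 'f__' rank, or '' if absent."""
    for rank in taxonomy.split(";"):
        rank = rank.strip()
        if rank.startswith("f__"):
            return rank[3:]
    return ""


def candidate_omf(
    taxonomy: Dict[str, str],
    families: Iterable[str] = CANDIDATE_OMF_FAMILIES,
) -> List[str]:
    """Return the sorted IDs of OTUs whose assigned family is in `families`."""
    wanted = set(families)
    return sorted(otu for otu, tax in taxonomy.items() if family_of(tax) in wanted)


if __name__ == "__main__":
    # Toy taxonomy strings for illustration only.
    toy_taxonomy = {
        "OTU_1": "k__Fungi;p__Basidiomycota;c__Agaricomycetes;o__Cantharellales;f__Tulasnellaceae;g__Tulasnella",
        "OTU_2": "k__Fungi;p__Ascomycota;c__Sordariomycetes;o__Hypocreales;f__Nectriaceae",
        "OTU_3": "k__Fungi;p__Basidiomycota",  # unassigned below phylum
    }
    print(candidate_omf(toy_taxonomy))
```

OTUs that are unassigned at family level (like `OTU_3`) are simply excluded here, so the result is a conservative shortlist rather than a complete one.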
I was thinking of comparing populations using regression, as well as using PCoA and NMDS for the analysis. Any tips or methodological advice you could share on how to get the data into a format suitable for statistical tests would be greatly appreciated.
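PCoA and NMDS both start from a sample-by-sample dissimilarity matrix, so that matrix is the usual "statistical test format" for community data. The sketch below computes Bray-Curtis dissimilarity, one common choice, from per-sample abundance vectors; the function names and the dictionary layout (sample ID to a list of OTU counts in a fixed OTU order) are assumptions for illustration, and treating two empty samples as identical is a design choice, not a standard.

```python
from typing import Dict, List, Sequence, Tuple


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("abundance vectors must have the same length")
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0  # assumption: two empty samples are treated as identical
    return sum(abs(x - y) for x, y in zip(a, b)) / total


def distance_matrix(
    samples: Dict[str, Sequence[float]],
) -> Tuple[List[str], List[List[float]]]:
    """Return sorted sample names and the symmetric Bray-Curtis matrix."""
    names = sorted(samples)
    matrix = [[bray_curtis(samples[i], samples[j]) for j in names] for i in names]
    return names, matrix


if __name__ == "__main__":
    # Toy counts for three OTUs in three samples, for illustration only.
    toy = {"S1": [10, 0, 5], "S2": [5, 5, 5], "S3": [10, 0, 5]}
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, [round(d, 3) for d in row])
```

The resulting matrix is the input an ordination such as PCoA or NMDS works on. In phyloseq, the equivalent step is typically handled by `distance()` or directly by `ordinate()` with `distance = "bray"`.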
Thank you for your attention, time, and guidance.