I got my fungal sequences back from Illumina MiSeq sequencing, and I'm wondering how many reads you use as a cutoff. I.e., do you delete anything with fewer than 100 reads? I can't seem to find any literature that describes cutoff values.
I am currently working on an experiment with paired-end reads (Illumina sequencing with a Nextera 600-cycle kit). After removing adapters and primers, I was left with paired-end sequences of ~250 x 250 bp. The trimming and filtering step is crucial for assembling the sequences.
For high-quality sequences, the Q score should be ≥ 30. Use this as the criterion for where to cut your sequences: the positions up to which quality stays at or above it define truncLen, e.g. truncLen = c(250, 250). Also pay attention to the maximum number of “expected errors” allowed in a read (maxEE).
The DADA2 pipeline recommendation is:
Considerations for your own data: The standard filtering parameters are starting points, not set in stone. If you want to speed up downstream computation, consider tightening maxEE. If too few reads are passing the filter, consider relaxing maxEE, perhaps especially on the reverse reads (e.g. maxEE = c(2,5)), and reducing the truncLen to remove low-quality tails. Remember though, when choosing truncLen for paired-end reads you must maintain overlap after truncation in order to merge them later.
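To make the maxEE setting concrete, here is a minimal Python sketch of how "expected errors" are usually computed from Phred quality scores: each base contributes an error probability of 10^(-Q/10), and these are summed over the read. This is not DADA2 code; the function names and the example reads are made up for illustration.

```python
def expected_errors(quals):
    """Sum of per-base error probabilities, where p = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in quals)


def passes_max_ee(quals, max_ee):
    """A read is kept when its expected errors do not exceed max_ee."""
    return expected_errors(quals) <= max_ee


# Hypothetical 250 bp reads with a flat quality profile.
read_q30 = [30] * 250  # every base at Q30
read_q20 = [20] * 250  # every base at Q20

print(expected_errors(read_q30))   # about 0.25
print(expected_errors(read_q20))   # about 2.5
print(passes_max_ee(read_q20, 2))  # False: filtered out at maxEE = 2
print(passes_max_ee(read_q20, 5))  # True: kept at the relaxed maxEE = 5
```

The point of the example is that relaxing maxEE from 2 to 5 is what lets a lower-quality (typically reverse) read through the filter.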
For which step? You probably could not find any literature because none exists. Depending on the quality of your data, a different number of reads will be filtered out at each intermediate step. As long as your reads pass the filtering criteria, that is fine. Samples that do not contain any reads can simply be removed at the visualization step.
As for the post above, those parameters are so strict that, if you use them, most of your data will be thrown out as trash. In MiSeq data, reverse reads typically have worse quality than forward reads and should be trimmed according to the quality plots generated before the filtering and trimming step.
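One rough way to turn a quality plot into a truncation length is to cut at the first cycle where the mean quality drops below a chosen threshold. The Python sketch below shows that heuristic only; it is not what DADA2 does internally, and the function name, the threshold, and the per-cycle quality values are all invented for illustration.

```python
def suggest_trunc_len(mean_quality_by_cycle, min_quality):
    """Return how many leading bases to keep: everything before the
    first cycle whose mean quality falls below min_quality."""
    for cycle_index, quality in enumerate(mean_quality_by_cycle):
        if quality < min_quality:
            return cycle_index
    # Quality never dropped below the threshold: keep the full read.
    return len(mean_quality_by_cycle)


# Made-up mean quality per cycle for a 250 bp forward and reverse read.
forward_profile = [36] * 230 + [28] * 20
reverse_profile = [35] * 160 + [25] * 90

print(suggest_trunc_len(forward_profile, 30))  # 230
print(suggest_trunc_len(reverse_profile, 30))  # 160
print(suggest_trunc_len(reverse_profile, 20))  # 250
```

With these invented profiles the reverse read ends up shorter than the forward read at a threshold of 30, while a threshold of 20 keeps both reads at full length.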
A quality threshold of 2 (score 20) would be a good filter, and the reverse read length should follow the quality plots and will generally be shorter than the forward read length. The length parameters can be chosen freely as long as both reads still overlap. There is no universal or general setting for this; it all depends directly on your data.
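The overlap requirement is simple arithmetic and can be checked before running the filter. In the Python sketch below, the overlap left after truncation is the two truncation lengths added together minus the amplicon length. The 400 bp amplicon is a hypothetical value you would replace with your own (after primer removal), the function names are made up, and the default minimum overlap of 12 is an assumption that should match whatever your merging step requires.

```python
def overlap_after_truncation(trunc_fwd, trunc_rev, amplicon_len):
    """Bases shared by the forward and reverse read after truncation."""
    return trunc_fwd + trunc_rev - amplicon_len


def can_merge(trunc_fwd, trunc_rev, amplicon_len, min_overlap=12):
    """True when the truncated reads still overlap by at least min_overlap."""
    return overlap_after_truncation(trunc_fwd, trunc_rev, amplicon_len) >= min_overlap


amplicon_len = 400  # hypothetical amplicon length in bp

print(overlap_after_truncation(250, 250, amplicon_len))  # 100
print(can_merge(250, 250, amplicon_len))                 # True
print(overlap_after_truncation(240, 160, amplicon_len))  # 0
print(can_merge(240, 160, amplicon_len))                 # False
```

If the amplicon length varies between taxa, run the check against the longest amplicon you expect, since that is the case that loses overlap first.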