I am studying diversity in the nosZ gene in marine bacterial communities. Sequencing was done on an Illumina MiSeq, and after some preliminary processing (e.g. merging paired-end reads and quality trimming with PEAR) I now have a large FASTA file containing over 500,000 nucleotide sequences. Before I move on with the analysis, I want to make sure that all sequences in this file are actually nosZ gene sequences. However, I am not sure how to proceed.
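One approach I have been considering: search all reads against a curated set of nosZ reference sequences with BLAST (e.g. `blastx` or `blastn` with `-outfmt 6`), then keep only reads that have a hit above some identity and e-value cutoff. Below is a rough sketch of the filtering step. It assumes the tabular BLAST output already exists. The reference set, the function names, and the thresholds (70% identity, e-value 1e-5) are placeholders I picked for illustration, not validated values.

```python
import csv
from typing import Iterable, Iterator, List, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, sequence) pairs; the id is the first word of the header,
    which matches how BLAST reports qseqid."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:].split()[0], []
        else:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def nosz_hits(blast_rows, min_pident: float = 70.0, max_evalue: float = 1e-5) -> Set[str]:
    """Collect read ids with at least one hit passing the (hypothetical) cutoffs.
    Columns follow BLAST -outfmt 6: qseqid sseqid pident length mismatch
    gapopen qstart qend sstart send evalue bitscore."""
    hits = set()
    for row in blast_rows:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, pident, evalue = row[0], float(row[2]), float(row[10])
        if pident >= min_pident and evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def split_reads(fasta_lines, blast_lines, **cutoffs):
    """Partition reads into those with a passing nosZ hit and everything else."""
    hits = nosz_hits(csv.reader(blast_lines, delimiter="\t"), **cutoffs)
    kept: List[Tuple[str, str]] = []
    dropped: List[Tuple[str, str]] = []
    for name, seq in read_fasta(fasta_lines):
        (kept if name in hits else dropped).append((name, seq))
    return kept, dropped


if __name__ == "__main__":
    fasta = ">r1\nACGT\nAC\n>r2\nGGGG\n>r3 desc\nTTTT\n"
    blast = (
        "r1\tnosZ_ref1\t85.0\t100\t10\t0\t1\t100\t1\t100\t1e-30\t150\n"
        "r2\tnosZ_ref2\t50.0\t100\t40\t0\t1\t100\t1\t100\t1e-2\t20\n"
    )
    kept, dropped = split_reads(fasta.splitlines(), blast.splitlines())
    print([n for n, _ in kept], [n for n, _ in dropped])
```

Reads with no BLAST hit at all (like `r3` above) end up in the dropped set, so I could inspect those separately. I am not sure whether this is reasonable for 500,000 reads or which cutoffs make sense for nosZ.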
Any suggestions would be really appreciated.