I'm supplementing an already published genetic database with genes extracted from complete mitochondrial genomes from GenBank. I have the accession numbers for all the complete genomes, and I have the previous gene sequences too.

I'm using R to do all the downloading and data management, but I'm willing to use other open-access tools to do the job. So far, my plan is to download each full genome, copy it into a separate FASTA file for each mitochondrial gene, align all the sequences for that gene, and then trim the sequences manually. I could also go genome by genome, extracting the genes manually on GenBank, but I figure there is a smarter and quicker way to do it.
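To make the "smarter way" concrete, here is a minimal sketch of what I imagine it could look like, assuming the genomes are annotated: a GenBank record lists each gene in its feature table with a location such as `3307..4262` or `complement(14149..14673)`, so a gene could be cut out by coordinates instead of by aligning and trimming. The function name, the toy genome, and the gene names below are all hypothetical, and in practice the location strings would have to be read from each downloaded record. I wrote it in plain Python only to keep it self-contained; the same idea should carry over to R.

```python
import re

# Translation table for complementing DNA bases (ambiguity codes are left as-is).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_by_location(genome, location):
    """Cut a feature out of `genome` using a GenBank-style location string.

    Handles "start..end", "join(a..b,c..d)" and "complement(...)" wrapped
    around either of those. Assumption: more exotic forms, such as
    complement() nested inside join() or single-base locations, are not handled.
    """
    is_complement = location.startswith("complement(")
    # Collect every start..end pair; partial markers '<' and '>' are ignored.
    spans = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    # GenBank coordinates are 1-based and inclusive, hence the "- 1" on start only.
    seq = "".join(genome[int(start) - 1:int(end)] for start, end in spans)
    return reverse_complement(seq) if is_complement else seq


# Toy example: a 12-base "genome" and three made-up features.
genome = "ATGCCGTAAGCT"
features = {
    "geneA": "4..6",               # forward strand
    "geneB": "complement(7..9)",   # reverse strand
    "geneC": "join(10..12,1..3)",  # spans the origin of a circular genome
}
genes = {name: extract_by_location(genome, loc) for name, loc in features.items()}
```

Each gene could then be appended to its own FASTA file alongside the existing sequences, so alignment would only be needed to check the result rather than to find the gene boundaries. This relies on the annotations being present and correct, which I have not verified for my accessions.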

I have never done anything like this, so I'm playing it by ear here. Any tips are welcome.
