Raw ddRAD-seq data must contain pieces of mtDNA sequence, but I have not come across any resources showing how to fish for mtDNA sequences. Do you know anyone doing this? Or do you know of any software or packages that would help?
I'm lucky enough to have a reference chloroplast genome for the organism I'm working on, so I have been able to pull out this data from my ddRADseq reads. If there's one available for your study organism (or a closely related organism) then this is the way to go. There's probably an appropriate bioinformatics pipeline that would be able to do this for your whole data set (I'm just starting out so am not sure!), but otherwise importing the reference into a program like Geneious or CLC Genomics and mapping a portion of your reads might be a good start. Be aware that you may also pull out reads from nuclear copies of some of the mtDNA genes, which can complicate the interpretation of the data.
We have mapped RNA MiSeq reads to a reference genome. If you have a reference genome, you can probably align the mtDNA reads present in your data to it.
The simple rationale for why there are no programs for fishing mtDNA out of ddRAD data is that there are almost no mtDNA loci in data from organisms with a large genome (which covers most, if not all, ddRAD projects). For example, the human nuclear genome is ~200,000 times larger than the mitochondrial genome, so if you assume that restriction sites are randomly distributed, you would expect about one in every 200,000 loci to be an mtDNA locus. Thus, the probability of getting even one mtDNA fragment in a ddRAD data set is low.
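To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. The genome sizes are approximate values I am assuming for human, the function names are just for illustration, and the model is deliberately naive: loci are spread uniformly by sequence length, and the fact that each cell carries many mtDNA copies is ignored.

```python
# Back-of-the-envelope estimate; both sizes are approximate assumptions.
NUCLEAR_BP = 3.2e9   # approximate human nuclear genome size (assumed)
MITO_BP = 16_569     # approximate human mitochondrial genome length (assumed)


def expected_mito_loci(n_loci, nuclear_bp=NUCLEAR_BP, mito_bp=MITO_BP):
    """Expected number of mtDNA loci if loci are spread uniformly by length."""
    return n_loci * mito_bp / (nuclear_bp + mito_bp)


def prob_at_least_one(n_loci, nuclear_bp=NUCLEAR_BP, mito_bp=MITO_BP):
    """Probability that at least one of n_loci independent loci is mtDNA."""
    p = mito_bp / (nuclear_bp + mito_bp)
    return 1.0 - (1.0 - p) ** n_loci


ratio = NUCLEAR_BP / MITO_BP
print(f"nuclear/mito size ratio: {ratio:,.0f}")

# Try a few library sizes, from a small ddRAD data set to a very large one.
for n in (4_000, 50_000, 200_000):
    print(n, round(expected_mito_loci(n), 3), round(prob_at_least_one(n), 3))
```

Under these assumptions, a library of a few thousand loci is expected to contain well under one mtDNA locus.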
If you did want to see if there were any mtDNA loci in your ddRAD data, you could run your data through pyRAD, which writes the assembled RAD loci to a .loci file. You could then BLAST each of these loci against a BLAST database of mtDNA and see if any match. You may get a few matches; however, it is far more likely that any matches you do get are nuclear copies of mitochondrial genes rather than true mitochondrial fragments.
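A minimal sketch of the first half of that workflow, turning pyRAD loci into a FASTA file that BLAST can take as a query. I am assuming a .loci layout in which each sequence line is a sample name followed by the aligned sequence, and loci are separated by lines starting with `//`; check this against your own output. The function name and all file names are hypothetical.

```python
def loci_to_fasta(loci_text):
    """Return FASTA text with one representative sequence per locus.

    Assumes loci are separated by lines starting with '//' and that each
    sequence line is '<name> <aligned sequence>' (a leading '>' is tolerated).
    """
    records = []
    locus_index = 0
    have_rep = False  # True once the current locus has a representative
    for line in loci_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("//"):
            # Separator line: the next sequence line starts a new locus.
            locus_index += 1
            have_rep = False
            continue
        if have_rep:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name = parts[0].lstrip(">")
        seq = parts[1].replace("-", "").upper()  # drop alignment gaps
        records.append(f">locus{locus_index}_{name}\n{seq}")
        have_rep = True
    return "\n".join(records) + ("\n" if records else "")


# Tiny made-up example with two loci.
example = "\n".join([
    ">sampleA    ACGT-ACGT",
    ">sampleB    ACGTTACGT",
    "//             *    |",
    ">sampleA    GGGCCC",
    "//                  |",
])
fasta = loci_to_fasta(example)
print(fasta)

# The FASTA could then be searched with BLAST+ (file names are placeholders):
#   makeblastdb -in mito_ref.fasta -dbtype nucl -out mito_db
#   blastn -query loci.fasta -db mito_db -outfmt 6 -evalue 1e-10 -out hits.tsv
```

Taking only the first sample per locus keeps the query small. Any hit would still need checking against nuclear copies of mitochondrial genes, as noted above.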
You are absolutely right about RAD-seq data. Since RAD-seq uses a random shear step and no size selection, you should get all fragments next to a restriction site in both mtDNA and nDNA. You can also do paired-end sequencing and assemble rather large contigs, and thus may even be able to assemble an mtDNA genome from a paired-end RAD-seq data set. I did not think about that before.
However, for ddRAD there is no random shear step, and there is a size selection to reduce the number of loci in a library. Thus, most ddRAD libraries have fewer loci with higher coverage or more samples (although this is not always the case). This is the benefit of ddRAD: it is cheaper since there is no random shearing, and you can reduce your library down to a much smaller number of loci, which allows you to pool more samples and still get high coverage. Since you are only getting a subset of the total loci with ddRAD, I would expect the probability of getting mtDNA loci in that subset to be fairly low (although this will increase as you widen your size selection, which will increase the total number of loci).
You could widen the size selection (e.g. 200-1000 bp) for a ddRAD library and increase the sequencing effort to get more mtDNA loci, but this seems an odd thing to do with ddRAD. However, I may be wrong. It would be interesting to see if you could do the same mapping for a ddRAD data set and see what you get. If you do get mtDNA loci from a ddRAD data set, I think that Gozde would be very appreciative if you described the tools you used to do so.
I'd agree with the conclusion from the discussion between Max and Arthur above - because of the very small size of mtDNA genomes relative to nuclear genomes, and because the size selection step in RADseq protocols actually samples a small proportion of the total DNA in the library, it would seem to me that a RADseq approach would be a fairly inefficient way to get mtDNA sequence data.
It is hard, but it is not impossible (I just retrieved 549 reads, which is not a lot, but I was able to assign an mtDNA haplogroup). For vertebrates, many archival tissues are muscle, e.g. pectoral tissue for birds. These are mtDNA-enriched. When doing whole-genome shotgun sequencing, we have found that up to 4% of the reads are mtDNA. Get a good reference genome for your mtDNA, and then use BBSplit (part of the BBMap package) to pull out the matching reads. Then map them in something like Geneious. Works well.
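For anyone wondering what "fishing" reads with a reference looks like in principle, here is a toy k-mer screen in Python. To be clear, this is not how BBSplit works internally and is no substitute for a real mapper; it is a sketch assuming short reads held in memory, with function names, sequences and thresholds all made up for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA string; unknown bases become N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp.get(base, "N") for base in reversed(seq.upper()))


def kmer_set(seq, k):
    """All k-mers of length k occurring in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def fish_reads(reads, reference, k=21, min_shared=2):
    """Names of reads sharing at least min_shared k-mers with the reference.

    Both strands of the reference are indexed. The reference is treated as
    linear, so reads spanning the origin of a circular genome may be missed.
    """
    ref_kmers = kmer_set(reference, k) | kmer_set(reverse_complement(reference), k)
    hits = []
    for name, seq in reads:
        seq = seq.upper()
        shared = sum(
            1 for i in range(len(seq) - k + 1) if seq[i:i + k] in ref_kmers
        )
        if shared >= min_shared:
            hits.append(name)
    return hits


# Made-up toy data: a short "mitogenome" and three reads.
reference = "ATGCGTACGTTAGCCTAGGATCCGATTACAGGCTTAACGGTCA"
reads = [
    ("fwd", reference[5:30]),                        # forward-strand fragment
    ("rev", reverse_complement(reference[10:35])),   # reverse-strand fragment
    ("other", "TTTTTTTTTTGGGGGGGGGGTTTTTTTTTT"),     # unrelated read
]
hits = fish_reads(reads, reference, k=8)
print(hits)
```

The small k here only suits the toy data. With real reads you would want a longer k and a proper aligner afterwards, since nuclear copies of mitochondrial genes can also pass a screen like this.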
Out of curiosity, were the 549 reads from ddRAD data? If so, how many loci did you get, and what is the genome size of the organism? I'm just interested to see how theory matches up with practice.
As Matthew stated, it is not impossible, but you don't get very many. For most of my ddRAD data, I went back and looked at a few individuals to see if there were any mtDNA reads in there. I typically found that only 1-3 of the raw reads out of ~ million reads (pre-processed) per sample could be mapped to the reference mitogenome. After filtering and applying thresholds, our final ddRAD datasets usually consist of around 4000 loci. Genome size is about 1.2 Gb.