Animal population genetics seems to have gone from medium throughput manual screening of large numbers of markers like microsatellites to jumping on the NGS bandwagon. The data is useful for identifying new markers like SNPs but I'm worried that we're now moving towards a model of very high quality data from very few individuals. That's not good enough for addressing biological questions at the population level.
I'm also concerned that biologists are too far removed from the data & overly reliant on bioinformaticians. We seem to have skipped over high throughput screening of individuals for genes of adaptive importance, something which was a promising area just five years ago.
A smart and timely question. My short answer would be another question: which parts of population genetics are suited to NGS data?
A remarkable recent discovery is a way of solving the coalescent for long sequences taken from pairs of individuals (Lohse, Harrison and Barton)… the beauty is that the estimate is taken over all histories of the ancestral recombination graph – so a small sample size (2!) can be compensated for by increasing total sequence length.
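To make that intuition concrete, here is a minimal sketch in Python, with invented parameter values. It is not the Lohse et al. method itself; it only illustrates that each independently recombining block contributes its own pairwise coalescence history, so adding blocks adds quasi-independent replicates and shrinks the variance of a diversity estimate from just two genomes:

```python
# Sketch: variance of a pairwise theta estimate vs. number of recombining blocks.
# theta_per_block and the block counts are arbitrary, illustrative values.
import numpy as np

rng = np.random.default_rng(1)
theta_per_block = 2.0          # assumed 4*Ne*mu summed over one block
n_replicates = 2000

def theta_hat(n_blocks):
    # pairwise coalescence time per block, in units of 2Ne generations
    t = rng.exponential(1.0, size=(n_replicates, n_blocks))
    # observed pairwise differences per block
    diffs = rng.poisson(theta_per_block * t)
    # for a sample of two, mean differences per block estimates theta
    return diffs.mean(axis=1)

for n_blocks in (1, 10, 100, 1000):
    est = theta_hat(n_blocks)
    print(f"{n_blocks:5d} blocks: mean={est.mean():.2f}  sd={est.std():.2f}")
```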
A linked, recombining history often violates the assumptions of standard popGen approaches – but new tools are likely to harness the power of recombination in the data.
So my answer to your original question (Are Next Generation Sequencing approaches suitable for Population Genetics?) is: They are going to be very soon.
Over the past decade there has been a period of very rapid change driven by the new data types becoming available. The guys driving this are focused on how to do good inference from the data – when I say good inference I mean model-based, and those models naturally come out of population genetics.
I think it is this pace of change that has led to the problems you highlight: “biologists are too far removed from the data & overly reliant on bioinformaticians”.
Far removed – yes. Many biologists are not trained in using the types of databases necessary to grasp NextGen data. But that would be fine if the bioinformaticians they were working with were at the top of their game. Unfortunately, just as bioinformaticians started to be hired by biologists, the very basis of how the data should be used underwent a revolution. It would be a remarkable bioinformatician who mastered the necessary databasing, was able to make it accessible to the biologists in an interactive way AND at the same time kept up with the inference literature on how to deal with uncertainty in estimates (whether an estimate of sequence state or a high-level population genetic parameter).

The current standard bioinformatics approach to uncertainty is to set an (arbitrary) threshold, trash all the data below the threshold, and treat the data above the threshold as if it were absolute truth to be passed on to the next inference step. We can now do much better than that, and must, if we are to make the best of the NGS data now becoming available. But I am not sure it is the bioinformatician’s job to follow the literature on new approaches to inference and new techniques for applying sound, well-explored popGen theory to the new types of data available. Tracking that literature is, in fact, a hard call for many biologists. It is only very recently that people have started to catch on that we can’t just throw inference questions at the bioinformatician on top of the databasing and access issues of dealing with NGS data.
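As an illustration of the threshold point, here is a hedged sketch in Python (the error rate, coverages and allele frequency are invented, and this is not any particular pipeline's implementation). It compares a hard-call-above-a-quality-threshold estimate of an allele frequency with an estimate that carries per-genotype likelihoods all the way through, in the spirit of modern genotype-likelihood approaches:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)
err, true_freq, n_ind = 0.01, 0.2, 200
coverage = rng.poisson(3, n_ind)                  # low, uneven coverage per individual
genotypes = rng.binomial(2, true_freq, n_ind)     # 0/1/2 copies of the alternate allele
p_alt_read = genotypes / 2 * (1 - 2 * err) + err  # P(a read shows the alt allele | genotype)
alt_reads = rng.binomial(coverage, p_alt_read)

# genotype likelihoods P(reads | g) for g = 0, 1, 2 alt copies
g = np.array([0, 1, 2])
p_read = g / 2 * (1 - 2 * err) + err
gl = binom.pmf(alt_reads[:, None], coverage[:, None], p_read[None, :])

# (a) hard calls: keep only individuals where one genotype clearly dominates,
#     then treat those calls as absolute truth
post = gl / gl.sum(axis=1, keepdims=True)
confident = post.max(axis=1) > 0.95
hard_freq = post[confident].argmax(axis=1).mean() / 2

# (b) EM over the genotype likelihoods: no data discarded, uncertainty retained
freq = 0.5
for _ in range(50):
    prior = np.array([(1 - freq) ** 2, 2 * freq * (1 - freq), freq ** 2])
    w = gl * prior
    w /= w.sum(axis=1, keepdims=True)
    freq = (w @ g).sum() / (2 * n_ind)

print(f"true {true_freq:.3f} | hard calls {hard_freq:.3f} "
      f"(only {confident.sum()} of {n_ind} kept) | likelihood-based {freq:.3f}")
```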
So, I would say we have been slow to react – or perhaps the rate of change was faster than we could cope with – leaving the period of confusion you refer to. But I think things are going to improve, especially if we have a period with no game-changing technological advances for a while – this would give the inference guys time to update the tools, and us the time to evaluate them. On the other hand: would I turn down another game-changing NGS advance? No. We live in interesting times.
Lohse, K., Harrison, R.J., and Barton, N.H. (2011). A General Method for Calculating Likelihoods Under the Coalescent Process. Genetics 189, 977–987.
Hello Brian,
at the moment next-generation sequencing is a very attractive approach for disease research, so many people want to screen big cohorts with this technique to look, as you say, for SNPs and mutations. Reading your posting, I had assumed that NGS would be taken up for population genetics in the same way, but you write that this is not the case(?)
There are promising developments in next-generation markers for population genetics. I'm a bench biologist, but I did some RAD experiments to genotype a few populations, which ended up yielding ~18k SNPs. The process has a steep learning curve, particularly when you are studying non-model systems lacking a reference genome.
If your system has a reference genome, you can consider target-enrichment approaches for next-gen sequencing. Something similar to molecular inversion probes (MIPs) can be designed to sequence several kb to Mb of the genome fairly quickly when the reference contains genes of particular interest for your population genetic study.
If NGS allows the production of high-quality SNP data across several populations, then coalescent methods will be very valuable. The main issues are the quality of NGS data, and whether the price allows genome regions to be scanned for SNPs in good sample sizes.
Population genomics is the answer. If you include larger parts of the genome, next-generation sequencing will definitely prove really handy. Of course, massive sequencing will increase the noise from mis-sequenced bases, but then again refinement of the available data will surely sharpen the analysis. I can assure you that this is already happening in plants. You can also check the review by Davey et al. 2011 in Nature Reviews Genetics (Genome-wide genetic marker discovery and genotyping using next-generation sequencing). If you ask me, I still prefer the traditional methods that involve more than the 'sterile' sequences alone. Then again, time will tell.
It is the way to go. As the NGS technology matures and cost drops, we'll see more studies in population genetics using NGS.
The field seems to be developing rapidly; currently, methods like RAD sequencing, ddRAD (https://www.wiki.ed.ac.uk/display/RADSequencing/Home) and genotyping-by-sequencing (GBS - http://www.maizegenetics.net/Table/Genotyping-by-Sequencing/) are beginning to exploit NGS throughput to genotype larger numbers of samples. Polymorphisms in genes of adaptive importance can be discovered by transcriptome sequencing; panels of high-quality SNPs can then be selected for genotyping on medium- and high-throughput SNP genotyping platforms (OpenArray, etc.). Current NGS throughputs would in theory allow genotyping of 1000 samples at 1000 loci in a single run, but new methods for constructing reduced-representation genomic libraries, and the associated analysis pipelines, still need to be developed to achieve this goal.
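For a rough sense of that in-theory claim, here is a back-of-envelope sketch in Python; the coverage target, on-target fraction and run yield are assumed, illustrative numbers rather than vendor specifications:

```python
# Back-of-envelope check of the "1000 samples at 1000 loci in one run" figure.
samples, loci = 1000, 1000
target_coverage = 20      # desired reads per locus per sample (assumed)
on_target = 0.7           # assumed fraction of reads that land on a targeted locus
run_yield_reads = 3e8     # assumed single-run read yield

reads_needed = samples * loci * target_coverage / on_target
print(f"reads needed: {reads_needed:.2e}")
print(f"fraction of one run: {reads_needed / run_yield_reads:.2f}")
```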
Some very interesting contributions. As Grigorios said, population genomics is the future, but I'm not convinced it's the present – particularly if you're a small- to medium-sized group on a budget in the middle of a recession. My PhD was done in a lab which had medium-throughput manual screening of microsatellites in salmonids (amongst other species) down to a fine art. The emphasis was on interesting case studies using existing molecular tools, and there are plenty of those for any biologist.
I was more recently part of a group doing some nice bacterial genomics. It requires a good deal of expertise at the outset, plus an investment of time and money in the required infrastructure, to do genomics at a credible level. Even then, coming from my background, I heard alarm bells when I heard things like comparing "groups" where one group consisted of a couple of genomes. This is not credible.
The usual pattern over the decades has been for tools developed in medicine to filter down to the microbiology/food science sectors and then, in time, to zoologists/ecologists as the techniques become more routine and cost-effective. I think the latter group are jumping the gun a little in trying to get on the genomics wave (that's if the genomics wave doesn't get superseded by an epigenetics wave…). Can we credibly do >100 individuals from 10+ populations right now with genomics? That was the standard for a good study in the middle of the last decade. I suspect we can't. I also suspect that even if we could, there isn't the specialist bioinformatic support to credibly interpret the data.
Why are we rushing to do genomics on familiar species in a fatally limited way before we know even the basics of population genetics in other species of conservation interest – problems which could be addressed with simpler molecular tools?
It is absolutely possible to do >100 individuals from 10+ populations at the moment with population genomics; however, it depends on i) the size of the genome you are investigating; ii) the number of individuals per population you want to analyze; iii) the desired number of loci; iv) the desired depth of sequencing coverage; v) your budget; and vi) the computational infrastructure you have for analyzing the data.
Illumina MiSeq and HiSeq runs are incredibly affordable (~$1500 for a MiSeq run / ~$1000 for a lane on a HiSeq 2000 or 2500) for almost any lab wanting to run genomic samples. Genomic libraries for population genomics follow a protocol similar to that used for AFLPs, and provide much more data. As with any experiment, it takes careful design and some initial investment to start and to execute successfully. Perhaps population genomics is overkill for the question a group is investigating; however, some species/genera appear genetically monomorphic and really do require looking for variation throughout the genome.
As a recently graduated doctoral student, I found the money we spent on NGS population genomics was on par with, or possibly less than, what we spent on AFLPs, developing new microsatellite markers, and sequence-based cpDNA haplotypes for a non-model plant species. I would recommend the Stacks package from the Cresko lab at the University of Oregon. It runs on desktop Linux machines, or in a virtual machine on your Windows PC if that's an issue. Here's their page: http://creskolab.uoregon.edu/stacks/
This really does depend on the question – if you have a nicely annotated reference genome, good evidence of local adaptation, and are willing to look into the data in some depth (e.g. carrying out coalescent-based population genetics as opposed to simply looking for Fst outliers), then the rewards must be there in the right systems. There's a nice paper in the current PLoS Biology by Jonathan Losos, Craig Moritz and colleagues which makes the case quite well, but also talks about the challenges… If the science is good, it will be worth it IMHO.
Hello everyone. In my view the issue is not the availability of the technique for population studies; what I find is that we have to develop the analytical tools and computational statistics that allow us to analyse genetic variability across dozens of loci (STRs). That's my humble opinion.
As with any study, the value of the method will be determined by the original research question. I am currently using SNPs developed from NGS data for a non-model species. I am very excited about the possibility of using a reference genome to start selecting specific SNPs associated with ecologically important functions, which I will then be able to use in functional molecular ecology studies (eco-evo / community genetics).
I think it all depends on the questions you want to answer. I do think that the current NGS trend seems a bit aimless at times, and data are being produced faster than we can analyse them. Genomes and genome histories are very complex, and we could be missing a lot of interesting things that get buried under a lot of confounding data.
NGS of non-model organisms, even if poor in quality, could be very useful for developing more specific markers or identifying interesting genes. I don't think it sensible to do landscape population genetics or mating-system studies with genomic data, but having a whole genome, or bits of it, would be very useful for finding microsatellite loci, for example.
However, I am very critical of the idea that NGS is the answer to every question, or that more traditional population genetic studies are inferior in quality just because they don't use information from thousands of SNPs. It is certainly not impossible to have a lot of information and do a poor analysis of it.
NGS technology can be used for targeted resequencing of 100s of individuals to answer many population genetic questions. It does however require a reference genome, some infrastructure and individuals familiar with the technology and data. It is not just plug-n-go.
Here are a few relevant papers: (1) Meyer and Kircher (2010). Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harbor Protocols, June 2010. (2) Gunnarsdóttir, Stoneking et al. (2011). High-throughput sequencing of complete human mtDNA genomes from the Philippines. Genome Research. (3) Barbieri, Whitten, Pakendorf et al. (2012). Contrasting maternal and paternal histories in the linguistic context of Burkina Faso. Molecular Biology and Evolution. (4) Maricic, T., Whitten, M., and Pääbo, S. (2010). Multiplexed DNA sequence capture of mitochondrial genomes using PCR products. PLoS ONE 5(11): e14004.
I'm going to segue to a recent interview with the director Bob Zemeckis, in which he voiced a few concerns about the film industry that I think have some relevance here. Zemeckis was pondering why a generation of young directors hadn't come through and displaced his generation from the '70s. His theory was that the pace of technological change meant there had been no time for these young directors to absorb one innovation before another came along. Without that, they were unable to master each new wave and incorporate it into their art.
I think we will look back on this era in genomics as a revolution rather than an evolution (sorry) in the molecular tools at our disposal, and see that it took another decade to allow biologists to catch their breath and absorb what had become available.
P.S. It would help if we started teaching every Biology student how to program immediately.
I agree with the concern that it is easily possible to get "too far from the data", as Brian has pointed out, and also absolutely agree that it is possible to continue to do good science and answer interesting questions without joining the NGS bandwagon. There is a review paper to this effect entitled “Phylogeography unplugged” by Bowen et al. coming out soon in Bull. Mar. Sci. However, I also have to agree with Nabeeh’s response that it has already got to the point where, in fact, it is often cheaper to perform these studies with NGS than with traditional Sanger sequencing.

Everything is a trade-off in reality – we don’t have infinite time or funding to do every study with the sample size, genomic coverage and depth that we’d probably like. If data collection were free, we’d all have much more data. But the fact is that NGS has changed the playing field and the costs have dropped so much that it’s just cheaper to use it in many (but certainly not all) cases. Even if you are part of a small lab with medium-throughput manual screening, the costs have dropped to the point where you need to do the math. In this paper (http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0034241 ) we showed that the tipping point (if we didn’t consider labor) on the flat price of sequencing was about 3200 individual genotypes (20 individuals in 16 populations for 5 loci for each of 2 species). The cost with traditional Sanger sequencing in the medium-throughput method was $25,725, whereas the cost on the 454 platform was $24,560; but the real difference was that a single person (J. Puritz) did the 454 work in only 6 months, whereas it took 4 full-time people (3 grad students and a post-doc) to collect less data in the same period of time.

So even if you are not convinced that we should all jump onto the RAD or population genomics wagon, you should still take a close look at the time and cost of doing your project with the methods you are using now as compared to an NGS platform – you may be surprised how much faster and cheaper it is…
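As a quick way to run that kind of break-even check for your own project, here is a sketch of the arithmetic in Python, using the figures quoted in the post above; the per-genotype Sanger cost is derived from those totals and is purely illustrative:

```python
# Break-even sketch: at what project size does a flat-priced NGS run beat
# per-genotype Sanger costs? Figures are the ones quoted above.
genotypes = 20 * 16 * 5 * 2          # individuals x populations x loci x species = 3200
sanger_total = 25725.0               # quoted Sanger cost (USD)
ngs_total = 24560.0                  # quoted 454 cost (USD)

sanger_per_genotype = sanger_total / genotypes
break_even = ngs_total / sanger_per_genotype
print(f"Sanger: ${sanger_per_genotype:.2f} per genotype")
print(f"NGS run breaks even at ~{break_even:.0f} genotypes")
```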
I would agree that NGS may contribute significantly to understanding relatedness/genetic exchange between populations – and may even allow the construction of some sort of evolutionary phylogeny. I fully understand the concern about knowing the actual organisms in the field, but I would anyway see bioinformatics as a tool and not as the science itself – a very valuable one, though! We have tried a slightly different approach: using the multilocus rDNA (in particular, the ITS region) and defining each ITS copy type as a haplotype, we compared two populations of Arabidopsis thaliana for mutation patterns. We first did 454 sequencing and then used the freeware "Network 4.6.1" (Fluxus). Thereby we could create a haplotype network discriminating between ITS copies (with mutations and combinations thereof) present in either only one or both of the studied populations. Of course, in this pilot project we only compared two populations, but the method was very robust and we are going to test it at a larger scale. The advantage: you only have one marker (and thus only one primer set to construct), but because there are so many copies of rDNA per nucleus, the information in there is huge! If you are interested: Simon et al. Mol Biol Evol 2012, Volume 29, Issue 9, 2231-2239.
Absolutely - RADseq - large number of SNPs on large numbers of samples. These guys are doing a great job of it. We are starting to get into it. Check it out
https://www.wiki.ed.ac.uk/display/RADSequencing/Home
It has been made clear already that it is possible (price-wise) to do population studies with NGS. However, we must not lose sight of what happens after the data are collected and assembled. My impression is that more emphasis is being placed on developing pipelines to deal with the bulk of the information, and less on the conceptual framework to analyse that information. My point is that not all bioinformaticians know enough population genetics to be able to squeeze the data properly. There are a lot of papers out there that rarely go beyond looking for Fst outliers and patterns of nucleotide diversity. Of course, it is always better to have whole genomes available, but we must not get dazzled by the new technology and leave population genetics theory aside. Students should be familiar with bioinformatics, but they also need a solid basis in population genetics, because technology changes very fast but the Wright-Fisher model is still as valid as ever.
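To underline how little the underlying theory has moved, here is a minimal Wright-Fisher drift sketch in Python (population size, starting frequency and replicate count are arbitrary) – the same binomial sampling of gametes applies whether you score 20 microsatellites or 20,000 SNPs:

```python
# Minimal Wright-Fisher drift: allele frequencies change by binomial sampling
# of 2N gametes each generation. All parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(42)
N, p0, generations, replicates = 500, 0.1, 200, 1000

p = np.full(replicates, p0)
for _ in range(generations):
    p = rng.binomial(2 * N, p) / (2 * N)   # one generation of drift

print(f"mean frequency {p.mean():.3f} (expected {p0}), "
      f"fixed or lost in {np.mean((p == 0) | (p == 1)):.0%} of replicates")
```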
Very interesting question and comments. As some of you point out, whether or not NGS is useful depends on the question. In my opinion, jumping into NGS without a careful experimental design, just for the sake of using new tools, is a mistake. For example, using tons of markers will in some cases not improve the resolution to detect certain population genetic processes, so doing power tests will clearly help in making the right decision. However, while NGS is not magic, it will clearly help us understand processes that we can't right now, and will open up new questions. For that, developing genomic resources and statistical tools for analysing these data is very important, and, with the exception of some species, that's what is mainly being done.
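Here is a hedged sketch of the kind of power test meant above (Python): before committing to a design, simulate it and ask how often a given true level of differentiation would actually be detected. The Balding-Nichols frequency model, sample sizes, Fst values and the crude Hudson-style estimator are all illustrative assumptions, not a prescription:

```python
import numpy as np

rng = np.random.default_rng(7)
n_per_pop, n_sims = 30, 500

def simulated_fst(n_loci, fst):
    """Crude multi-locus Fst estimate from sampled allele frequencies in two populations."""
    anc = rng.uniform(0.1, 0.9, n_loci)                       # ancestral frequencies
    if fst > 0:
        a = anc * (1 - fst) / fst
        b = (1 - anc) * (1 - fst) / fst
        p1, p2 = rng.beta(a, b), rng.beta(a, b)               # Balding-Nichols draws
    else:
        p1 = p2 = anc
    c1 = rng.binomial(2 * n_per_pop, p1) / (2 * n_per_pop)    # sample frequencies
    c2 = rng.binomial(2 * n_per_pop, p2) / (2 * n_per_pop)
    num = (c1 - c2) ** 2
    den = c1 * (1 - c2) + c2 * (1 - c1)
    return num.sum() / den.sum()                              # Hudson-style ratio of sums

for n_loci in (10, 100, 1000):
    null = np.array([simulated_fst(n_loci, 0.0) for _ in range(n_sims)])
    alt = np.array([simulated_fst(n_loci, 0.01) for _ in range(n_sims)])
    power = (alt > np.quantile(null, 0.95)).mean()
    print(f"{n_loci:5d} loci: power to detect Fst=0.01 ~ {power:.2f}")
```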
Very interesting thread. Does anyone know of a published review on the topic? I agree with much of what has been said. Most importantly, the usefulness of NGS depends on the type of question. For many applications, good old-fashioned (sic!) microsatellites are as likely to do the trick as hundreds or thousands of NGS-generated SNPs. However, with new data come new questions, and some of the things we used to dream of are now possible to address. The old question of which parts of the genome are important for adaptation and speciation, and relevant from a conservation perspective, can now be addressed (see e.g. Lamichhaney et al. PNAS 2012).
Another problem, apart from the bioinformatic one, is the cost involved. Not many ordinary research labs have the financial support to do NGS (despite the fact that costs are dropping), and certainly not massive resequencing.
Davey et al. 2011. Genome-wide genetic marker discovery and genotyping using next-generation sequencing. Nature Reviews Genetics 12, 499-510. There were issues of Molecular Ecology and the American Journal of Botany that were in part dedicated to next generation sequencing based population genomics (RAD, GBS, etc.) published in 2012.
I believe the cost of sequencing is not the issue. A few years ago I developed tens of thousands of SNPs in ~100 individuals for ~$2000 USD using an Illumina GAIIx and a protocol developed by some collaborators that is quite similar to the RAD protocol. The real problem is on the computational side of things.
I would stress that all methods benefit from elegantly designed experiments with concise hypotheses. If you are trying to truly understand genome-level adaptation and diversification in non-model systems, NGS population genomic methods offer the best insights into genomic processes. There is no comparison between AFLP or SSR variation and the nucleotide-level SNP resolution recovered in NGS-based population genomics.
It's being done:
http://www.genetics.org/content/186/1/207.short
http://dx.plos.org/10.1371/journal.pone.0015925
NGS used for identifying species differences in flycatchers.
http://www.nature.com/doifinder/10.1038/nature11584
http://www.nature.com/hdy/journal/v110/n5/full/hdy2012111a.html?WT.ec_id=HDY-201305
Two samples of ten male flycatchers each, and then they talk about how 45% of the polymorphic sites were shared. While the authors acknowledge the problems the small sample sizes pose for detecting rare alleles, the commentary does not.
They also find much higher divergence at the Z chromosome, but don't reference the known lower effective population size of sex chromosomes (there are fewer copies of them in the population than of autosomes) or the effect of skewed mating patterns in reducing it further (http://www.ncbi.nlm.nih.gov/pubmed/15166162).
Granted, as male birds are ZZ, those ten males actually carry twenty Z chromosomes, the same number of sampled copies as for any autosome; but with only three Z copies in the population for every four autosomal copies, I would again wager that drift could bump up the chance of stochastic increases in average divergence across the Z chromosome.
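A quick back-of-envelope check of that demographic point (Python), under the simplest assumptions of an equal sex ratio and no skew in reproductive success; the census numbers are arbitrary:

```python
# Z vs. autosome copy numbers and the resulting relative drift rate.
n_males = n_females = 1000

autosomal_copies = 2 * (n_males + n_females)   # every bird is diploid for the autosomes
z_copies = 2 * n_males + 1 * n_females         # males are ZZ, females are ZW

ratio = z_copies / autosomal_copies
print(f"Z : autosome copy ratio = {ratio:.2f}")                      # 0.75
print(f"relative drift rate on the Z (~ 1/Ne) = {1 / ratio:.2f}x")   # ~1.33x faster
print(f"Z chromosomes sampled from 10 ZZ males = {2 * 10}")          # same as an autosome
```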
I wouldn't be at all surprised if a sex chromosome were involved in species divergence, but this paper glossed over demographic factors which could go a long way towards explaining the level of divergence observed. More generally, no functional category of gene stood out as overrepresented among those showing higher divergence, and there was no evidence of high dN/dS ratios (though these are a crude measure anyway over long segments).
It's fantastically detailed data but when the technology moves on and we start getting studies with decent sample sizes again, will we find that we end up dismissing a lot of current findings?
Any question that can be answered by studying DNA can be addressed with NGS.
Good to read all this information – wow.
RADseq is the most promising approach. It can be highly multiplexed (96 samples in a single Illumina run), and this could be increased in the near future. Moreover, new protocols for faster and cheaper analyses are emerging every day. The bioinformatics is advancing too: Stacks, RADtools and FreeBayes are interesting suites for managing NGS data for population genetic purposes.
https://www.wiki.ed.ac.uk/display/RADSequencing/Home
You can certainly multiplex more than 96 samples in a single run using RADseq. We have just started using the ddRADseq protocol from Peterson et al. (see link), with which we can multiplex 576 samples in one run thanks to the 48 x 12 barcoding strategy. Multiplexing ultimately comes down to genome complexity and coverage.
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0037135
Firstly, "population genetics" is not a homogeneous or consistent method. It's an amalgam of different methods based on different theories, and everything depends on your question. Of course NGS can be useful, and it is actually becoming cheaper to do an outsourced genotyping-by-sequencing analysis on 5000 SNP markers (even in non-model organisms) than to do your own analysis on 25 microsat loci or 200 AFLP loci. So your power to detect subtle patterns increases as you have more information (which does not mean that these patterns are important). One disadvantage of this type of data is that the assumption that your markers are not physically linked on the genome (a prerequisite for many classical population genetics inferences) is likely to be violated. The good thing about having 5000 markers genotyped (for the price of a regular microsat analysis) is that you can still decide to use only 500 markers and to really replicate (instead of pseudoreplicate using bootstraps) your tests over independent sets of markers. One disadvantage of NGS is that it requires much higher quality and quantity of DNA, so if you're using non-invasive samples such as hairs, scat or swabs, you may not have good enough DNA for NGS, although it will do fine for a microsat analysis.
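A small sketch of that replication idea (Python): split the markers into non-overlapping subsets and compute your statistic once per subset, giving genuinely independent replicates rather than bootstrap pseudoreplicates. The genotype matrices here are toy data and my_popgen_statistic is a hypothetical placeholder for whatever statistic or test you actually run:

```python
import numpy as np

def my_popgen_statistic(geno1_subset, geno2_subset):
    """Hypothetical placeholder: mean allele-frequency difference between populations."""
    p1 = geno1_subset.mean(axis=0) / 2
    p2 = geno2_subset.mean(axis=0) / 2
    return np.abs(p1 - p2).mean()

rng = np.random.default_rng(3)
n_loci, subset_size = 5000, 500
geno1 = rng.binomial(2, 0.30, size=(40, n_loci))   # toy 0/1/2 genotypes; real data go here
geno2 = rng.binomial(2, 0.35, size=(40, n_loci))

subsets = rng.permutation(n_loci).reshape(-1, subset_size)   # 10 disjoint sets of 500 loci
estimates = [my_popgen_statistic(geno1[:, s], geno2[:, s]) for s in subsets]
print(f"{len(estimates)} independent estimates: "
      f"mean {np.mean(estimates):.4f}, sd {np.std(estimates):.4f}")
```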
Likewise, you can use NGS to first detect markers of your choice (here's an example: http://dx.doi.org/10.1111/1755-0998.12039) and then outsource your SNP analyses for, say, a few hundred markers – again at the same price as a regular popgen analysis. The main advantage of SNP data is that they're easy to exchange across labs, unlike microsats or AFLP. An allele C is a C everywhere and so is an allele G, but microsat allele 112 in one lab may be called allele 114 in another lab; with AFLP, the problem is even larger. So using SNPs (and NGS helps in finding them) will help popgen for sure. One other main assumption of classical popgen theory is that there's no homoplasy: each mutation leads to a different allele. With microsats, you're less likely to violate this strongly than with SNPs, as there are more allelic states for microsats than for SNPs. If you have a genomic map and a lot of SNP markers, however, you can reconstruct haplotypes from the unphased genotypes and use these to perform your popgen analyses, overcoming both the problems of homoplasy and linked loci to a large extent.
Much more important than using one type of data or another is getting your population genetics theory right: knowing the limitations of each type of data, testing your assumptions (equilibrium? infinite-allele model? island model of migration? …) and being self-critical all the time. I still see lots and lots of papers in modest to really good journals which entirely forget about the assumptions behind their popgen statistics, and which use methods and test statistics in an entirely wrong way and get away with it (because the referees don't bother about the assumptions either). No decent ecological journal would allow you to use a parametric test if you didn't show that your data were normal and homoscedastic; in popgen, these kinds of checks seem to be considered unnecessary a lot of the time. Wow, I ended up quite off-topic!
I'd like to stress what Joachim already wrote: data sharing among labs is essential, and microsats are not well suited to that, since comparing data across labs is really difficult to achieve.
Great answer Joachim.
Maybe to summarise: high-quality data should mean high-quality papers, but that isn't happening just yet.
I totally agree with Joachim, and I guess we will still need a few years to see the real application of NGS; but again, the design of your experiment, and keeping in mind the assumptions behind the pop-gen analyses, is crucial.
This thread is extremely interesting. I was wondering whether RAD-seq is useful for identifying hybrids? I have seen a lot of research where it is used for general population genetics, such as estimating diversity, but I am really interested in whether RAD markers can assist in identifying hybrids between subspecies.
@Sonja - I think RAD-seq can definitely be useful for finding hybrids. I have been using RAD-seq with several species of non-model insects and recently ran a combined analysis with a pair of sister species. I ended up with far fewer markers than in either single-species analysis, because not as many tags align across species, but the results turned up some intermediate genotypes that appear to be hybrids.
@Sonja: basically, the principle of RAD-sequencing is in many ways similar to AFLP: it uses restriction-sites to identify polymorphisms across the genome. So if someone did it in the past with AFLP, you can probably do it with RAD-seq too. There are even phylogenetic studies that used AFLP, for example http://sysbio.oxfordjournals.org/content/59/5/548.abstract
Another approach that you can use alongside RAD-seq is GBS: genotyping-by-sequencing. http://www.igd.cornell.edu/index.cfm/page/GBS.htm
It uses a very similar principle to RAD, but typically has lower sequencing coverage per individual and much shorter reads surrounding each SNP (68 nt). This means that it is much more difficult to annotate GBS reads against the genomes of other species. This disadvantage is compensated by its much lower price per individual (comparable to the cost of an AFLP analysis, but with ten times more markers). We've used GBS in black alder (Alnus glutinosa) and added a bunch of individuals of grey alder (Alnus incana), as we suspected that some of our individuals were F1 hybrids. Ordinations of the genotypes based on genetic distances (such as PCoA and NMDS) indeed confirmed this (a toy sketch of that ordination approach follows after the paper links below). With this approach, we ended up with c. 2000 SNPs. In two other "population-genomic" GBS projects we had c. 5000 SNPs, where our goal was to identify subtle patterns of population structure. This just shows that you can use NGS techniques to identify hybrids among closely related species. If your subspecies are genetically clearly differentiated, RAD-seq or GBS will be very useful for your research question. Below are two papers that used GBS on black alder: https://www.researchgate.net/publication/262645659_Landscape_genomics_and_a_common_garden_trial_reveal_adaptive_differentiation_to_temperature_across_Europe_in_the_tree_species_Alnus_glutinosa?ev=prf_pub
https://www.researchgate.net/publication/263237719_An_evaluation_of_seed_zone_delineation_using_phenotypic_and_population_genomic_data_on_black_alder_%28Alnus_glutinosa%29?ev=prf_pub
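Here is the promised toy sketch of that ordination approach in Python; it is not the pipeline used in the alder papers. It simulates two differentiated parental species plus a few F1s, computes a simple allele-sharing distance from 0/1/2 genotypes, and runs a classical PCoA, on which the F1s fall between the parental clusters along axis 1:

```python
# Toy PCoA to spot F1 hybrids; all simulation parameters are arbitrary.
import numpy as np

rng = np.random.default_rng(5)
n_loci = 2000
pA, pB = rng.beta(0.5, 0.5, n_loci), rng.beta(0.5, 0.5, n_loci)  # divergent frequencies

species_A = rng.binomial(2, pA, size=(15, n_loci))
species_B = rng.binomial(2, pB, size=(15, n_loci))
f1 = rng.binomial(1, pA, size=(5, n_loci)) + rng.binomial(1, pB, size=(5, n_loci))
geno = np.vstack([species_A, species_B, f1])

# pairwise distance = mean absolute allele-dosage difference
d = np.abs(geno[:, None, :] - geno[None, :, :]).mean(axis=2)

# classical PCoA: eigendecomposition of the double-centred squared distances
n = d.shape[0]
j = np.eye(n) - np.ones((n, n)) / n
b = -0.5 * j @ (d ** 2) @ j
vals, vecs = np.linalg.eigh(b)
axis1 = vecs[:, -1] * np.sqrt(max(vals[-1], 0))

labels = ["A"] * 15 + ["B"] * 15 + ["F1"] * 5
for lab in ("A", "B", "F1"):
    scores = axis1[[i for i, l in enumerate(labels) if l == lab]]
    print(f"{lab:>2}: PCoA axis 1 mean = {scores.mean():+.2f}")
```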