We're working on the whole ITS gene of a cosmopolitan species, so we need to analyse it and see whether the population is structured or not, based on different geographical localities. We have data on molecular variance as well, but the differences are small.
The input is a tab-delimited text file (you can create it in Excel).
The first column is the sample name, the second column is the population group, indicated by a number (from 1 upward), and from the third column to the end you enter each individual's genotypes for each marker (same as Genepop). Missing data is indicated with -9.
The population group (2nd column) is a number that you can keep only for your own reference, or you can run all the analyses taking those pre-defined populations into account.
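In case it helps, here is a minimal Python sketch of writing a file in the layout described above. The sample names, group numbers and genotype values are made up for illustration, and `write_structure_table` is a hypothetical helper, not part of any package.

```python
import csv
import io

MISSING = -9  # missing-data code, as described in the post above


def write_structure_table(rows, handle):
    """Write one tab-delimited line per sample: name, group, then genotypes."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for name, group, genotypes in rows:
        # Replace absent genotypes (None) with the missing-data code.
        coded = [MISSING if g is None else g for g in genotypes]
        writer.writerow([name, group, *coded])


# Hypothetical samples: (sample name, population group, genotypes per marker).
samples = [
    ("ind01", 1, [101, 103, None, 202]),
    ("ind02", 2, [101, 101, 198, 202]),
]

buffer = io.StringIO()
write_structure_table(samples, buffer)
table_text = buffer.getvalue()
print(table_text)
```

To write to disk instead, pass a file opened with `open("input.txt", "w", newline="")` in place of the `StringIO` buffer.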
Do not hesitate to contact me if you have any problem.
Hi Montes, from my reading, SNP alleles A, T, G, and C are coded as 1, 2, 3 and 4 respectively. But what about gaps? Should we treat the gaps as missing data as well, with -9? Many thanks.
Hi, the manual clearly states that "The structure model assumes that loci are independent within populations (i.e., not in LD within populations). This assumption is likely to be violated for sequence data, or data from non-recombining regions such as Y chromosome or mtDNA." So you should not take the leap and use it for sequence data. You may instead want to use Geneland (http://www2.imm.dtu.dk/~gigu/Geneland/), for instance.
You could use STRUCTURE too, but you should either specify the recombination rate between pairs of loci (which does not seem to me the sensible option) or enter the data as a single locus with a code for each unique haplotype.
There is also a method based on AMOVA to do clustering analysis developed by Patrick Meirmans: http://www.patrickmeirmans.com/software/GenoDive.html
In LD situations I have used STRUCTURE with the inferred haplotypes as input. To do this, you should first run an LD test and generate haplotypes (you can do it with the PHASE software). Then create a new locus (the haplotype) and give each individual its genotype for that locus.
Then, you will be able to perform the STRUCTURE analysis.
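A rough sketch of the "one code per unique haplotype" step, in Python. It assumes the haplotypes are already available as aligned strings (for example after phasing); the sequences below are invented and `code_haplotypes` is a hypothetical helper.

```python
def code_haplotypes(sequences):
    """Give each unique haplotype an integer code, in order of first appearance."""
    codes = {}
    coded = []
    for seq in sequences:
        key = seq.upper()  # treat upper and lower case as the same haplotype
        if key not in codes:
            codes[key] = len(codes) + 1
        coded.append(codes[key])
    return coded, codes


# Hypothetical aligned haplotypes, one per sample.
aligned = {
    "ind01": "ACGTTA",
    "ind02": "ACGTTA",
    "ind03": "ACGATA",
    "ind04": "ACGTTG",
}

coded, codes = code_haplotypes(list(aligned.values()))
for name, code in zip(aligned, coded):
    print(f"{name}\t{code}")
```

The resulting integers go in the single "haplotype" locus column. For diploid data each individual would carry two haplotype codes rather than one.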
Hi Rita, thanks for the guidance. I found out that we can use GenAlEx 6.5 to convert our data files into tab-delimited format, which can then be used in the STRUCTURE analysis.
Hi Miguel and Iratxe, thanks for the help. I can now run the analysis with STRUCTURE, though I am still only halfway through.
I agree with Rita and with Miguel's follow-up comments that you can't use Structure for sequences. The guidelines even mention that "it is not recommended".
I think it depends on what you want to compare. If you want to estimate gene flow between populations (working with haplotype frequencies), Structure may work with DNA sequences, but not for separating/discriminating taxa (see Rita's comment).
Anyway, since we have many tools available for DNA sequences, why use Structure, which is surely not the best adapted?
Which stage do you mean here? Normally after the simulations of K you can use Structure Harvester (http://taylor0.biology.ucla.edu/structureHarvester/) to harvest your data set and directly get your best K, as well as data files needed for CLUMPP analysis.
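For anyone curious what sits behind the "best K" output: Structure Harvester applies the Evanno delta K method. Below is a minimal Python sketch of one common formulation, delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd(L(K)), where L(K) is the mean Ln P(D) over replicate runs. The Ln P(D) values are invented and `delta_k` is a hypothetical helper, so treat this as an illustration rather than a replacement for the tool.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1."""
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            # Absolute second-order difference of the mean Ln P(D).
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp_by_k[k])  # spread across replicate runs at this K
            if sd > 0:  # delta K is undefined when all runs are identical
                result[k] = second_diff / sd
    return result


# Made-up Ln P(D) values from three replicate runs per K.
runs = {
    1: [-1200.0, -1202.0, -1198.0],
    2: [-1000.0, -1001.0, -999.0],
    3: [-990.0, -995.0, -985.0],
    4: [-985.0, -992.0, -978.0],
}

dk = delta_k(runs)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

Note that delta K cannot be computed for the smallest or largest K tested, and it needs at least two replicate runs per K.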
Hi Legal, I am actually looking at a cosmopolitan species of marine phytoplankton to study its population structure. The aim is not to discriminate taxa, as the samples all belong to one species; I am interested in how they group when analysed with STRUCTURE. So I guess it should be fine in this context?
OK, in this context you may use it, but depending on the level of polymorphism you could also build a median-joining network (Bandelt, 1999) with the software "Network", which is really well adapted to sequence data.
Yes I also analyzed the data set with Median Joining Network to look into the haplotype grouping and they actually grouped nicely, in accordance to the phylogenetic tree. I find that the Network software is way more user-friendly when compared to STRUCTURE itself though.
Actually, I am trying to identify the phylogenetic relationships among different species. Some references, such as Thiele et al. (2013) in MPE, used Structure to identify phylogenetic units based on three nuclear loci and one mtDNA fragment. Personally, I prefer Network to Structure. For Network, as you know, we have SNPs in nuclear gene sequences obtained without cloning. Do I still need to generate haplotypes with the PHASE software, or can I just use the sequences generated by PCR?
If I do need PHASE: I checked the manual, and the output provides probabilities instead of haplotypes. Any suggestions on how to generate haplotypes using PHASE?
It depends on how many samples you have. If you have a good number of replicates (within each species), using PHASE to generate potential haplotypes is the right strategy. Then you can compare species on the basis of their respective polymorphism.
But if you have only one SNP per species, it is better to use the sequences directly.
@Runhua You can try xmfa2struct, it is a program which converts sequence data from haploid organisms into the input file format of structure. http://www.xavierdidelot.xtreemhost.com/clonalframe.htm
I am using the Structure software and would like to know how to identify the K value. When I analyse my data in Structure, the Ln P(D) value is zero. I don't know of an alternative method for estimating K.
You can use DAPC, because it provides a robust alternative to Bayesian clustering methods like STRUCTURE, which should not be used for clonal or partially clonal populations.
I have a question about input file preparation for Structure 2.3.4 software. I have SNP diploid data in hapmap format. I have converted the alleles as follows:
A=1
T=2
C=3
G=4
Missing=-9
Heterozygous=?????
Can anyone tell me what number should be assigned to heterozygous genotypes?
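As far as I know, STRUCTURE does not take a single code for a heterozygote. For diploid data each individual carries two allele values per locus (by default on two consecutive rows), so a heterozygote is simply entered as two different allele codes. Here is a minimal Python sketch using the coding listed above. It assumes the hapmap calls are two-letter genotypes such as "AG" with "NN" for missing; `encode_call` and `two_rows` are hypothetical helpers and the calls are invented.

```python
ALLELE_CODES = {"A": 1, "T": 2, "C": 3, "G": 4}  # coding listed in the post
MISSING = -9


def encode_call(call):
    """Turn a two-letter genotype call such as 'AG' into a pair of allele codes."""
    if len(call) != 2:
        return (MISSING, MISSING)
    first, second = (ALLELE_CODES.get(base, MISSING) for base in call.upper())
    # Assumption: if either allele is unknown, treat the whole genotype as missing.
    if MISSING in (first, second):
        return (MISSING, MISSING)
    return (first, second)


def two_rows(name, calls):
    """Build the two tab-delimited rows (one per allele copy) for one individual."""
    pairs = [encode_call(c) for c in calls]
    row_a = [name] + [str(p[0]) for p in pairs]
    row_b = [name] + [str(p[1]) for p in pairs]
    return "\t".join(row_a), "\t".join(row_b)


# Hypothetical individual: homozygous AA, heterozygous AG, missing, heterozygous CT.
rows = two_rows("ind01", ["AA", "AG", "NN", "CT"])
print("\n".join(rows))
```

So an A/G heterozygote becomes 1 on the first row and 4 on the second. If your hapmap file uses single-letter IUPAC codes instead (R, Y, and so on), the lookup would need an extra table to expand them into two alleles.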