One of the most widely used methods is to run more than 15 replicates per K, each with a different random seed, and apply the protocol described in Evanno et al. 2005 (http://onlinelibrary.wiley.com/doi/10.1111/j.1365-294X.2005.02553.x/full). The plot of Delta K against K should then indicate the most likely K.
If you need to launch multiple runs, go to Project>Start a Job. For long calculations I recommend this site, where you can log in with a scientific e-mail address and upload your data: https://www.bioportal.uio.no/
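If you prefer the command-line version of Structure over the GUI, the replicate runs can be scripted. Below is a minimal Python sketch that only builds the command lines, one per (K, replicate) pair, each with its own seed. The file names, the output prefix and the base seed are hypothetical placeholders. The flags (-K, -i, -o, -D) are the ones I recall from the Structure command-line documentation, so please check them against the manual of your version, and note that the sketch assumes your mainparams and extraparams files sit in the working directory.

```python
import random


def build_structure_jobs(k_values, n_reps, infile="project_data.txt",
                         outprefix="results/run", base_seed=12345):
    """Return one command line (as a list of arguments) per (K, replicate)."""
    k_values = list(k_values)
    rng = random.Random(base_seed)
    # Draw distinct seeds so that no two runs share a random-number stream.
    seeds = iter(rng.sample(range(1, 2**31 - 1), len(k_values) * n_reps))
    jobs = []
    for k in k_values:
        for rep in range(1, n_reps + 1):
            jobs.append([
                "structure",            # assumed name of the executable
                "-K", str(k),           # number of clusters for this run
                "-i", infile,           # hypothetical input file
                "-o", f"{outprefix}_K{k}_rep{rep}",
                "-D", str(next(seeds)), # random seed for this replicate
            ])
    return jobs


if __name__ == "__main__":
    # Example: K = 1..6 with 20 replicates each; prints the commands only.
    for cmd in build_structure_jobs(range(1, 7), n_reps=20):
        print(" ".join(cmd))
```

Printing the commands rather than executing them makes it easy to paste them into a job scheduler or a shell script on a cluster.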
And for Evanno's Delta K, you can use Structure Harvester: http://taylor0.biology.ucla.edu/structureHarvester/#
Evanno's transformation is really useful for assessing K. It is based on the second-order rate of change of the log probability of the data, ln P(D), across successive values of K, divided by the standard deviation of ln P(D) among replicate runs at that K. This is why you must run several replicates for each K, at least 10 (I usually run 20). If computations are too slow on your computer, you can use the high-performance computing facilities that are available worldwide at several institutions (e.g. Bioportal in Oslo, as Charalambos Neophytou suggested). There is also an R script that can be used to compute Evanno's transformation (Structure sum, which you will find here: http://uit.no/ansatte/organisasjon/ansatte/person?p_document_id=41186&p_dimension_id=).
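To make the calculation concrete, here is a small Python sketch of Delta K, assuming you have already collected the ln P(D) value of every replicate run for each K. It follows the formula of Evanno et al. (2005), Delta K = |L(K+1) - 2 L(K) + L(K-1)| / sd[L(K)], using the mean of ln P(D) over replicates for L(K). The function name and the numbers in the example are invented for illustration only.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Evanno's Delta K.

    lnp_by_k maps each K to the list of ln P(D) values from its replicate
    runs (at least two replicates per K are needed for the standard deviation).
    Returns a dict K -> Delta K for every K that has both neighbours K-1, K+1.
    """
    avg = {k: mean(v) for k, v in lnp_by_k.items()}
    sd = {k: stdev(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in avg or k + 1 not in avg:
            continue  # Delta K is undefined for the smallest and largest K
        if sd[k] == 0:
            continue  # identical replicates: the ratio is undefined
        second_order = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        result[k] = second_order / sd[k]
    return result


if __name__ == "__main__":
    # Invented ln P(D) values, three replicates per K.
    runs = {
        1: [-5200.1, -5199.8, -5200.4],
        2: [-4800.3, -4801.0, -4799.6],
        3: [-4750.2, -4762.9, -4741.5],
        4: [-4745.0, -4790.3, -4720.8],
    }
    for k, dk in delta_k(runs).items():
        print(f"K={k}: Delta K = {dk:.2f}")
```

Note that Delta K cannot be computed for the smallest and largest K tested, so K=1 can never be selected by this criterion.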
However, please be aware that this will only help you find the K that best describes the UPPERMOST level of structure in your data, so you'll have to be very careful in your model definition. You might also want to re-run Structure within each of the inferred populations (see this paper: Global analysis of Coffea canephora Pierre ex Froehner (Rubiaceae) from the Guineo-Congolese region reveals impacts from climatic refuges and migration effects, Cubry et al, GRES, 2013). Finally, you might want to cross-check your results with a model-free method. I suggest DAPC, described by Jombart et al. and implemented in adegenet for R (http://adegenet.r-forge.r-project.org/). This method lets you find the best "K-like" value describing your data before performing a factorial analysis that maximises discrimination between groups. If you have geographic coordinates for your samples, you can also use sPCA (spatial Principal Component Analysis), also implemented in adegenet.
Indeed, the Delta K of Evanno et al. (2005) shows only the uppermost clustering level, not necessarily the actual number of subpopulations. I have worked a lot with three oak species in Central Europe: Quercus robur, Q. petraea and Q. pubescens. Quercus petraea and Q. pubescens are more closely related to each other, so Delta K is maximized for K=2 and Q. petraea clusters together with Q. pubescens. In such a case, Evanno et al. (2005) recommend running analyses within clusters. In addition, unbalanced sample sizes may lead to further errors (see Kalinowski 2011 in Heredity).
I have had good experience with BAPS (Corander and Marttinen 2006; Corander, Marttinen, et al. 2008), which seems to be less prone to such clustering errors. Moreover, BAPS always found the "optimal partition" at K=3 in my case with oaks, correctly identifying the number of species, and it showed no clustering problems due to unbalanced sampling (i.e. some species being represented by many more or fewer individuals in the sample). However, keep in mind that BAPS may underestimate the number of introgressed/intermediate individuals.
Another alternative for Bayesian clustering is GENELAND, with which, however, I haven't had good experience yet...
And keep in mind that Structure and all other Bayesian methods are model-based, with strong priors and assumptions. You must bear in mind all the limitations and restrictions of these approaches to interpret the results correctly. For example, one of the strong assumptions in Structure is that populations are at Hardy-Weinberg equilibrium with no linkage between markers (you can, however, relax some of these assumptions). So it might be worthwhile to cross-check the outputs of such an analysis with distance-based methods (usually factorial analyses), which make no such assumptions about your data.
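As a quick illustration of what checking the Hardy-Weinberg assumption looks like, here is a minimal Python sketch of the classical chi-square test for a single biallelic locus, given the three genotype counts. This is a simplification I am assuming for the example: multi-allelic markers such as microsatellites need an exact or permutation test from a dedicated package, and the function name is hypothetical.

```python
from math import erfc, sqrt


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg proportions at one biallelic locus.

    n_aa, n_ab, n_bb are the observed counts of the three genotypes.
    Returns (chi2, p_value) with 1 degree of freedom, or None if the locus
    is monomorphic (the test is then undefined).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1.0 - p                       # frequency of allele B
    if p == 0.0 or q == 0.0:
        return None
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Invented counts with a heterozygote deficit, as substructure can produce.
    print(hwe_chi_square(40, 20, 40))
```

A strong heterozygote deficit in the pooled sample is what you would expect when distinct groups are mixed, so such a test is more informative when run within the inferred clusters.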
When constructing your model and choosing the parameters, you will need to know a little about the biology of your species. Pritchard et al. 2000 (the original paper), Falush et al. 2003 and Hubisz et al. 2009 describe the model parameters well and explain how to tune them for your species/sample.
Also keep in mind the Evanno method tends to pull out major breaks in genetic structure. If you have hierarchical structure - one level of structure with significant substructure - you will very likely miss substructure. Depending on 1) your research question and 2) scale of your study/question, you may wish to continue using structure within identified genetic groups to identify further substructure.
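For the hierarchical approach, the individuals first have to be partitioned according to the top-level result. Here is a minimal Python sketch, assuming you have parsed the membership coefficients (Q values) of each individual into a list. The 0.8 threshold is an arbitrary choice for illustration, not a recommended value, and the function name is hypothetical.

```python
def split_by_cluster(q_rows, threshold=0.8):
    """Partition individuals by their dominant cluster.

    q_rows is an iterable of (individual_id, [q1, q2, ..., qK]) pairs.
    Individuals whose largest Q value reaches the threshold are assigned to
    that cluster (numbered from 1); the others are set aside as admixed.
    Returns (groups, admixed), where groups maps cluster -> list of ids.
    """
    groups = {}
    admixed = []
    for ind, q in q_rows:
        best = max(range(len(q)), key=q.__getitem__)  # index of largest Q
        if q[best] >= threshold:
            groups.setdefault(best + 1, []).append(ind)
        else:
            admixed.append(ind)
    return groups, admixed


if __name__ == "__main__":
    # Invented Q values for four individuals at K=2.
    q_matrix = [
        ("ind01", [0.97, 0.03]),
        ("ind02", [0.10, 0.90]),
        ("ind03", [0.55, 0.45]),
        ("ind04", [0.85, 0.15]),
    ]
    groups, admixed = split_by_cluster(q_matrix)
    print(groups)
    print(admixed)
```

Each list in groups can then be written out as a separate input file and analysed on its own; how to treat the admixed individuals depends on your research question.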
Hello, I want to ask: in the log-likelihood plot generated by STRUCTURE HARVESTER, are the values on the vertical axis on a logarithmic scale, or are they the original values? Which one is it?