In phylogenomic datasets, what is a good threshold of missing data for species tree reconstruction? I was thinking of using the following thresholds:
1) For a particular species, exclude a locus if more than 50% of its sequence is Ns (missing data) compared to the other species
2) Include loci present in at least 50% of the total number of species
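
To make the two thresholds concrete, here is a minimal Python sketch of how they could be applied. It is only an illustration under some assumptions: each locus is held as a dict mapping species name to an aligned sequence string, the symbols N, ? and - all count as missing data, and the names `missing_fraction`, `filter_loci`, `max_missing` and `min_occupancy` are made up for this example rather than taken from any existing tool.

```python
# Assumption: these symbols all count as missing data in an aligned sequence.
MISSING_CHARS = set("Nn?-")


def missing_fraction(seq):
    """Return the fraction of positions in seq that are missing-data symbols."""
    if not seq:
        return 1.0  # treat an empty sequence as entirely missing
    return sum(ch in MISSING_CHARS for ch in seq) / len(seq)


def filter_loci(loci, n_species, max_missing=0.5, min_occupancy=0.5):
    """Apply the two thresholds to a dict of {locus: {species: sequence}}.

    Threshold 1: within a locus, drop a species whose sequence has a
    missing fraction greater than max_missing.
    Threshold 2: keep the locus only if the species that remain make up
    at least min_occupancy of the n_species in the whole dataset.
    """
    kept = {}
    for locus, seqs in loci.items():
        retained = {
            species: seq
            for species, seq in seqs.items()
            if missing_fraction(seq) <= max_missing
        }
        if len(retained) / n_species >= min_occupancy:
            kept[locus] = retained
    return kept


# Toy data with four species; the sequences are invented for illustration.
example_loci = {
    "locus1": {"spA": "ACGT", "spB": "NNNT", "spC": "ACGN", "spD": "ACGT"},
    "locus2": {"spA": "ACGT", "spB": "NNNN", "spC": "NNNA", "spD": "NNNN"},
}
filtered = filter_loci(example_loci, n_species=4)
```

In this sketch threshold 1 is applied before threshold 2, so a species dropped from a locus for having too many Ns no longer counts toward that locus's occupancy. Applying them in the other order would keep more loci, so the order is a choice worth stating explicitly.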
I guess there is no single right answer to this question and many circumstances need to be considered. However, I would like to know your thoughts on this. Thanks in advance.