A phylogeny is only as good as the alignment it is built on.
When comparing newly generated sequence data with publicly available data (e.g. from NCBI), a variety of problems can arise.
Even if one were to use the same primer sets that were used to amplify the available comparative data, I envisage an issue.
The primer annealing sites at either end of a sequence are commonly problematic, with poor resolution. Is the only way to deal with such erroneous bases to remove them?
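To make the end-trimming question concrete, here is a minimal Python sketch of one possible approach. It assumes per-base Phred quality scores are available for the read (for example from a trace or FASTQ file), and the cutoff of 20 is only an illustrative default, not a recommendation. The function name `trim_low_quality_ends` is hypothetical.

```python
def trim_low_quality_ends(seq, quals, min_q=20):
    """Drop bases from both ends until a base with quality >= min_q is reached.

    Only the ends are trimmed; low-quality bases inside the read are kept.
    min_q=20 is an assumed, illustrative threshold.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality list must have the same length")
    good = [i for i, q in enumerate(quals) if q >= min_q]
    if not good:
        return ""  # nothing in the read passes the threshold
    return seq[good[0]:good[-1] + 1]


# Made-up read with poor calls at both ends and one poor call in the middle.
read = "NNACGTTGCAAGTNN"
quals = [5, 8, 30, 35, 40, 40, 38, 12, 40, 39, 37, 33, 25, 9, 4]
trimmed_read = trim_low_quality_ends(read, quals)
print(trimmed_read)
```

Masking the doubtful end bases as N instead of deleting them would be an alternative that keeps the positions in the alignment, at the cost of carrying missing data.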
Furthermore, to compare data across a wide range of sequences, you may have to trim the alignment down to the region that all of the sequences cover, especially as every sequence in the alignment needs to be the same length for accurate phylogenetic analysis.
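As a rough illustration of trimming to the largest shared coverage, the Python sketch below takes sequences that are already aligned (padded with "-" to equal length) and keeps only the columns between the latest start and the earliest end. The function names and the toy sequences are made up for the example, and only leading and trailing gaps are treated as missing coverage.

```python
def covered_span(seq, gap="-"):
    """Return (start, end_exclusive) of the part of an aligned sequence
    that is not leading or trailing gap."""
    start = len(seq) - len(seq.lstrip(gap))
    end = len(seq.rstrip(gap))
    return start, end


def trim_to_shared_coverage(alignment, gap="-"):
    """Slice every sequence to the window that all sequences cover."""
    if len({len(s) for s in alignment.values()}) != 1:
        raise ValueError("sequences must already be aligned to equal length")
    spans = [covered_span(s, gap) for s in alignment.values()]
    start = max(a for a, _ in spans)   # latest start among all sequences
    end = min(b for _, b in spans)     # earliest end among all sequences
    if start >= end:
        raise ValueError("no column is covered by every sequence")
    trimmed = {name: s[start:end] for name, s in alignment.items()}
    return trimmed, (start, end)


# Toy alignment: one new sequence plus two hypothetical database sequences.
example = {
    "own_isolate": "--ACGTTGCAAGT---",
    "ncbi_ref_1":  "TTACGTTGCATGTCC-",
    "ncbi_ref_2":  "---CGTAGCAAGTCCA",
}
trimmed, window = trim_to_shared_coverage(example)
print(window)
for name, seq in trimmed.items():
    print(name, seq)
```

The shortest sequence dictates the window here, so a single short database entry can cost every other sequence its flanking columns. Dropping that entry is the other side of the trade-off.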
Surely this alone is a significant factor. If the regions removed when trimming the alignment happen to be the most diverse ones, the result could be a skewed representation of genetic diversity and an inaccurate phylogeny, especially at higher taxonomic levels.
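One crude way to check whether trimming throws away a disproportionate share of the variation is to compare the fraction of variable columns inside the kept window with the fraction in the removed flanks. The Python sketch below does this for a toy alignment. The function names, the sequences and the window are invented for illustration, and the proportion of variable sites is only a rough proxy for diversity.

```python
def is_variable(column, gap="-"):
    """A column counts as variable if it has more than one distinct non-gap base."""
    bases = {c.upper() for c in column if c != gap}
    return len(bases) > 1


def variable_fraction(alignment, columns):
    """Fraction of the given column indices that are variable."""
    seqs = list(alignment.values())
    columns = list(columns)
    if not columns:
        return 0.0
    hits = sum(is_variable([s[i] for s in seqs]) for i in columns)
    return hits / len(columns)


def compare_kept_vs_removed(alignment, start, end):
    """Return (variable fraction in [start, end), variable fraction outside it)."""
    length = len(next(iter(alignment.values())))
    kept = range(start, end)
    removed = [i for i in range(length) if i < start or i >= end]
    return variable_fraction(alignment, kept), variable_fraction(alignment, removed)


# Toy alignment in which most of the variation sits in the flanks.
toy = {
    "taxon_a": "ATGCCGTAGGCT",
    "taxon_b": "TTGCCGTAGACA",
    "taxon_c": "-CGCCATAGG-T",
}
kept_frac, removed_frac = compare_kept_vs_removed(toy, start=2, end=9)
print(f"kept: {kept_frac:.2f}, removed: {removed_frac:.2f}")
```

If the removed flanks turn out to be much more variable than the retained window, that would be a hint that the trimmed alignment understates diversity, though part of the apparent variation at the ends may itself come from poor base calls.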
I would be interested to hear your comments, criticisms and general opinions on this issue and the hurdles listed above.
Cheers,