I have a set of HIV pol sequences, and after alignment it is clear that some sequences are much shorter than others, mainly because of sequencing issues. Because these gaps/missing data are not necessarily meaningful, I have been using the "complete deletion" option when running a maximum likelihood phylogenetic analysis. However, this reduces the number of positions available for the analysis, since only positions with data for every sequence are retained.

What would be the best approach to handle this? Is there a general lower limit on the number of bases needed for an informative viral phylogenetic analysis, and is it better to remove samples with shorter sequences? Would a Bayesian approach using a program such as BEAST overcome this? Thank you!
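To make the trade-off concrete, here is a minimal Python sketch of what "complete deletion" does to an alignment and how dropping short sequences changes the number of retained sites. It assumes the alignment is held as a dict of equal-length strings and that `-`, `?` and `N` count as missing data. The function names, the toy sequences and the 0.8 coverage threshold are hypothetical and purely illustrative, not a recommended cutoff.

```python
"""Sketch: how 'complete deletion' shrinks an alignment, and how dropping
short sequences changes the number of retained sites. Toy data only."""

# Characters treated as gap/missing data (an assumption for this sketch).
MISSING = set("-?Nn")


def complete_deletion_sites(aln: dict[str, str]) -> list[int]:
    """Return column indices where every sequence has a non-missing character."""
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    return [
        i for i in range(length)
        if all(seq[i] not in MISSING for seq in aln.values())
    ]


def coverage(seq: str) -> float:
    """Fraction of alignment columns where this sequence has data."""
    return sum(c not in MISSING for c in seq) / len(seq)


def drop_short(aln: dict[str, str], min_cov: float) -> dict[str, str]:
    """Keep only sequences whose coverage is at least min_cov (hypothetical filter)."""
    return {name: s for name, s in aln.items() if coverage(s) >= min_cov}


if __name__ == "__main__":
    # Hypothetical toy alignment: seqC is truncated at both ends.
    toy = {
        "seqA": "ATGCATGCAT",
        "seqB": "ATGCATGCAA",
        "seqC": "---CATG---",
    }
    print(len(complete_deletion_sites(toy)))       # 4 columns survive
    filtered = drop_short(toy, min_cov=0.8)
    print(sorted(filtered))                         # ['seqA', 'seqB']
    print(len(complete_deletion_sites(filtered)))  # 10 columns survive
```

In this toy case a single truncated sequence cuts the usable columns from 10 to 4, and removing it restores all 10. The sketch only counts columns; it says nothing about which choice yields a more reliable tree.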

Steven J. Clipman