Hi everyone,
I am trying to build a phylogenetic tree from a few hundred 16S sequences of different lengths. About half of my dataset is around 700 bp long; the rest are full-length sequences (around 1500 bp).
Obviously, when I align these sequences, a lot of positions are going to consist of gaps for half of my sequences.
I don't know which will hurt the quality of the final tree more: leaving these gaps in, which don't really provide information and can lead to "wrong" clusters, or removing these positions, which comes down to discarding information for the sequences that did have a base there.
I guess another way to ask this is: is it a bad idea to build a tree from sequences of different lengths? Is there a way around these technical issues?
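To make the second option concrete, here is a minimal Python sketch of what I mean by "removing these positions": compute the fraction of gaps in each alignment column and drop the columns above a cutoff. The function names (`gap_fractions`, `drop_gappy_columns`), the 0.4 cutoff, and the toy alignment are all made up for illustration; this is not the behaviour of any particular trimming tool.

```python
def gap_fractions(alignment):
    """Return the fraction of gap characters ('-') in each alignment column."""
    n_seqs = len(alignment)
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [sum(seq[i] == "-" for seq in alignment) / n_seqs
            for i in range(length)]


def drop_gappy_columns(alignment, max_gap_fraction):
    """Keep only columns whose gap fraction is <= max_gap_fraction."""
    fractions = gap_fractions(alignment)
    keep = [i for i, f in enumerate(fractions) if f <= max_gap_fraction]
    return ["".join(seq[i] for i in keep) for seq in alignment]


# Hypothetical toy alignment: two "full-length" and two "partial" sequences.
toy = [
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "----ACGT--",
    "----TCGT--",
]

# Illustrative cutoff: drop any column where more than 40% of sequences have a gap.
trimmed = drop_gappy_columns(toy, max_gap_fraction=0.4)
print(gap_fractions(toy))
print(trimmed)
```

In this toy case only the four columns shared by all sequences survive, so the full-length sequences lose everything outside the region the partial sequences cover, which is exactly the loss of information I am worried about.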
Thanks a lot for your help,
Marine