I recently started adding a gap partition to my ML analyses of ribosomal genes because it improves the results. I run the analyses with the online version of RAxML on the CIPRES Gateway. I add the gap partition as binary data [1/0] with ascertainment_correction_lewis.
However, in my latest analysis, which included a few short sequences alongside longer ones, I obtained an unexpected result. The short sequences clustered together in the wrong position, even though they are not related, and they had very long branches. Another short sequence was in the correct position but also had a very long branch. This last sequence covered the SSU-ITS region, where most of the gaps were present, while the others covered part of the LSU.
I deduced that the abnormal topology was caused by the missing data in the gap partition, which was coded as '0', the same as the absence of a gap. In other words, the short sequences [those not covering the ITS] shared a large identical stretch of the gap partition because of a long run of '0's.
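To make the coding issue concrete, here is a minimal Python sketch. It assumes a simple per-column coding ('1' = gap, '0' = nucleotide present), which may differ from the scheme actually used to build the partition. The function name `code_gaps`, the `terminal_symbol` parameter, and the toy sequences are all hypothetical. Leading and trailing runs of '-' are taken to be regions the sequence does not cover, and they are written with whatever symbol is passed in.

```python
def code_gaps(aligned_seq, terminal_symbol="0"):
    """Binary gap coding per alignment column (illustrative only).

    '1' = internal gap, '0' = nucleotide present.
    Leading/trailing runs of '-' (uncovered regions) get terminal_symbol.
    """
    n = len(aligned_seq)
    start = n - len(aligned_seq.lstrip("-"))   # first covered column
    end = len(aligned_seq.rstrip("-"))         # one past the last covered column
    coded = []
    for i, ch in enumerate(aligned_seq):
        if i < start or i >= end:
            coded.append(terminal_symbol)      # region not sequenced
        elif ch == "-":
            coded.append("1")                  # true internal gap
        else:
            coded.append("0")                  # nucleotide present
    return "".join(coded)


# Toy alignment: one long sequence and two short ones (made-up data).
long_seq = "ACG--TACGTAC"
short_a  = "-------CGTAC"
short_b  = "-------CGAAC"

for name, seq in [("long", long_seq), ("short_a", short_a), ("short_b", short_b)]:
    print(name, code_gaps(seq, "0"), code_gaps(seq, "?"))
```

With `terminal_symbol="0"` the two short sequences get the same all-zero string across the region they do not cover, which is the situation described above. With `"?"` those columns are marked as undetermined instead. As far as I understand, RAxML treats '?' and '-' as undetermined in binary data, but whether that would change the tree is not something this sketch shows.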
Is there a way to overcome this problem without trimming the sequences to the same length?