As already mentioned, it all depends on the data set you are using. Cyanobacterial 16S rRNA gene sequences have a lot of problems of their own, so you need to be a bit careful (which unfortunately is a bit tricky in itself!). Also check things like:
1. Are there wrongly oriented sequences in your dataset?
2. Did you select the substitution model by the lowest BIC score?
3. Does the tree topology change between the algorithms (ML, NJ, MP etc.)?
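For the first point, a rough way to flag wrongly oriented sequences is to compare each sequence, and its reverse complement, against one reference you trust. This is only a minimal sketch: the function names and the k-mer size of 8 are my own assumptions, and a proper aligner will do this more robustly.

```python
# Map each base to its complement; U is treated like T (RNA input).
COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def kmers(seq, k=8):
    """Return the set of all k-mers in seq (upper case, U read as T)."""
    seq = seq.upper().replace("U", "T")
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def looks_reversed(query, reference, k=8):
    """True if the reverse complement of query shares more k-mers
    with the reference than query itself does."""
    ref_kmers = kmers(reference, k)
    forward = len(kmers(query, k) & ref_kmers)
    reverse = len(kmers(reverse_complement(query), k) & ref_kmers)
    return reverse > forward
```

Any sequence for which `looks_reversed` returns True is worth reverse-complementing and re-aligning before you build the tree.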
If you have used MEGA, it has a built-in model selection tool. You can simply use the model with the lowest BIC score.
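If you want to see what the BIC comparison is doing, it is just BIC = k ln(n) - 2 lnL, where k is the number of free parameters, n the number of sites, and lnL the log-likelihood. A small sketch, assuming you have copied lnL and k for each model out of your model test output (the function names and the dictionary layout here are hypothetical):

```python
import math


def bic(log_likelihood, n_params, n_sites):
    """Bayesian Information Criterion: k * ln(n) - 2 * lnL."""
    return n_params * math.log(n_sites) - 2.0 * log_likelihood


def best_model(candidates, n_sites):
    """Return the name of the model with the lowest BIC.

    candidates maps a model name to a (lnL, number_of_parameters) tuple.
    """
    return min(
        candidates,
        key=lambda name: bic(candidates[name][0], candidates[name][1], n_sites),
    )
```

A richer model only wins if its gain in likelihood outweighs the penalty for its extra parameters, which is why the lowest score is the one to pick.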
It would be great to know which group your sequences belong to. Are they heterocytous or non-heterocytous? Frankly speaking, one of the biggest problems I faced was getting correctly identified sequences. Try to take cyanobacterial reference sequences from published work in journals like IJSEM, Fottea, Phytotaxa etc. It may help!
1) As Xabier recommends, I agree with using an appropriate mode to align your sequences.
2) I think that a good option for partitioned model selection is jModelTest or, better, PartitionFinder.
3) Have you used any phylogenetic method to analyze your dataset? A Maximum Likelihood or Bayesian approach may increase the bootstrap (BS) values, but do not forget that it is important to be sure that the sequence alignment and the evolutionary model have been chosen in the best way. If both of these are right, the BS values may increase even if you perform the analysis in MEGA.
4) I really don't know anything about cyanobacteria, but if the sampling is not the best, this may be an important reason why you obtain low BP values.
Thank You Jovana M. Jasso-Martinez. I am trying all possible combinations.
Prashant Singh Sir
I am using species that belong to different orders, including both heterocytous and non-heterocytous ones (file attached). I took the sequences from the NCBI RefSeq database (only those that have been fully sequenced and are in the Genome database).
Yes, with this list there will be a few problems. I will detail the heterocytous ones, while someone who works more with the non-heterocytous ones can pitch in too.
1. Anabaena variabilis ATCC 29413 is a bit problematic. I have seen that it usually distorts the entire phylogeny when put in with diverse taxa. In your case, this can be true.
2. Many of the NIES strains do not actually cluster within the identified groups.
3. Maybe remove the Nostoc azollae 0708 sequence
4. Nostoc piscinale CENA21 does not fall into the Nostoc sensu stricto clade.
5. Nostocales cyanobacterium HT-58-2 can be removed from your phylogeny
6. Calothrix sp. PCC 7507 and Rivularia sp. PCC 7116 may fall into the same clade, which could actually be the Rivularia node (maybe better sampling could separate them, but phylogenetically the Calothrix strain should be distant from the actual clade).
So, the thing is that the taxon sampling for the heterocytous forms looks a bit problematic here. Also, your dataset is very diverse, which may be contributing to the lower bootstraps, and perhaps to single lines or long branches.
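If you want to test how the tree behaves without the strains listed above, you can drop them from the FASTA file before re-aligning. A minimal sketch in plain Python, assuming the strain designations appear in the FASTA header lines (the function names are hypothetical):

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def drop_taxa(records, unwanted):
    """Keep only records whose header contains none of the unwanted names
    (case-insensitive substring match)."""
    unwanted_lower = [name.lower() for name in unwanted]
    return [
        (header, seq)
        for header, seq in records
        if not any(name in header.lower() for name in unwanted_lower)
    ]
```

For example, `drop_taxa(records, ["HT-58-2", "Nostoc azollae 0708"])` would leave out those two entries. Rebuild the alignment and the tree afterwards and compare the support values with the original run.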
I would be extremely careful with any name coming from NCBI and with any name given to Cyanobacteria in general, as they are known to not have a very good concordance between phylogeny and nomenclature.
I'd check the proposed updated nomenclature on the GTDB website (they use phylogenomics).
In addition, you can search those sequences in SILVA. You will get them aligned to the SILVA reference, and you can get a classification to genus level that is far more reliable than NCBI for these matters.
Parva Sharma 16S has generally low bootstrap (or any other support) in deeper nodes. You'll have to add many more genes to significantly improve the node support. If all nodes have