Actually, depending on the taxonomic breadth of your dataset, most of the information is in the third codon position (referring to protein-coding genes). I have never heard of a study that recommends categorically EXCLUDING this information. Rather, you usually partition your dataset to allow separate parameter estimates for conserved and highly variable sites. Also, there are two common misconceptions about third codon positions and saturation:
1. Saturation = fast substitution rate = high likelihood of homoplasy. This applies to any fast-evolving site in a gene locus, not just third codon positions. It is actually far more common in variable regions of non-coding loci, such as ITS1 and ITS2. The difference is that in protein-coding genes, homologous sites are identified by definition, whereas in non-coding loci it is often impossible to determine homology with certainty. But even if it can be determined, nobody would suggest throwing out the entire ITS1 or ITS2 unless you are aligning across a very broad taxonomic range. So it always depends on the context.
2. Saturation is only a problem if it obscures broad-scale relationships. Deep relationships are more likely to show homoplasy. However, deep relationships in your dataset are usually anchored by more conserved regions, which means homoplastic sites will only affect the overall topology if their relative frequency is higher than that of conserved sites that contain deep-level synapomorphies. On the other hand, these homoplastic sites might be important for resolving terminal levels. So rather than being removed, they should be downweighted if they can be identified prior to running a tree, especially when bootstrapping.
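To make the partitioning idea concrete, here is a minimal Python sketch that splits a coding alignment into its three codon positions and reports the mean uncorrected pairwise distance (p-distance) for each. It assumes the alignment is in frame and starts at codon position 1; the function names and the toy sequences are made up for illustration. A high p-distance at third positions is only a rough hint of where saturation might be an issue, not a test for it.

```python
from itertools import combinations


def split_by_codon_position(alignment):
    """Split an in-frame alignment {name: sequence} into codon positions 1, 2 and 3."""
    return {
        pos: {name: seq[pos - 1::3] for name, seq in alignment.items()}
        for pos in (1, 2, 3)
    }


def mean_p_distance(alignment):
    """Mean proportion of differing sites over all sequence pairs.

    Columns where either sequence has a gap or an ambiguity code are skipped.
    """
    valid = set("ACGT")
    distances = []
    for a, b in combinations(alignment.values(), 2):
        pairs = [
            (x, y)
            for x, y in zip(a.upper(), b.upper())
            if x in valid and y in valid
        ]
        if pairs:
            distances.append(sum(x != y for x, y in pairs) / len(pairs))
    return sum(distances) / len(distances) if distances else 0.0


# Hypothetical toy alignment: three taxa, three codons each.
toy = {
    "taxon_A": "ATGGCTAAA",
    "taxon_B": "ATGGCCAAG",
    "taxon_C": "ATGGCAAAA",
}

parts = split_by_codon_position(toy)
for pos in (1, 2, 3):
    print(f"codon position {pos}: mean p-distance = {mean_p_distance(parts[pos]):.3f}")
```

In this toy case all the variation sits in the third position, which is the typical pattern among closely related taxa. The three subsets could then be given separate model parameters (or different weights) in whatever tree-building program you use, instead of deleting the third positions outright.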
Jesse W. Breinholt and Akito Y. Kawahara (2013) Phylotranscriptomics: Saturated Third Codon Positions Radically Influence the Estimation of Trees Based on Next-Gen Data. Genome Biol Evol. 5(11): 2082–2092.
I disagree. Even if saturated, third codon positions can carry phylogenetic signal, especially for closely related species. There is evidence that even in large analyses, third codon positions carry most of the phylogenetic information:
Källersjö, M., Albert, V. A. and Farris, J. S. (1999), Homoplasy Increases Phylogenetic Structure. Cladistics, 15: 91–93. doi: 10.1111/j.1096-0031.1999.tb00400.x
Do not throw away a third of your evidence without a good reason.