Hi all, I'm looking for advice on my ancestral sequence reconstruction (ASR) project.
I have been using PhyML on Linux, mainly to predict ancestral sequences from a set of aligned protein sequences (msa.phylip) plus a tree (tree.nwk) made in other software. The command that I usually run is:
/home/default/phyml-master/src/phyml -i msa.phylip -d aa -m LG -f e -v 0 -a e -c 12 -o tlr -b 0 -u tree.nwk --freerates --no_memory_check --print_site_lnl --ancestral
As I understand it, the file *_ancestral_seq.txt contains the protein sequence for each node of the tree.
However, when I extract the ancestral sequence of each node, positions where there should be a gap are filled with a residue, so the resulting protein sequences are too long. For instance, a query protein of 280 residues has ancestors of 380 residues. Presumably this is because the MSA contains sequences of many different lengths, and PhyML assigns a probability to each of the 20 possible residues at every alignment position.
My question is: what are the ways to get ancestral sequences without gaps being filled with residues?
I think that a script or another program could be used to get rid of these residues. Or maybe there is some sequence extractor that treats the symbol "-" as a residue and then, based on the probabilities in the PhyML output, places "-" at the positions where a gap is most probable. I would appreciate hearing about any possible ways to deal with this.
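To make the first idea concrete, here is a minimal sketch of the kind of script I have in mind. It does not read PhyML output; it assumes the ancestral sequence has already been extracted as a plain string with the same length as the alignment. The function names (gap_fraction_per_column, mask_ancestor) and the max_gap_fraction threshold are hypothetical, and the rule itself (put a gap wherever most of the aligned sequences have one) is only a crude heuristic, not a proper indel reconstruction along the tree.

```python
def gap_fraction_per_column(aligned_seqs):
    """Return the fraction of '-' characters in each alignment column."""
    if not aligned_seqs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")
    n = len(aligned_seqs)
    return [sum(s[i] == "-" for s in aligned_seqs) / n for i in range(length)]


def mask_ancestor(ancestor, aligned_seqs, max_gap_fraction=0.5):
    """Replace residues with '-' in columns that are mostly gaps.

    ancestor: reconstructed sequence, one residue per alignment column
              (assumed to be extracted from the ASR output beforehand).
    aligned_seqs: aligned extant sequences, ideally only the descendants
                  of the node in question.
    max_gap_fraction: hypothetical threshold; columns with a larger gap
                      fraction are masked.
    """
    fractions = gap_fraction_per_column(aligned_seqs)
    if len(ancestor) != len(fractions):
        raise ValueError("ancestor length must match the alignment length")
    return "".join(
        "-" if frac > max_gap_fraction else residue
        for residue, frac in zip(ancestor, fractions)
    )


# Toy example: the third column is a gap in two of the three sequences.
extant = ["MK-LV", "MK-LV", "MKALV"]
ancestor = "MKALV"

masked = mask_ancestor(ancestor, extant)
print(masked)                   # MK-LV
print(masked.replace("-", ""))  # MKLV
```

Passing only the sequences that descend from a given node, rather than the whole alignment, would probably give a more sensible gap pattern for that node, but that requires parsing the tree and is left out here.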
Thank you.