I have a set of aligned sequences in FASTA format and want to derive a consensus from the alignment. At most sites, one base clearly occurs most often. At sites where two or more bases occur an equal number of times, which base should be taken? An example is given below:

>Seq_1
ATGCGA
>Seq_2
AT-CGT
>Seq_3
AT-CCG
>Seq_4
AT-CCC
>Seq_5
AA-CT-

As per the conventions, this would be the consensus:

Consensus : A T G C [G/C] N

But a consensus written this way throws an error when it is aligned with other sequences. What should be done in this scenario, and how should the consensus be called at such sites?
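To make the tie case concrete, here is a minimal Python sketch of one possible way to call such a consensus. It is not a definitive method and rests on several assumptions: gaps (`-`) are ignored when counting, a column made up only of gaps stays a gap, and tied bases are written as a single IUPAC ambiguity code (for example `S` for G or C, `N` for all four) so that every site remains one character. The function name `consensus` and the `IUPAC` lookup table are hypothetical names chosen for this illustration.

```python
from collections import Counter

# IUPAC nucleotide ambiguity codes, keyed by the set of tied bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(seqs, gap="-"):
    """Majority-rule consensus; ties become IUPAC ambiguity codes.

    Assumption: gaps are not counted as a state, so a column with one
    base and four gaps is called as that base.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")

    out = []
    for column in zip(*seqs):
        # Count bases in this column, skipping gap characters.
        counts = Counter(b.upper() for b in column if b != gap)
        if not counts:
            out.append(gap)  # column contains only gaps
            continue
        top = max(counts.values())
        tied = frozenset(b for b, n in counts.items() if n == top)
        # Fall back to N for anything outside plain A/C/G/T.
        out.append(IUPAC.get(tied, "N"))
    return "".join(out)


aligned = ["ATGCGA", "AT-CGT", "AT-CCG", "AT-CCC", "AA-CT-"]
print(consensus(aligned))
```

For the five sequences in the example, the fifth site (G and C tied) comes out as `S` instead of `[G/C]`, so the consensus stays a plain six-character string. Whether a downstream alignment program accepts ambiguity codes depends on the program, so this would need to be checked for the tool in use.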
