I want to infer the ambiguous nucleotides in a genome.
I have the two data sets below.
Ambiguous data:
1 2 3 4 ・・・・・・100
A A/T A T ・・・・・・T
A G/T A G ・・・・・・T
・
・
・
Complete data:
1 2 3 4 ・・・・・・100
A A T T ・・・・・・T
A A A T ・・・・・・A
A A A G ・・・・・・G
・
・
・
・
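I want to predict the probability of each candidate nucleotide at every ambiguous position.
For example, I would like a result such as: given that position 1 is A, position 3 is A, and position 4 is T, position 2 is A with probability 30% and T with probability 70%.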
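To make the goal concrete, here is a minimal sketch of the kind of estimate I have in mind. It simply counts, among the complete sequences that match the known positions, how often each candidate base occurs at the ambiguous position. The function name `conditional_probs`, the 0-based indexing, and the toy sequences are my own assumptions for illustration, not real data, and plain counting may well not be the right method.

```python
from collections import Counter

def conditional_probs(complete, known, target, candidates):
    """Estimate P(base at target | bases at known positions) by counting.

    complete   : list of complete sequences (strings of equal length)
    known      : dict {0-based position: observed base}
    target     : 0-based index of the ambiguous position
    candidates : bases allowed at the ambiguous position, e.g. "AT"
    """
    counts = Counter(
        row[target]
        for row in complete
        # keep rows that agree with every known position and whose
        # base at the target is one of the allowed candidates
        if all(row[i] == b for i, b in known.items())
        and row[target] in candidates
    )
    total = sum(counts.values())
    if total == 0:
        return None  # no complete sequence matches this context
    return {b: counts[b] / total for b in candidates}

# Toy sequences invented for illustration (only 4 positions each).
complete = ["AATT", "ATAT", "ATAT", "AAAT", "AAAG"]
# Positions 1, 3 and 4 in my description are indices 0, 2 and 3 here.
print(conditional_probs(complete, {0: "A", 2: "A", 3: "T"}, 1, "AT"))
```

With many conditioning positions, very few complete sequences will match the context exactly, so the counts become unreliable or empty. That is why the number of positions matters.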
In this example I conditioned on three other positions, but I also need a statistically sound way to decide how many other positions to use.
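One idea I have considered, as a sketch only, is to score each set of conditioning positions by how well it predicts the target base on held-out complete sequences, and to stop adding positions when the score no longer improves. The code below uses a leave-one-out log-likelihood with pseudocounts and a greedy search. The names `loo_log_likelihood` and `greedy_context`, the pseudocount of 1, and the limit `max_size` are all assumptions of mine, and I do not know whether this is the statistically appropriate criterion.

```python
import math
from collections import Counter

BASES = "ACGT"

def loo_log_likelihood(complete, context, target, pseudo=1.0):
    """Leave-one-out log-likelihood of the target base given context positions."""
    # Count (context bases, target base) pairs and context totals once.
    joint = Counter((tuple(r[i] for i in context), r[target]) for r in complete)
    ctx_total = Counter(tuple(r[i] for i in context) for r in complete)
    ll = 0.0
    for r in complete:
        key = tuple(r[i] for i in context)
        # Subtract the held-out row from the counts, then smooth.
        num = joint[(key, r[target])] - 1 + pseudo
        den = ctx_total[key] - 1 + pseudo * len(BASES)
        ll += math.log(num / den)
    return ll

def greedy_context(complete, target, max_size=3):
    """Add one context position at a time while the held-out score improves."""
    context = []
    best = loo_log_likelihood(complete, context, target)
    others = [i for i in range(len(complete[0])) if i != target]
    while len(context) < max_size:
        scored = [
            (loo_log_likelihood(complete, context + [i], target), i)
            for i in others
            if i not in context
        ]
        if not scored:
            break
        score, pos = max(scored)
        if score <= best:
            break  # an extra position no longer helps on held-out rows
        best, context = score, context + [pos]
    return context, best

# Toy sequences invented for illustration: index 1 copies index 0,
# while index 2 is unrelated to it.
toy = ["AAC", "AAG", "AAC", "AAG", "CCC", "CCG", "CCC", "CCG"]
print(greedy_context(toy, target=1))
```

On the toy sequences the search keeps only index 0, because adding the unrelated index 2 splits the data into smaller groups and lowers the held-out score. I would like to know whether a criterion like this, or something more principled, is the right way to choose the number of positions.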