I want to infer the ambiguous nucleotides in a genome.

I have the two data sets below.

Ambiguous data:

Position     1   2        3   4   ...   100
Sequence 1   A   A or T   A   T   ...   T
Sequence 2   A   G or T   A   G   ...   T

Complete data:

Position     1   2   3   4   ...   100
Sequence 1   A   A   T   T   ...   T
Sequence 2   A   A   A   T   ...   A
Sequence 3   A   A   A   G   ...   G

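I want to predict the probability of each candidate nucleotide at every ambiguous position.

For example, I want a result such as: given that position 1 is A, position 3 is A, and position 4 is T, position 2 is A with probability 30% and T with probability 70%.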
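One way to make this kind of result concrete is a conditional frequency estimate: count, among the complete sequences that match the observed bases at the chosen positions, how often each candidate base occurs at the ambiguous position. The sketch below is only an illustration under assumptions that are not part of the question: the function name `conditional_probs`, the add-one pseudocount, and the toy input (only the first four columns of the complete data shown above) are all hypothetical choices.

```python
from collections import Counter

def conditional_probs(complete, target, context, observed, candidates, pseudocount=1.0):
    """Estimate P(base at target | bases at context positions).

    complete    : list of equal-length strings (the complete data set)
    target      : 0-based index of the ambiguous position
    context     : 0-based indices of the positions used as predictors
    observed    : dict {index: base} taken from the ambiguous sequence
    candidates  : possible bases at the target, e.g. "AT" for "A or T"
    pseudocount : assumed smoothing constant so unseen bases keep nonzero probability
    """
    counts = Counter()
    for seq in complete:
        matches_context = all(seq[i] == observed[i] for i in context)
        if matches_context and seq[target] in candidates:
            counts[seq[target]] += 1
    total = sum(counts.values()) + pseudocount * len(candidates)
    return {base: (counts[base] + pseudocount) / total for base in candidates}

# Toy input: first four columns of the three complete sequences (an assumption;
# the real data has 100 positions).
complete = ["AATT", "AAAT", "AAAG"]

# Position 2 (index 1) is "A or T"; positions 1, 3, 4 (indices 0, 2, 3) are A, A, T.
probs = conditional_probs(
    complete,
    target=1,
    context=[0, 2, 3],
    observed={0: "A", 2: "A", 3: "T"},
    candidates="AT",
)
print(probs)  # with this toy input: A = 2/3, T = 1/3
```

With only three complete sequences the estimate is dominated by the pseudocount, so the numbers here say nothing about the real data; they only show the shape of the calculation.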

In this example the prediction uses the other three positions. I also need a statistically sound way to decide how many other positions should be used.
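One possible way to pick the number of positions, offered as a sketch and not as an established answer to the question, is leave-one-out cross-validation on the complete data: hold out each complete sequence in turn, predict its base at the target position from the remaining sequences, and keep the number of predictor positions that gives the highest held-out log-likelihood. Everything named here is hypothetical: the functions `loo_log_likelihood` and `choose_context_size`, the add-one pseudocount, the assumption that the candidate positions are already ranked from most to least informative, and the toy sequences.

```python
import math
from collections import Counter

def loo_log_likelihood(complete, target, context, alphabet="ACGT", pseudocount=1.0):
    """Leave-one-out log-likelihood of the target base given the context positions."""
    total = 0.0
    for i, held_out in enumerate(complete):
        # Count target bases among the other sequences that share the held-out context.
        counts = Counter(
            seq[target]
            for j, seq in enumerate(complete)
            if j != i and all(seq[p] == held_out[p] for p in context)
        )
        denom = sum(counts.values()) + pseudocount * len(alphabet)
        total += math.log((counts[held_out[target]] + pseudocount) / denom)
    return total

def choose_context_size(complete, target, ranked_positions):
    """Score the first k ranked positions for k = 0..len(ranked_positions).

    Returns (best_k, scores); the first best_k ranked positions are the ones to use.
    """
    scores = [
        loo_log_likelihood(complete, target, ranked_positions[:k])
        for k in range(len(ranked_positions) + 1)
    ]
    best_k = max(range(len(scores)), key=scores.__getitem__)
    return best_k, scores

# Hypothetical toy data: index 0 tracks the target (index 1), index 2 is noise.
toy = ["AAC", "AAG", "AAT", "AAA", "TTC", "TTG", "TTT", "TTA"]
best_k, scores = choose_context_size(toy, target=1, ranked_positions=[0, 2])
print(best_k, scores)
```

In the toy data, adding the noisy position splits the sequences into groups with no other members, so the held-out likelihood drops and only the first ranked position is kept. Using too many positions has the same effect on real data: fewer complete sequences match each context, and the estimates become unreliable.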
