I am trying to write a program to annotate VCF files against the 1000 Genomes database. I have all the files with me. Surprisingly, I came across the following variant:

1 207237233 . GGTGT GGGGGGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGTGTGT,GGTGTGTGT,GGTGTGTGTGT,GGTGTGTGTGTGT,GGTGTGTGTGGTGTGTGTGTGT,GGTGTGTGTGTGTGT,TGTGT,GGT,G 100 PASS AC=51,14,83,333,158,164,46,3,17,627,748,11;AF=0.0101837,0.00279553,0.0165735,0.0664936,0.0315495,0.0327476,0.0091853,0.000599042,0.00339457,0.1252,0.149361,0.00219649;AN=5008;NS=2504;DP=19968;EAS_AF=0.0367,0,0.0129,0.0377,0.004,0,0,0.001,0,0.0655,0.2381,0.002;AMR_AF=0.0029,0.0043,0.0159,0.072,0.0202,0.0231,0.0029,0,0,0.1225,0.1816,0.0014;AFR_AF=0,0,0.0023,0.0378,0.0938,0.1104,0.0325,0.0008,0.0129,0.149,0.0439,0.0008;EUR_AF=0.002,0.008,0.0408,0.1302,0.007,0.002,0.001,0.001,0,0.164,0.1163,0.001;SAS_AF=0.0102,0.0031,0.0153,0.0654,0.0092,0,0,0,0,0.1166,0.2117,0.0061

The point to note here is that there are 12 alternate alleles at this genomic coordinate.
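A minimal sketch of how a multi-allelic record like this can be split into one row per alternate allele, so that each allele is paired with its own AC and AF value. It assumes that per-allele INFO fields list exactly one value per ALT allele, in ALT order; the function name `split_multiallelic` and the `per_allele_keys` parameter are hypothetical, and the record in the usage lines is a made-up toy, not real 1000 Genomes data.

```python
def split_multiallelic(line, per_allele_keys=("AC", "AF")):
    """Return one dict per ALT allele of a single VCF data line."""
    # The first eight columns are CHROM POS ID REF ALT QUAL FILTER INFO.
    chrom, pos, _id, ref, alt, _qual, _filt, info = line.split()[:8]
    alts = alt.split(",")

    # Parse INFO into a dict; flag-style keys (no '=') map to True.
    info_map = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        info_map[key] = value if sep else True

    rows = []
    for i, allele in enumerate(alts):
        row = {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": allele}
        for key in per_allele_keys:
            if key not in info_map:
                continue
            values = info_map[key].split(",")
            if len(values) != len(alts):
                raise ValueError(f"{key} has {len(values)} values for {len(alts)} ALT alleles")
            row[key] = values[i]
        rows.append(row)
    return rows


# Toy record for illustration only (hypothetical values).
toy = "1 100 . GGTGT GGT,G 100 PASS AC=2,1;AF=0.5,0.25;AN=4"
for row in split_multiallelic(toy):
    print(row)
```

For the record above, population-specific fields such as EAS_AF could be handled the same way by adding them to `per_allele_keys`.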

I aligned each alternate allele against the reference allele to get an idea of what is happening, but the result looked strange to me and I could not work out its meaning. I have attached the text file I made to understand the mode of insertion and deletion.

If I am interpreting the reference and alternate bases in the attached text file correctly, then for the first four alternate alleles the coordinate of the reference allele actually shifts to the right.
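One way to compare each alternate allele with the reference is to trim the bases the two share. The following is a minimal sketch, assuming the common convention of trimming the shared suffix first and then the shared prefix while keeping at least one base in each allele. The helper names `trim_allele` and `classify` are hypothetical, and this is not full left-alignment, which would also need the reference genome sequence.

```python
def trim_allele(pos, ref, alt):
    """Reduce one REF/ALT pair to a shorter representation."""
    # Trim shared trailing bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim shared leading bases; each base removed moves the position right.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


def classify(ref, alt):
    """Label a trimmed pair by allele length only (a simplification)."""
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNV"
    return "insertion" if len(alt) > len(ref) else "deletion"


# Four of the ALT alleles from the record above, against REF GGTGT.
pos, ref = 207237233, "GGTGT"
for alt in ["GGTGTGTGT", "TGTGT", "GGT", "G"]:
    new_pos, new_ref, new_alt = trim_allele(pos, ref, alt)
    print(alt, "->", new_pos, new_ref, new_alt, classify(new_ref, new_alt))
```

Under this trimming order, GGTGTGTGT reduces to an insertion G>GGTGT, TGTGT to a single-base change G>T, GGT to a deletion GGT>G, and G stays as the deletion GGTGT>G, all at the original position. The position only moves right when leading bases are removed, so whether a coordinate appears to shift depends on which end of the alleles is aligned first.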

Looking forward to valuable inputs.

Thanks in advance.
