I am trying to write a program to annotate VCF files against the 1000 Genomes database, and I have all the required files. Surprisingly, I came across the following variant:
1 207237233 . GGTGT GGGGGGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGTGTGT,GGTGTGTGT,GGTGTGTGTGT,GGTGTGTGTGTGT,GGTGTGTGTGGTGTGTGTGTGT,GGTGTGTGTGTGTGT,TGTGT,GGT,G 100 PASS AC=51,14,83,333,158,164,46,3,17,627,748,11;AF=0.0101837,0.00279553,0.0165735,0.0664936,0.0315495,0.0327476,0.0091853,0.000599042,0.00339457,0.1252,0.149361,0.00219649;AN=5008;NS=2504;DP=19968;EAS_AF=0.0367,0,0.0129,0.0377,0.004,0,0,0.001,0,0.0655,0.2381,0.002;AMR_AF=0.0029,0.0043,0.0159,0.072,0.0202,0.0231,0.0029,0,0,0.1225,0.1816,0.0014;AFR_AF=0,0,0.0023,0.0378,0.0938,0.1104,0.0325,0.0008,0.0129,0.149,0.0439,0.0008;EUR_AF=0.002,0.008,0.0408,0.1302,0.007,0.002,0.001,0.001,0,0.164,0.1163,0.001;SAS_AF=0.0102,0.0031,0.0153,0.0654,0.0092,0,0,0,0,0.1166,0.2117,0.0061
Note that there are 12 alternate alleles at this genomic coordinate.
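To double-check that count, a minimal Python sketch along these lines splits the ALT column and pairs each allele with its AC value. The strings are copied from the record above; the variable names are only illustrative, and I am assuming AC holds one value per ALT allele in ALT order, as the VCF specification describes.

```python
# ALT and AC values copied from the record above.
alt_field = (
    "GGGGGGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGT,GGGGTGTGTGTGTGTGTGT,"
    "GGGGTGTGTGTGTGTGTGTGT,GGTGTGTGT,GGTGTGTGTGT,GGTGTGTGTGTGT,"
    "GGTGTGTGTGGTGTGTGTGTGT,GGTGTGTGTGTGTGT,TGTGT,GGT,G"
)
ac_field = "51,14,83,333,158,164,46,3,17,627,748,11"

alts = alt_field.split(",")
allele_counts = [int(x) for x in ac_field.split(",")]

# Assumption: AC carries one value per ALT allele, in the same order as ALT.
assert len(alts) == len(allele_counts)

for i, (alt, ac) in enumerate(zip(alts, allele_counts), start=1):
    print(i, alt, ac)
```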
I aligned each alternate allele against the reference allele to get an idea of what is going on, but the result looked strange to me and I could not work out what it means. I have attached the text file I made to understand the pattern of insertions and deletions.
If I am interpreting the reference and alternate bases in the attached text file correctly, then for the first four alternate alleles the coordinate of the reference allele actually shifts to the right.
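To make the comparison concrete, here is a rough Python sketch of how each REF/ALT pair could be reduced to a minimal form so that any shift in coordinate becomes visible. The function name trim_allele is hypothetical. I am assuming the usual convention of trimming shared trailing bases first and shared leading bases second; it does not left-align against the reference genome, which would need the FASTA.

```python
def trim_allele(pos, ref, alt):
    """Reduce one REF/ALT pair to a minimal representation.

    Shared trailing bases are removed first, then shared leading bases,
    always leaving at least one base in each allele. Each leading base
    that is removed moves the position one to the right.
    """
    # Right-trim: drop identical bases from the end of both alleles.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Left-trim: drop identical bases from the start and advance the position.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


pos, ref = 207237233, "GGTGT"
# Four of the twelve ALT alleles from the record, as examples.
for alt in ["GGTGTGTGT", "TGTGT", "GGT", "G"]:
    print(alt, "->", trim_allele(pos, ref, alt))
```

Running the same function over all twelve ALT alleles gives one position, REF and ALT per allele, which is easier to compare than the combined multi-allelic line.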
Looking forward to your valuable input.
Thanks in advance.