I know that PLINK's .bed file (binary PED) is different from UCSC's BED file. I would like to create a UCSC-style BED file of the genomic coordinates of all the SNPs in my PLINK data. Can anyone help me with this?
1. VCF treats insertions and deletions in a strange way: the base to the left of the insertion or deletion must be included, so POS is one less than it would be otherwise. The end position of a deletion has to be either inferred from allele lengths or taken from the INFO column's END tag (if provided). But if your VCF contains only single-nucleotide substitution variants, this doesn't apply.
2. BED coordinates are 0-based, half-open, a.k.a. interbase. When converting from the usual 1-based, fully closed coordinates, subtract 1 from the start but not the end. When BED start equals BED end, the item has a length of 0.
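The two points above can be sketched as a small Python function. This is only an illustration under assumptions: the function name is made up, it handles plain substitutions and simple indels that carry one shared anchor base on the left, and it ignores symbolic alleles and the INFO column's END tag.

```python
from typing import Tuple


def vcf_to_bed_interval(pos: int, ref: str, alt: str) -> Tuple[int, int]:
    """Return (bed_start, bed_end), 0-based half-open, for one REF/ALT pair."""
    start = pos - 1          # 1-based closed start -> 0-based start
    end = start + len(ref)   # the end is not shifted; it follows from REF length
    # Simple indels include one shared base to the left of the event;
    # drop that anchor base so the interval covers only the changed bases.
    if len(ref) != len(alt) and ref[0] == alt[0]:
        start += 1
    return start, end


print(vcf_to_bed_interval(100, "A", "G"))    # substitution: (99, 100)
print(vcf_to_bed_interval(100, "AT", "A"))   # deletion of one base: (100, 101)
print(vcf_to_bed_interval(100, "A", "ATG"))  # insertion, zero length: (100, 100)
```

The insertion case shows the zero-length item mentioned in point 2: start and end are equal because no reference base is covered.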
This command would work, provided the VCF file contains only single-nucleotide substitutions:
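As a rough Python illustration of this kind of conversion (a sketch, not the command from the original answer): each single-nucleotide record becomes a 4-column BED line with the VCF ID in the name column, and anything that is not a single-base substitution is skipped. The function name, the example records, and the choice to add a "chr" prefix are assumptions; the rs numbers are placeholders, and mitochondrial naming (MT versus chrM) is not handled.

```python
def vcf_snv_line_to_bed(line, add_chr_prefix=True):
    """Convert one VCF data line to a 4-column BED line; return None if skipped."""
    if line.startswith("#") or not line.strip():
        return None  # header, meta-information, or blank line
    chrom, pos, var_id, ref, alt = line.rstrip("\n").split("\t")[:5]
    bases = set("ACGTNacgtn")
    # Keep only records where REF and every ALT allele are one plain base.
    if ref not in bases or any(a not in bases for a in alt.split(",")):
        return None
    if add_chr_prefix and not chrom.startswith("chr"):
        chrom = "chr" + chrom  # assumption: UCSC-style names are wanted
    end = int(pos)   # 1-based closed end equals the BED end
    start = end - 1  # subtract 1 from the start only
    return f"{chrom}\t{start}\t{end}\t{var_id}"


# Hypothetical records, for illustration only.
example_lines = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\trs123\tA\tG\t.\t.\t.",
    "1\t2000\trs456\tAT\tA\t.\t.\t.",  # deletion, skipped
]
for vcf_line in example_lines:
    bed_line = vcf_snv_line_to_bed(vcf_line)
    if bed_line is not None:
        print(bed_line)
```

Only the identifier is carried along here, so the REF and ALT alleles are not available in the BED output.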
Hello Angie Hinrichs, thank you so much for your suggested solution. I have successfully used liftOver to update my data from b36 to b37. Would you have any recommendations on how I may convert the resulting output .bed file back to plink binary (.fam .bed .bim) formats?
Would it be correct to convert the UCSC .bed file to plink .bim via the awk command, and make a new set of binary files using the new .bim file (if this is possible) with --make-bed via PLINK?
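One way to picture the idea is to turn the lifted UCSC BED lines back into (variant ID, chromosome, 1-based position) rows, which could then be used to update the positions in the .bim file. The Python below is only a sketch under assumptions: the function name is made up, the BED file is assumed to have the variant ID in column 4, every interval is assumed to be a single base, and the example rs number and coordinates are placeholders.

```python
def lifted_bed_to_updates(bed_lines):
    """Return (variant_id, chromosome, 1-based position) tuples from 4-column BED lines."""
    updates = []
    for line in bed_lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        chrom, start, end, name = line.split()[:4]
        if int(end) - int(start) != 1:
            continue  # not a single-base interval; leave it out
        if chrom.startswith("chr"):
            chrom = chrom[3:]  # assumption: plink-style names without "chr"
        # For a single base, the BED end is the 1-based position.
        updates.append((name, chrom, int(end)))
    return updates


# Hypothetical lifted record, for illustration only.
for variant_id, chrom, pos in lifted_bed_to_updates(["chr1\t1009\t1010\trs123"]):
    print(f"{variant_id}\t{chrom}\t{pos}")
```

As far as I know, PLINK 1.9 has --update-chr and --update-map options that accept ID-keyed tables like this, but please check the PLINK documentation before relying on that. Variants that liftOver could not map would need to be excluded first, and this approach does not check whether the alleles still match the new assembly.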
Hi Yuen Yan Wong. I do not know how to convert UCSC BED directly to plink. However, there is a ResearchGate question "Coverting vcf files to plink bed format?" with answers that suggest "plink --vcf" with VCF as input. (Internet search for "convert vcf to plink" might also be informative.)
Unfortunately, my suggested command for translating VCF to BED kept only the identifier name, not the ref and alt allele required for converting back into VCF. Even if it kept the ref and alt alleles, sometimes the ref value changes from one assembly to the next, so carrying ref and alt directly over to the new assembly would produce some errors.
In case your original files were VCF: There is a tool CrossMap (http://crossmap.sourceforge.net) that is capable of lifting over VCF correctly, comparing the new assembly's ref and alt to the original VCF ref and alt. Perhaps it would work to use CrossMap instead of liftOver, and then import the resulting VCF into plink.
Hello Angie Hinrichs. Thank you so much for your response! CrossMap seems to be working well for me, as I could keep the input and output files as VCF and then convert the output VCF back to plink formats easily. Thanks again!