I have a list of >1M SNP rsIDs for which I would like to find the chromosome and the position. Could you please suggest how I can do this?
Thanks in advance.
Hi Mahantesh,
One way to look up a large number of rsIDs is to run VEP:
http://asia.ensembl.org/Homo_sapiens/Tools/VEP
Another is to download the dbSNP VCF file and use grep or awk to pick out the annotations.
Hi Pawel! Thanks for your suggestions. I have about 1M SNPs, and I'm wondering whether these two methods are suitable for a list of that size. Would you be able to share the code for the second method? Thanks in advance.
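You need to run the commands below in a terminal on either Linux/UNIX or macOS. The resulting file will be sorted by chromosome (the sort key skips the first three characters of column 1, so it assumes chromosome names with a "chr" prefix) and will contain the standard VCF fields:
#CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO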
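=> Command for an uncompressed dbSNP VCF file:
grep -w -F -f /path/to/your/list_of_rsids /path/to/dbsnp_file.vcf | sort -k1.4,1 -V -s > /path/to/your/new_file
=> Command for a gzipped (GZ) dbSNP VCF file:
zgrep -w -F -f /path/to/your/list_of_rsids /path/to/dbsnp_file.gz | sort -k1.4,1 -V -s > /path/to/your/new_file
Here -w matches whole words only (so rs12 does not also pick up rs123), and -F treats the list as fixed strings rather than regular expressions, which is much faster with a list of about 1M IDs.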
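Since awk was mentioned as an alternative, here is a minimal sketch of that route. It is only an illustration, not a tested pipeline for the full dbSNP file: the function name lookup_rsids is made up for this example, and it assumes a tab-separated VCF with the rsID in column 3 and a non-empty ID list with one rsID per line. Unlike grep, it compares the ID column exactly and prints only rsID, chromosome and position.

```shell
# lookup_rsids IDS_FILE VCF_FILE
# Hypothetical helper: prints "rsID<TAB>chromosome<TAB>position" for every
# VCF record whose ID column (column 3) is in IDS_FILE.
# Assumes IDS_FILE is not empty (the NR == FNR test relies on that).
lookup_rsids() {
    ids_file=$1
    vcf_file=$2
    awk -F'\t' '
        # First file: remember each rsID (drop a trailing CR from Windows files).
        NR == FNR { sub(/\r$/, ""); if ($0 != "") wanted[$0] = 1; next }
        # Second file: skip VCF header lines.
        /^#/ { next }
        # Print the record only if its ID column is in the list.
        ($3 in wanted) { print $3 "\t" $1 "\t" $2 }
    ' "$ids_file" "$vcf_file"
}

# Example calls (paths are placeholders):
#   lookup_rsids /path/to/your/list_of_rsids /path/to/dbsnp_file.vcf > /path/to/your/new_file
#   gzip -dc /path/to/dbsnp_file.gz | lookup_rsids /path/to/your/list_of_rsids - > /path/to/your/new_file
```

The whole ID list is held in memory as an awk array, which should be manageable for about 1M rsIDs, and the VCF is read only once. Records come out in the order they appear in the VCF.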
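Your rsID file should have a single column, one rsID per line. If you created the list in Windows, you need to convert it to UNIX line endings (LF instead of CRLF), otherwise the IDs will not match:
# Install the package "dos2unix" (Debian/Ubuntu command shown; on macOS with Homebrew you can use: brew install dos2unix)
sudo apt install dos2unix
# Enter your password when prompted
# Then run
dos2unix -n /path/to/your/rsid_file /path/to/your/new_rsid_file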
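If you cannot install dos2unix, the same line-ending cleanup can be sketched with tr, which is available on Linux and macOS by default. The function name strip_cr is just a label for this example, and the input and output paths must be different files.

```shell
# strip_cr INPUT OUTPUT
# Hypothetical helper: copies INPUT to OUTPUT with every carriage return
# (the "\r" of Windows CRLF line endings) removed.
# INPUT and OUTPUT must not be the same file.
strip_cr() {
    tr -d '\r' < "$1" > "$2"
}

# Example call (paths are placeholders):
#   strip_cr /path/to/your/rsid_file /path/to/your/new_rsid_file
```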
Hope it helps :-)
Thank you so much, Pawel! Let me try this one.
27 July 2014 4,453 4 View