Can someone help me analyse exome sequencing data?

Hi Abhishek,

Have you had help from a bioinformatician with the first steps of the analysis (sequence alignment, trimming, variant calling using GATK and other bioinformatics tools)? These first steps are quite difficult, and you will certainly need someone with a bioinformatics background. Once you have the variant call (VCF) files (for both SNPs and indels), you can analyze them yourself. There is licensed software such as CLC-Bio, Cartagenia, etc. If I have a lot of data (exome sequencing from 10 or more samples), I use these licensed programs; of the two, I like Cartagenia a little better than CLC-Bio. But if I have data from only a few trios, I prefer a more conventional approach using Excel. It is a little difficult to explain by email, but if you know how to use Excel (at least VLOOKUP and IF), it is quite easy to find de novo variants and inherited variants (recessive or dominant pattern).
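The trio logic that the Excel VLOOKUP/IF approach implements can be sketched in a few lines of Python. This is only a minimal illustration, not the exact spreadsheet workflow: it assumes biallelic sites, genotypes given as VCF-style GT strings, and the function names and category labels are made up for the example.

```python
def alt_count(gt):
    """Count alternate alleles in a GT string such as '0/1' or '1|1'.

    Returns None when any allele is missing ('.').
    Assumption: biallelic sites, so any non-'0' allele is "the" alternate.
    """
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def classify(child, father, mother):
    """Label one variant from the genotypes of a child/father/mother trio."""
    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if None in (c, f, m):
        return "missing"
    if c == 0:
        return "not_in_child"
    if f == 0 and m == 0:
        # Present in the child, absent from both parents.
        return "de_novo"
    if c == 2 and f == 1 and m == 1:
        # Homozygous child, both parents heterozygous carriers.
        return "recessive_homozygous"
    return "inherited"


# Hypothetical example rows: (variant label, child GT, father GT, mother GT)
variants = [
    ("chr1:1000 A>G", "0/1", "0/0", "0/0"),
    ("chr2:2000 C>T", "1/1", "0/1", "0/1"),
    ("chr3:3000 G>A", "0/1", "0/1", "0/0"),
]

for label, child, father, mother in variants:
    print(label, classify(child, father, mother))
```

De novo candidates found this way still need a check of read depth and genotype quality in all three samples, since a missed heterozygous call in a parent looks exactly like a de novo variant.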

As for how to find disease-causing mutations, I think it will depend on the pattern of inheritance (recessive, dominant, X-linked) or on whether it is a complex disease. Mendelian disorders are relatively easy compared to complex diseases, especially if you have several sets of trio data. This question is actually not simple to answer: you have to start from your hypothesis and then narrow down the list of variants based on it.

Good luck!!

Sandrine Caburet

Hi Abhishek,

The answers to your questions depend a lot on the steps of the analysis you are stuck at.

What type of data do you have?

- If you have only the sequenced reads, then you have to perform a lot of bioinformatic steps before filtering the variants (mainly read quality check, mapping, mapping quality check, local realignment, variant calling).

This is not easy at first, so unless you have some experience in bioinformatics, I would also advise asking a bioinformatician to do these steps for you.
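To give an idea of what the first of these steps involves, here is a minimal sketch of a read quality check in plain Python. It assumes standard 4-line FASTQ records with Phred+33 quality encoding; the function name and the quality cutoff of 30 are illustrative choices, and a real pipeline would use dedicated QC and trimming tools rather than this.

```python
import io


def mean_read_qualities(handle, offset=33):
    """Yield (read_id, mean Phred quality) for each 4-line FASTQ record.

    Assumption: Phred+33 encoding (offset=33) and no wrapped sequence lines.
    """
    while True:
        header = handle.readline().strip()
        if not header:
            break
        handle.readline()  # sequence line, not needed for this check
        handle.readline()  # '+' separator line
        quals = handle.readline().strip()
        scores = [ord(ch) - offset for ch in quals]
        mean = sum(scores) / len(scores) if scores else 0.0
        yield header[1:], mean


# Hypothetical two-read FASTQ: 'I' encodes Phred 40, '!' encodes Phred 0.
fastq_text = "@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n"

qualities = dict(mean_read_qualities(io.StringIO(fastq_text)))
passing = [read_id for read_id, q in qualities.items() if q >= 30]
print(qualities)  # {'read1': 40.0, 'read2': 20.0}
print(passing)    # ['read1']
```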

- If you have a variant file, most often in the form of a VCF file, then what you need to do is:

* Annotate the variants with as much information as possible to help you filter them: mainly the position of the variant in/around the gene, the highest impact on the transcript/protein, and frequency data if the variant is present in the various databases.

For this you can load your VCF file into Galaxy (http://usegalaxy.org) and use annotation tools such as SnpEff. You can do a lot of different things with your VCF file using the tools available in Galaxy.

Alternatively, you can annotate your variants with VEP, the Variant Effect Predictor from Ensembl (http://www.ensembl.org/info/docs/tools/vep/index.html), which is good at adding a lot of information at once. Be careful to use the correct page for the version of your reference genome.
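Once the VCF is annotated, the "highest impact" per variant can be pulled out with a short script. The sketch below assumes SnpEff-style annotation, where the INFO column carries an `ANN` field with comma-separated entries and pipe-separated subfields (allele, effect, impact, gene name, ...). The helper names and the example INFO string (including `GENE1`) are hypothetical.

```python
# Lower rank = more severe. These are the four SnpEff impact levels.
IMPACT_RANK = {"HIGH": 0, "MODERATE": 1, "LOW": 2, "MODIFIER": 3}


def parse_info(info):
    """Turn a VCF INFO string into a dict (flags without '=' map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def worst_annotation(info):
    """Return (gene, effect, impact) of the most severe ANN entry, or None."""
    ann = parse_info(info).get("ANN")
    if not isinstance(ann, str) or not ann:
        return None
    best = None
    for entry in ann.split(","):
        parts = entry.split("|")
        if len(parts) < 4:
            continue  # malformed entry, skip it
        effect, impact, gene = parts[1], parts[2], parts[3]
        rank = IMPACT_RANK.get(impact, len(IMPACT_RANK))
        if best is None or rank < best[0]:
            best = (rank, gene, effect, impact)
    return None if best is None else best[1:]


# Hypothetical INFO column with two annotated transcripts for one variant.
info = (
    "DP=50;ANN="
    "A|missense_variant|MODERATE|GENE1|ID1|transcript|T1|protein_coding|3/10|c.100G>A|p.Val34Ile|||||,"
    "A|stop_gained|HIGH|GENE1|ID1|transcript|T2|protein_coding|3/9|c.100G>T|p.Glu34*|||||"
)
print(worst_annotation(info))  # ('GENE1', 'stop_gained', 'HIGH')
```

VEP writes its annotations in a different field with a different column order, so the indices above would need to be adapted for VEP output.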

* Filter your variants according to your study design: this depends on whether you have families (then you can filter on the basis of transmission and/or sharing between affected relatives), sporadic patients (then you would want variants that can differ between patients but lie within the same gene(s)), tumor vs. normal tissue (then you would want to perform a paired comparison to retain only the somatic variants present in the tumors), ...

For this step, I would recommend VarSifter (http://research.nhgri.nih.gov/software/VarSifter/): it's free, runs on any system, has a graphical interface so it is user-friendly, and its custom query enables filtering on any column of your VCF file. It is very easy to use, and you can easily reverse any filter if you are not satisfied with a result. In my hands, it can handle files with more than 1 million variants. Furthermore, you can export the result of your filtering to an Excel table, in order to continue filtering your variants in Excel or to add more information to only a subset of variants.
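The three filtering strategies mentioned above (sharing within a family, recurrence of a gene across sporadic patients, tumor vs. normal comparison) all come down to simple set operations. The following is a rough sketch with made-up variant and gene identifiers; it is not how VarSifter works internally, and the function names are hypothetical.

```python
from collections import defaultdict


def shared_by_affected(family_calls, affected):
    """Variants carried by every affected relative of one family."""
    carrier_sets = [family_calls[person] for person in affected]
    return set.intersection(*carrier_sets) if carrier_sets else set()


def recurrent_genes(patient_variants, min_patients=2):
    """Genes hit in at least min_patients sporadic patients.

    The variants themselves may differ from patient to patient.
    """
    patients_by_gene = defaultdict(set)
    for patient, variants in patient_variants.items():
        for _variant_id, gene in variants:
            patients_by_gene[gene].add(patient)
    return {
        gene: sorted(patients)
        for gene, patients in patients_by_gene.items()
        if len(patients) >= min_patients
    }


def somatic_only(tumor, normal):
    """Variants seen in the tumor but not in the matched normal tissue."""
    return tumor - normal


# Hypothetical inputs: variant IDs are arbitrary labels.
family = {"sib1": {"v1", "v2", "v3"}, "sib2": {"v2", "v3"}, "father": {"v3"}}
print(shared_by_affected(family, ["sib1", "sib2"]))  # v2 and v3

sporadic = {
    "P1": [("v10", "GENE_A"), ("v11", "GENE_B")],
    "P2": [("v12", "GENE_A")],
    "P3": [("v13", "GENE_C")],
}
print(recurrent_genes(sporadic))  # {'GENE_A': ['P1', 'P2']}

print(somatic_only({"v20", "v21"}, {"v20"}))  # {'v21'}
```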

Hope this helps! Good luck with your research, and best wishes for the new year.

Sandrine
