Have you had help from a bioinformatician with the first steps of the analysis (sequence alignment, trimming, variant calling with GATK and other bioinformatics tools)? These first steps are quite difficult, and you will almost certainly need someone with a bioinformatics background. Once you have the variant call (VCF) files, for both SNPs and indels, you can analyze them yourself. There is licensed software such as CLC-Bio, Cartagenia, etc. When I have a lot of data (exome sequencing from 10 or more samples) I use these licensed programs; of the two, I like Cartagenia a little better than CLC-Bio. But if I have only a few trios, I prefer a more conventional approach using Excel. It is a bit difficult to explain by email, but if you know how to use Excel (at least the VLOOKUP and IF functions), it is quite easy to find de novo variants and inherited variants (recessive or dominant pattern).
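The same IF-style comparison of child and parent genotypes can be written in a few lines of Python. This is only a minimal sketch, assuming the genotypes have already been extracted from the VCF as strings such as "0/0", "0/1" or "1/1"; the function names and category labels are made up for illustration and do not come from any of the tools mentioned.

```python
def alt_count(genotype):
    """Count alternate alleles in a genotype string such as '0/1' or '1|1'.

    Returns None when the genotype is missing (for example './.').
    """
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return sum(1 for allele in alleles if allele != "0")


def classify_trio(child, father, mother):
    """Label one variant by comparing the child genotype with both parents."""
    c, f, m = alt_count(child), alt_count(father), alt_count(mother)
    if None in (c, f, m):
        return "missing"  # cannot decide without all three genotypes
    if c == 0:
        return "not_in_child"
    if f == 0 and m == 0:
        return "de_novo"  # present in the child, absent from both parents
    if c == 2 and f == 1 and m == 1:
        return "homozygous_recessive"  # one copy from each carrier parent
    if c == 1 and (f == 0) != (m == 0):
        return "inherited_from_one_parent"  # candidate for a dominant model
    return "other"


# Hypothetical genotypes for three variants (child, father, mother).
print(classify_trio("0/1", "0/0", "0/0"))  # de_novo
print(classify_trio("1/1", "0/1", "0/1"))  # homozygous_recessive
print(classify_trio("0/1", "0/1", "0/0"))  # inherited_from_one_parent
```

Real data needs more care than this (read depth, genotype quality, X-linked variants in males), so treat the labels as a first pass, not a final call.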
As for how to find disease-causing mutations, I think it depends on the pattern of inheritance (recessive, dominant, X-linked) or whether it is a complex disease. Mendelian disorders are relatively easy compared to complex diseases, especially if you have several sets of trio data. This question is actually not simple to answer: you have to start from your hypothesis and then narrow down the list of variants based on it.
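As one illustration of narrowing the list by hypothesis: under a recessive model you might look for genes in which the child carries two different heterozygous variants, one from each parent (compound heterozygosity). The sketch below assumes each variant is a plain dictionary with a gene name and the number of alternate alleles (0, 1 or 2) in the child, father and mother; the key names and the gene names are hypothetical.

```python
from collections import defaultdict


def compound_het_genes(variants):
    """Return genes where the child has one heterozygous variant inherited
    from the father only and another inherited from the mother only.

    Each variant is a dict with the hypothetical keys 'gene', 'child',
    'father' and 'mother'; the last three hold alternate-allele counts.
    """
    origins = defaultdict(set)
    for v in variants:
        if v["child"] != 1:
            continue  # only heterozygous calls in the child are relevant here
        if v["father"] == 1 and v["mother"] == 0:
            origins[v["gene"]].add("paternal")
        elif v["father"] == 0 and v["mother"] == 1:
            origins[v["gene"]].add("maternal")
    return sorted(g for g, o in origins.items() if o == {"paternal", "maternal"})


# Made-up example: GENE_A has one paternal and one maternal variant.
example = [
    {"gene": "GENE_A", "child": 1, "father": 1, "mother": 0},
    {"gene": "GENE_A", "child": 1, "father": 0, "mother": 1},
    {"gene": "GENE_B", "child": 1, "father": 1, "mother": 0},
]
print(compound_het_genes(example))  # ['GENE_A']
```

A dominant or X-linked hypothesis would need a different rule, but the idea is the same: write the hypothesis down as an explicit condition and keep only the variants that satisfy it.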
The answers to your questions depend a lot on which step of the analysis you are stuck at.
What type of data do you have?
- If you have only the sequenced reads, then you have to perform a lot of bioinformatic steps before filtering the variants (mainly read quality check, mapping, mapping quality check, local realignment, variant calling).
This is not easy at first, so unless you have some experience in bioinformatics, I would also advise asking a bioinformatician to do these steps for you.
- If you have a variant file, most of the time in the form of a VCF file, then what you need to do is:
* Annotate the variants with as much information as possible to help you filter them: mainly the position of the variant in/around the gene, the highest impact on the transcript/protein, and frequency data if the variant is present in the various databases.
For this you can load your VCF file into Galaxy (http://usegalaxy.org) and use its tools, such as SnpEff, to annotate your variants. You can perform a lot of different operations on your VCF file with the tools available in Galaxy.
Alternatively, you can annotate your variants with VEP, the Variant Effect Predictor on Ensembl (http://www.ensembl.org/info/docs/tools/vep/index.html), which does a good job of adding a lot of information at once. Be careful to use the correct page for the version of your reference genome.
* Filter your variants according to your study: this depends on whether you have families (then you can filter on the basis of transmission and/or sharing between affected relatives), sporadic patients (then you would want variants that can be different between the patients but that are within the same gene(s)), tumor vs normal tissue (then you would want to perform a paired comparison to retain only somatic variants present in the tumors), etc.
For this step, I would recommend VarSifter (http://research.nhgri.nih.gov/software/VarSifter/): it is free, runs on any system, has a graphical interface so it is user-friendly, and its custom query lets you filter on any column of your VCF file. It is very easy to use, and you can easily reverse any filter if you are not satisfied with a result. In my hands, it can handle files with more than 1 million variants. Furthermore, you can export the result of your filtering to an Excel table, either to go on filtering your variants in Excel or to add some more information to only a subset of variants.
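If you prefer a script to a graphical tool, the same kind of filtering can be sketched in Python. The example below is only an illustration under stated assumptions: the annotated variants are assumed to have been exported as rows with the hypothetical keys "patient", "gene", "impact" and "freq"; the impact labels follow the HIGH/MODERATE/LOW/MODIFIER categories reported by SnpEff and VEP; and the 1% frequency cutoff is an arbitrary choice that you should adapt to your disease model.

```python
from collections import defaultdict


def rare_damaging(variants, max_freq=0.01, impacts=("HIGH", "MODERATE")):
    """Keep variants that are rare (or absent from databases) and impactful."""
    kept = []
    for v in variants:
        freq = v.get("freq")  # None means the variant is not in the database
        if freq is not None and freq > max_freq:
            continue  # too common in the population
        if v["impact"] not in impacts:
            continue  # predicted effect is too mild
        kept.append(v)
    return kept


def genes_shared_by_patients(variants, min_patients=2):
    """For sporadic patients: genes hit in at least `min_patients` patients,
    even if the variants themselves differ between patients."""
    patients_per_gene = defaultdict(set)
    for v in variants:
        patients_per_gene[v["gene"]].add(v["patient"])
    return {
        gene: sorted(patients)
        for gene, patients in patients_per_gene.items()
        if len(patients) >= min_patients
    }


# Made-up rows standing in for an annotated variant table.
rows = [
    {"patient": "P1", "gene": "GENE_A", "impact": "HIGH", "freq": None},
    {"patient": "P2", "gene": "GENE_A", "impact": "MODERATE", "freq": 0.001},
    {"patient": "P2", "gene": "GENE_B", "impact": "HIGH", "freq": 0.2},
    {"patient": "P3", "gene": "GENE_C", "impact": "LOW", "freq": None},
]
print(genes_shared_by_patients(rare_damaging(rows)))  # {'GENE_A': ['P1', 'P2']}
```

Because each filter is a separate function, you can drop or reorder steps in the same way you would switch a filter on and off in VarSifter.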
Hope this helps! Good luck with your research, and best wishes for the new year.