I have Illumina data from a virus genome, with coverage of 100-1000x, and I want to look at SNP variants within the host. I have looked at CLC, which has probabilistic and quality-based variant detection, but are there other tools you would suggest?
What kind of virus do you have? Keep in mind that there is a vast literature on discovering viral genetic diversity with deep sequencing. A couple of reviews are linked here.
As Benedikt said, you need the VCF data. GATK is a good example, but to my knowledge it was designed for human data, so to analyze other organisms you need to choose your parameters very carefully. I did SNP calling in bacteria with the mpileup tool of the samtools package. It basically reads the BAM file and makes the variant calls. Here are the commands: http://samtools.sourceforge.net/mpileup.shtml
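For intrahost work it can help to see what "reads the BAM file" means at a single position. samtools mpileup can print a plain-text pileup: for one sample, each line holds the chromosome, the 1-based position, the reference base, the read depth, the read bases and the base qualities. The sketch below is a minimal illustration, assuming that 6-column single-sample layout, that counts alleles at one position. It handles match symbols, mismatches, read start/end markers and indels, but it ignores base qualities, so it is a teaching aid rather than a variant caller.

```python
import re
from collections import Counter

# Indel markup in the read-bases column, e.g. "+2AG" (insertion) or "-1T" (deletion).
INDEL = re.compile(r"[+-](\d+)")


def count_alleles(pileup_line: str) -> tuple[str, int, Counter]:
    """Count bases at one position of a single-sample text pileup line.

    Assumes the layout: chrom, pos, ref, depth, read bases, base qualities.
    Base qualities are ignored in this sketch.
    """
    chrom, pos, ref, _depth, bases, _quals = pileup_line.rstrip("\n").split("\t")[:6]
    ref = ref.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read; the next character is its mapping quality
            i += 2
            continue
        if c == "$":  # end of a read
            i += 1
            continue
        if c in "+-":  # indel after this position: skip the length and its sequence
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
            continue
        if c in ".,":  # match to the reference on the forward / reverse strand
            counts[ref] += 1
        elif c.upper() in "ACGTN":  # mismatch; lowercase means reverse strand
            counts[c.upper()] += 1
        elif c == "*":  # placeholder for a deleted base
            counts["*"] += 1
        # '>' and '<' (reference skips) are deliberately not counted
        i += 1
    return chrom, int(pos), counts
```

For example, count_alleles("chr1\t100\tA\t7\t.,^Ig$G+2ACg*-1Ta\tIIIIIII") gives three A, three G and one deletion placeholder at position 100, i.e. a roughly 43% minor allele at that site.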
After you have the VCF file, you may want to filter the variants by quality or coverage. I suggest using the latest 1.x version of the bcftools package, which provides bcftools filter.
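bcftools filter works by applying an expression to fields such as QUAL and INFO/DP. To show what such a filter actually does to the records, here is a minimal Python sketch, assuming an uncompressed VCF whose records carry an INFO/DP tag. The thresholds (QUAL >= 30, DP >= 100) are illustrative placeholders, not recommendations; for real data bcftools itself is the better choice.

```python
from typing import Iterable, Iterator


def passes(record: str, min_qual: float = 30.0, min_dp: int = 100) -> bool:
    """Return True if a VCF data line meets the QUAL and INFO/DP thresholds."""
    fields = record.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]
    if qual == ".":  # missing QUAL: treat the record as failing
        return False
    # INFO is a ';'-separated list of KEY=VALUE pairs or bare flags (e.g. INDEL)
    tags = dict(kv.split("=", 1) if "=" in kv else (kv, "") for kv in info.split(";"))
    dp = tags.get("DP")
    if not dp:  # no depth information: treat as failing
        return False
    return float(qual) >= min_qual and int(dp) >= min_dp


def filter_vcf(lines: Iterable[str], **thresholds) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines that pass."""
    for line in lines:
        if line.startswith("#") or passes(line, **thresholds):
            yield line
```

Keeping the header lines means the output is still a valid VCF that can be loaded into IGV or annotated afterwards.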
Once you have the VCF, you can browse it manually in genome browser software such as IGV.
I agree with the previous recommendations about using VCFs. I would suggest using SnpEff to annotate them (I think GATK nowadays uses SnpEff in its pipeline).
GATK and ANNOVAR were the next ones I was thinking of trying.
I have FASTQ files and SAM/BAM files for ssRNA viruses and want to look at intrahost diversity, so I need to know how many SNPs there are and what they are. I have used samtools to generate consensus sequences, and I (and others in my unit) have found that the latest version is not calling the correct consensus. This didn't happen with the previous version, and I am not sure why.
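Both questions here (what the consensus base is, and which minor SNPs are present and how many) come down to per-position allele counts. The sketch below is a minimal majority-rule illustration, assuming those counts are already available (for example from a pileup); the minimum depth of 100 and the 1% frequency cut-off are hypothetical thresholds chosen only for illustration. Comparing its majority calls with a samtools consensus at a few disputed positions is one way to check whether a tool version is miscalling.

```python
from collections import Counter


def consensus_and_minor(counts_by_pos: dict[int, Counter],
                        min_depth: int = 100,
                        min_freq: float = 0.01):
    """Majority-rule consensus plus minor variants at or above min_freq.

    Positions are visited in sorted order; positions below min_depth get 'N'.
    Ties between equally common bases are broken arbitrarily.
    Returns (consensus_string, [(pos, base, freq), ...]).
    """
    consensus = []
    minor = []
    for pos in sorted(counts_by_pos):
        counts = counts_by_pos[pos]
        depth = sum(counts.values())
        if depth < min_depth:  # too little coverage to call anything
            consensus.append("N")
            continue
        (top, _), *rest = counts.most_common()
        consensus.append(top)  # the majority base goes into the consensus
        for base, n in rest:  # every other allele is a candidate minor variant
            freq = n / depth
            if freq >= min_freq:
                minor.append((pos, base, round(freq, 4)))
    return "".join(consensus), minor
```

The number of minor variants is then simply len(minor), and the list itself says which they are.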
I would also favor samtools over the alternatives. GATK can be very hard to use outside of human data. ANNOVAR is fine; you will need a bit of tuning, but I think there are tutorials and manuals out there.
What mapper did you use? When something like this happens, I would quickly check my data using IGV. This always helps me to identify the problem.
By the way, a bit of advertising for my own tool (sorry): NextGenMap estimates its mapping parameters on its own from 1 million randomly chosen reads. Installing and running it is easy, and it is faster than bowtie2.
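Picking a fixed number of random reads from a FASTQ file of unknown length can be done in a single pass with reservoir sampling. The sketch below is a generic illustration of that technique, not NextGenMap's actual code; it assumes plain 4-line FASTQ records, and the sample size k and seed are hypothetical parameters.

```python
import random
from itertools import islice
from typing import Iterable, Iterator


def fastq_records(lines: Iterable[str]) -> Iterator[tuple[str, ...]]:
    """Yield 4-line FASTQ records (header, sequence, '+', qualities)."""
    it = iter(lines)  # works for open file handles and for lists of lines
    while True:
        record = tuple(islice(it, 4))
        if len(record) < 4:  # end of input (or a truncated final record)
            return
        yield record


def reservoir_sample(records: Iterable, k: int, seed: int = 0) -> list:
    """Uniformly sample k items from an iterable of unknown length (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)  # fill the reservoir with the first k items
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = rec  # keep item i with probability k / (i + 1)
    return reservoir
```

Usage, with a hypothetical file name: with open("reads.fastq") as fh: subset = reservoir_sample(fastq_records(fh), 1_000_000). Memory use stays at k records no matter how large the file is.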
If you are not an expert in writing bioinformatics scripts, Golden Helix's SNP & Variation Suite is user-friendly and can be used for GWAS, RNA-seq and other sequence variant analyses. They have a demo you can try out.
Sorry, but you only need to run three tools on the command line; that is nowhere close to script writing. Plus, since she is a bioinformatician, she should learn how to do that; otherwise I guess it is going to be hard to find a job. (Sorry if this sounds harsh, but a bioinformatician should be able to run programs on the command line.)
No worries. I use the command line, but as you will appreciate, there are a million tools available and everyone thinks theirs is the best. I'm just looking to see which ones people like. My group have had problems with samtools mpileup giving errors in consensus calling since the latest version came out, so I have lost faith in it a little. We seem to be favouring GATK at the moment.