Hi all,

I am trying to use GeneMarkS self-training to improve protein calling from virus genomes and transcripts. Could someone tell me whether what I am doing is correct? First, I downloaded eukaryotic virus genomes from NCBI RefSeq and created a "matrix" with gmsn.pl:

/fs/project/PAS1117/modules/GeneMarkS/3.36/gmsn.pl -euk --name virusgroup1 --gm /virusgroup1_refseq_genomes.fasta

which generated (among many others) the following model files:

virusgroup1_gm_heuristic.mat virusgroup1_gm.mat virusgroup1_hmm_combined.mod virusgroup1_hmm_heuristic.mod virusgroup1_hmm.mod
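Two quick checks can rule out the simplest causes before suspecting gm itself: how much sequence actually went into training, and whether the model file passed to `-m` exists. Note that the gm command below passes `group1_gm.mat`, while the file written by gmsn.pl is `virusgroup1_gm.mat`, so it is worth confirming which path gm actually receives. The functions here are a minimal sketch; `fasta_stats` and `require_nonempty` are hypothetical helper names, and the file paths are placeholders to adapt.

```shell
# Sanity checks around gmsn.pl / gm (a sketch; helper names are hypothetical).

# Print the number of FASTA records and the total sequence length.
fasta_stats() {
  awk '
    /^>/ { n++; next }                          # each header line starts one record
    { gsub(/[ \t\r]/, ""); len += length($0) }  # count residues, ignoring whitespace
    END { printf "sequences=%d bases=%d\n", n, len }
  ' "$1"
}

# Return 1 if a file is missing or empty; otherwise print "ok: FILE".
require_nonempty() {
  if [ ! -s "$1" ]; then
    echo "missing or empty: $1" >&2
    return 1
  fi
  echo "ok: $1"
}
```

For example, `fasta_stats virusgroup1_refseq_genomes.fasta` to see how much training sequence gmsn.pl received, then `require_nonempty virusgroup1_gm.mat` with the exact path you intend to give to `gm -m`.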

Then I used the one named "virusgroup1_gm.mat" to run GeneMark on a single virus genome (which in theory belongs to group 1, so GeneMark should correctly call all of its viral genes):

/fs/project/PAS1117/modules/GeneMarkS/3.36/gm -m group1_gm.mat -l o q -o p -r p -v NC_023420-2.fasta

Nevertheless, I only get a file named "NC_023420-2.fasta.lst" with a few gene coordinates, but NO protein file, even though I set the options for that:

List of Open reading frames predicted as CDSs, shown with alternate starts (regions from start to stop codon w/ coding function >0.50)

  Left    Right   DNA      Coding   Avg    Start
  end     end     Strand   Frame    Prob   Prob
    42    4046    direct   fr 3     0.60   ....
   195    4046    direct   fr 3     0.60   0.79
   297    4046    direct   fr 3     0.60   0.17
   333    4046    direct   fr 3     0.60   0.10
   537    4046    direct   fr 3     0.61   0.06
   570    4046    direct   fr 3     0.60   0.12

List of Regions of interest (regions from stop to stop codon w/ a signal in between)

  LEnd    REnd    Strand   Frame
    21    4046    direct   fr 3
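Since the run produced only the .lst listing, one workaround is to translate the listed coordinates directly. The sketch below is not GeneMark functionality, and `lst2faa` is a hypothetical helper name. It assumes the genome FASTA holds a single record, that ORF rows follow the column layout of the listing above (minus-strand rows assumed to say `complement`), that coordinates are 1-based and inclusive, and the standard genetic code. For each group of alternate starts sharing one stop codon it keeps the longest ORF, which is an arbitrary choice; picking the start with the highest Start Prob would be a reasonable alternative.

```shell
# lst2faa LST FASTA: translate the ORFs listed in a GeneMark .lst file.
# Assumptions (not GeneMark behaviour): single-record FASTA, 1-based inclusive
# coordinates, ORF rows "Left Right direct|complement fr N AvgProb StartProb",
# standard genetic code. Writes a protein FASTA to stdout.
lst2faa() {
  local lst=$1 fasta=$2
  awk '
    BEGIN {
      # Standard code (table 1), codons enumerated in TCAG order.
      b = "TCAG"
      aa = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
      i = 0
      for (x = 1; x <= 4; x++)
        for (y = 1; y <= 4; y++)
          for (z = 1; z <= 4; z++)
            code[substr(b, x, 1) substr(b, y, 1) substr(b, z, 1)] = substr(aa, ++i, 1)
      comp["A"] = "T"; comp["C"] = "G"; comp["G"] = "C"; comp["T"] = "A"
    }
    FILENAME == ARGV[1] {                       # first file: the .lst listing
      if ($0 ~ /List of Open reading frames/) { inorf = 1; next }
      if ($0 ~ /List of Regions of interest/) { inorf = 0; next }
      if (inorf && $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && ($3 == "direct" || $3 == "complement")) {
        # Alternate starts share the stop codon, so key on the stop end and
        # keep the longest ORF (an arbitrary choice; Start Prob is ignored).
        key = ($3 == "direct") ? ("+:" $2) : ("-:" $1)
        if (!(key in lo)) { order[++n] = key; lo[key] = $1 + 0; hi[key] = $2 + 0 }
        else if ($3 == "direct" && $1 + 0 < lo[key]) lo[key] = $1 + 0
        else if ($3 == "complement" && $2 + 0 > hi[key]) hi[key] = $2 + 0
      }
      next
    }
    /^>/ { if (id == "") id = substr($1, 2); next }   # second file: genome FASTA
    { gsub(/[ \t\r]/, ""); seq = seq toupper($0) }
    END {
      for (i = 1; i <= n; i++) {
        key = order[i]; strand = substr(key, 1, 1)
        s = substr(seq, lo[key], hi[key] - lo[key] + 1)
        if (strand == "-") {                     # reverse-complement minus-strand ORFs
          r = ""
          for (k = length(s); k >= 1; k--) { c = substr(s, k, 1); c = (c in comp) ? comp[c] : "N"; r = r c }
          s = r
        }
        p = ""
        for (k = 1; k + 2 <= length(s); k += 3) { a = substr(s, k, 3); a = (a in code) ? code[a] : "X"; p = p a }
        sub(/\*$/, "", p)                        # drop the terminal stop codon
        printf ">%s_%d_%d_%s\n%s\n", id, lo[key], hi[key], strand, p
      }
    }
  ' "$lst" "$fasta"
}
```

For example, `lst2faa NC_023420-2.fasta.lst NC_023420-2.fasta > NC_023420-2.faa`. Each header carries the sequence ID, the ORF coordinates and the strand, so the proteins can be traced back to rows of the listing.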

Can you guess what is wrong?

Thanks in advance, Guillermo
