What is the benefit of index files and .bam files? How can we get the information from contig reads? What does coverage stand for and how is it calculated?
Hi Nageena, please find the definition of BAM files (Genomatix homepage):
BAM is the compressed binary version of the Sequence Alignment/Map (SAM) format, a generic format for storing large nucleotide sequence alignments. A detailed description of the BAM/SAM format can be found on the SAM Tools web site: http://samtools.sourceforge.net.
The information from the contig is summarized in the so-called consensus sequence, which is shown in the middle of the above screenshot (leading and lagging strand). Open reading frames (ORFs) are shown in blue; these usually indicate potential genes.
Finally, coverage: Every sequencing run is error-prone, so a single read may contain incorrect sequence data. These errors are usually statistical in nature, which means that if you collect several reads for each position, you can derive an overall consensus sequence that corrects the individual read errors. The number of reads covering a given position is its coverage. We then talk about consensus qualities: e.g. Q30 means one error is still possible among 1,000 bases, Q40 means one error among 10,000 bases, and so on.
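To make the coverage and quality idea a bit more concrete, here is a small Python sketch on toy data. It is only an illustration, not what your assembly software does internally: the reads, the `(start, sequence)` representation and all function names are made up for this example, and the consensus is a plain majority vote per position, whereas real tools also weigh base qualities. The Q value conversion uses the standard Phred relation P = 10^(-Q/10).

```python
from collections import Counter


def phred_to_error_probability(q):
    """Convert a Phred quality value Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def per_position_coverage(reads, reference_length):
    """Count how many reads overlap each position.

    `reads` is a list of (start, sequence) tuples with a 0-based start
    (a hypothetical, simplified stand-in for aligned reads in a BAM file).
    """
    depth = [0] * reference_length
    for start, seq in reads:
        for offset in range(len(seq)):
            pos = start + offset
            if 0 <= pos < reference_length:
                depth[pos] += 1
    return depth


def majority_consensus(reads, reference_length):
    """Pick the most frequent base at each position; 'N' where no read covers it."""
    columns = [Counter() for _ in range(reference_length)]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < reference_length:
                columns[pos][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


# Toy example: four 6-base reads on a 10-base contig.
reads = [
    (0, "ACGTAC"),
    (2, "GTACGT"),
    (2, "GAACGT"),  # simulated read error: 'A' instead of 'T' at position 3
    (4, "ACGTTA"),
]
depth = per_position_coverage(reads, 10)
mean_coverage = sum(depth) / len(depth)
consensus = majority_consensus(reads, 10)

print(depth)      # [1, 1, 3, 3, 4, 4, 3, 3, 1, 1]
print(consensus)  # ACGTACGTTA
print(phred_to_error_probability(30), phred_to_error_probability(40))
```

At position 3 two reads say T and one says A, so the majority vote recovers T despite the single read error. That is why higher coverage generally gives a more reliable consensus. The mean coverage here is simply the sum of the per-position depths divided by the contig length.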