Hi, I'm doing variant calling from my RNAseq data on a cell line (Paired End 2X100b, Illumina)

I'm following GATK workflows and everything is working almost fine since I can find all the expected mutations already characterized in my cell line. The problem is that I see some additional mutations that are clearly the result of an alignment error (e.g. NM_001364837:exon8:c.727+1G>A, rs781065280)

I'm using the latest release of STAR aligner 2.7.6a to perform the alignment with the following options

STAR --runThreadN 8 --genomeDir $GENOME_DIR \

--readFilesIn ... --readFilesCommand gunzip -c \

--outSAMattrRGline ... \

--outSAMtype BAM SortedByCoordinate \

--twopassMode Basic \

--outSAMmapqUnique 60 \

--outFilterType BySJout --alignIntronMin 20 \

--alignIntronMax 1000000 --alignMatesGapMax 1000000 \

--alignSJoverhangMin 8 --alignSJDBoverhangMin 3 --scoreGenomicLengthLog2scale 0 \

--outFileNamePrefix $OUTPUT_DIR/...

The problem resides in STAR mapping some reads with a mismatching overhang of 4 bp at beginning of the intron instead of picking a perfect match at the beginning of the next exon (check out the screenshot at: https://i.ibb.co/LRDYNt3/mismatch.png)

As you can see, `--alignSJDBoverhangMin` is set to 3 and the length of the overhang is 4bp.

Also the splice junction should be canonical so a spliced mapping should carry no penalty.

Am I doing something wrong?

More Francesco Antonio Tucci's questions See All
Similar questions and discussions