I am using BWA to map single-end reads to a reference genome with the following commands.

bwa-0.7.5a/bwa index -a bwtsw ref.fna

bwa aln ref.fna reads.fq > in.sai

bwa samse ref.fna in.sai reads.fq > out.sam

samtools view -S out.sam -b -o out.bam

samtools sort out.bam out.sorted

bam2fastq -o reads.fq --no-aligned out.sorted.bam

samtools mpileup -uf ref.fna out.sorted.bam | bcftools view -cg - | vcfutils.pl vcf2fq > final.fastq

seqret -osformat fasta final.fastq -out2 final.fa

My final output file looks like this: nnnnnnnnnnncgctagTGACATATATATctaaaaaaaagctTTGCC

My final output file (final.fa) contains many lowercase bases: it is a mixture of lowercase n, uppercase bases, and lowercase bases. What do the lowercase bases mean? Are they related to the quality of the base calls? Should they be discarded or converted to uppercase? Note: my reference genome (ref.fna) does not contain any lowercase bases.
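To put numbers on the mixture, the different kinds of bases in the file can be counted. This is only a rough sketch: count_case is a made-up helper name, it assumes a plain FASTA file with header lines starting with ">", and it ignores IUPAC ambiguity codes other than n/N.

```shell
# count_case FILE: print how many uppercase bases, lowercase bases,
# and n/N characters a FASTA file contains (header lines are skipped).
# Hypothetical helper for illustration; not part of bwa or samtools.
count_case() {
    grep -v '^>' "$1" | tr -d '\n' | awk '{
        s = $0
        upper = gsub(/[ACGT]/, "", s)   # gsub returns the number of matches
        n     = gsub(/[nN]/, "", s)
        lower = gsub(/[acgt]/, "", s)
        printf "upper=%d lower=%d n=%d\n", upper, lower, n
    }'
}
```

Running count_case final.fa would then print the three totals on one line.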
