Thank you for sharing this comprehensive workflow! May I ask which reference genome you recommend for Rosa species? Also, do you have any tips for optimizing the HISAT2 alignment parameters specifically for rose transcripts?
For Rosa species, the choice of reference genome depends on the species and the quality of the available genome assemblies. Some commonly used reference genomes for roses include:
Rosa chinensis ‘Old Blush’ (the best-annotated rose genome): available from GDR (Genome Database for Rosaceae) or NCBI. Assembly: Rosa chinensis v2.0 (or an updated version).
Rosa multiflora (wild species): also available in public databases, but less commonly used than R. chinensis.
Rosa damascena and Rosa rugosa (used for fragrance research): limited genome assemblies, but RNA-seq studies exist.
For HISAT2 alignment optimization for rose transcripts, follow these tips:
1. Indexing the Genome
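Use the genome together with its transcript annotation (.gtf file) if available. The basic command below indexes the genome sequence only and does not read the annotation:
hisat2-build -p 8 Rosa_chinensis_genome.fa Rosa_index
To include the annotation, extract splice sites and exons from the GTF with hisat2_extract_splice_sites.py and hisat2_extract_exons.py, then pass the resulting files to hisat2-build with --ss and --exon.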
If your species lacks a well-annotated genome, consider de novo transcriptome assembly (e.g., using Trinity).
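When a GTF annotation is available, the index build from step 1 can incorporate it. The following is a minimal sketch, not a definitive recipe: the function name build_annotated_index and the file names in the comments are placeholders, while hisat2_extract_splice_sites.py, hisat2_extract_exons.py, and the --ss/--exon options ship with HISAT2. Building with --ss and --exon needs considerably more memory than a plain build.

```shell
# Sketch: build a HISAT2 index that includes annotated splice sites and exons.
# The function name and the example file names are placeholders.
build_annotated_index() {
    local genome_fa="$1"      # e.g. Rosa_chinensis_genome.fa
    local gtf="$2"            # e.g. annotation.gtf (hypothetical name)
    local index_prefix="$3"   # e.g. Rosa_index
    local threads="${4:-8}"

    # Extract known splice sites and exons from the annotation.
    hisat2_extract_splice_sites.py "$gtf" > "${index_prefix}.ss" || return 1
    hisat2_extract_exons.py "$gtf" > "${index_prefix}.exon" || return 1

    # Build the index with the extracted annotation features.
    hisat2-build -p "$threads" \
        --ss "${index_prefix}.ss" --exon "${index_prefix}.exon" \
        "$genome_fa" "$index_prefix"
}
```

It could be called as: build_annotated_index Rosa_chinensis_genome.fa annotation.gtf Rosa_index 8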
2. Optimal Parameters for RNA-Seq Alignment
HISAT2 is splice-aware, so use --dta for transcript assembly and --rna-strandness for strand-specific libraries.
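Example command for paired-end reads (RF assumes a reverse-stranded library, e.g. dUTP-based; use FR for a forward-stranded library and omit --rna-strandness for unstranded data):
hisat2 -p 8 --dta --rna-strandness RF -x Rosa_index \
  -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
  -S output.sam
If you have single-end reads, use (R for reverse-stranded, F for forward-stranded):
hisat2 -p 8 --dta --rna-strandness R -x Rosa_index \
  -U sample.fastq.gz -S output.sam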
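Choosing the --rna-strandness value is a common source of mistakes, so the mapping can be written down once. The sketch below follows the correspondence given in the HISAT2 manual (fr-firststrand maps to R/RF, fr-secondstrand to F/FR); the function name strandness_flag and the library-type labels used as its arguments are illustrative choices, not part of HISAT2.

```shell
# Sketch: map a library type and read layout to a --rna-strandness value.
# Prints an empty line for unstranded libraries (leave the option out then).
strandness_flag() {
    local library_type="$1"   # fr-firststrand | fr-secondstrand | unstranded
    local layout="$2"         # paired | single
    case "${library_type}:${layout}" in
        fr-firststrand:paired)  echo "RF" ;;
        fr-firststrand:single)  echo "R" ;;
        fr-secondstrand:paired) echo "FR" ;;
        fr-secondstrand:single) echo "F" ;;
        unstranded:paired|unstranded:single) echo "" ;;
        *)
            echo "unknown library type or layout: $library_type $layout" >&2
            return 1
            ;;
    esac
}
```

For example, strand=$(strandness_flag fr-firststrand paired) yields RF, which can then be passed as --rna-strandness "$strand".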
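3. Setting the Maximum Intron Length
--max-intronlen caps the intron length HISAT2 will consider when aligning spliced reads; it does not change memory usage:
hisat2 -p 8 --dta --max-intronlen 500000 -x Rosa_index \
  -1 sample_R1.fastq.gz -2 sample_R2.fastq.gz \
  -S output.sam
500000 is already the HISAT2 default, so this command behaves the same as one without the option. Raise the value only if you expect introns longer than that. Plant introns are generally much shorter, so a lower cap based on the intron lengths in your annotation can reduce spurious long-range spliced alignments.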
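One way to judge what --max-intronlen value is reasonable is to look at the intron lengths implied by the annotation. The sketch below assumes a standard 9-column GTF with exon features carrying a transcript_id attribute; the function names intron_lengths and intron_summary are hypothetical helpers, and the 99th percentile is only one possible guide for choosing a cap.

```shell
# Sketch: list intron lengths implied by the exon records of a GTF file.
intron_lengths() {
    local gtf="$1"
    # Keep transcript ID, start, and end of every exon record.
    awk -F'\t' '$3 == "exon" {
        if (match($9, /transcript_id "[^"]+"/)) {
            # Skip the 15-character attribute prefix and the closing quote.
            id = substr($9, RSTART + 15, RLENGTH - 16)
            print id "\t" $4 "\t" $5
        }
    }' "$gtf" |
    # Group exons by transcript and order them by start coordinate.
    LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2n |
    # The gap between consecutive exons of one transcript is an intron.
    awk -F'\t' '$1 == prev_id && $2 > prev_end + 1 { print $2 - prev_end - 1 }
                { prev_id = $1; prev_end = $3 }'
}

# Sketch: report the count, nearest-rank 99th percentile, and maximum.
intron_summary() {
    intron_lengths "$1" | sort -n |
    awk '{ len[NR] = $1 }
         END {
             if (NR == 0) { print "no introns found"; exit }
             rank = int((NR * 99 + 99) / 100)
             print "n=" NR, "p99=" len[rank], "max=" len[NR]
         }'
}
```

Running intron_summary annotation.gtf (file name is a placeholder) prints one line such as n=..., p99=..., max=..., which can inform the choice of cap.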
4. Post-Alignment Processing
Convert SAM to BAM:
samtools view -bS output.sam > output.bam
samtools sort output.bam -o sorted_output.bam
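The intermediate SAM file can be large, so alignment and sorting are often combined in one pipeline. The sketch below assumes paired-end reads and relies on HISAT2 writing SAM to standard output when -S is omitted; the function name align_to_sorted_bam is hypothetical, and --rna-strandness would need to be added for stranded libraries.

```shell
# Sketch: align paired-end reads and write a coordinate-sorted, indexed BAM
# without keeping the intermediate SAM file on disk.
align_to_sorted_bam() {
    local index_prefix="$1"   # e.g. Rosa_index
    local reads_1="$2"        # e.g. sample_R1.fastq.gz
    local reads_2="$3"        # e.g. sample_R2.fastq.gz
    local out_bam="$4"        # e.g. sorted_output.bam
    local threads="${5:-8}"

    # HISAT2 streams SAM to stdout; samtools sort reads it from stdin ("-").
    hisat2 -p "$threads" --dta -x "$index_prefix" \
        -1 "$reads_1" -2 "$reads_2" |
        samtools sort -@ "$threads" -o "$out_bam" - &&
        samtools index "$out_bam"
}
```

It could be called as: align_to_sorted_bam Rosa_index sample_R1.fastq.gz sample_R2.fastq.gz sorted_output.bam 8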
Quality check:
samtools flagstat sorted_output.bam
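When many samples are processed, it helps to pull the overall mapping rate out of the flagstat report automatically. This is a small sketch that assumes the default text output of samtools flagstat, where the overall rate appears on the line whose fourth field is "mapped"; the function name mapped_percent is hypothetical.

```shell
# Sketch: read a samtools flagstat report on stdin and print the overall
# mapped percentage (the "mapped" line, not the "primary mapped" line).
mapped_percent() {
    awk '$4 == "mapped" { gsub(/[(%]/, "", $5); print $5; exit }'
}
```

Usage: samtools flagstat sorted_output.bam | mapped_percent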
5. Alternative Approach: Direct Transcriptome Alignment
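If a reference genome is unavailable or low-quality, you can align directly to a transcriptome index. Transcript sequences contain no introns, so spliced alignment can be disabled with --no-spliced-alignment:
hisat2-build transcriptome.fa transcriptome_index
hisat2 -p 8 --no-spliced-alignment -x transcriptome_index -U sample.fastq.gz -S output.sam
Maybe this will be helpful for you, Leila Khosravi.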