I'm hoping this is a simple problem with a simple answer. I am trying to generate Bowtie2 index files for a genome in FASTA format, using a GFF3 annotation file that I provide. I cannot use premade Bowtie2 index files for the genome because ERCC RNA spike-in artificial mRNAs were added to the libraries, and I have concatenated them to both the genome sequence file and the GFF3 annotation file.
I am calling TopHat with:
tophat --GTF Tcas.gff \
--transcriptome-index=transcriptome_data/Tcas Tcas
The files are named Tcas.fa and Tcas.gff, and copies are present both in the current working directory and in the folder ./transcriptome_data.
I've added /path/to/transcriptome_data/ to the $BOWTIE2_INDEXES environment variable using export. According to my understanding of the manual, when no sequence (read) files are passed as arguments, TopHat (I am running version 2.1.1) should try to generate the .tlst, .ver, and .gff files as well as the Bowtie index files (.bt2l or .bt2).
Instead, TopHat looks for a pre-existing Bowtie2 index for the genome and, of course, does not find one because it has not been built yet. The stderr output looks like this:
[2019-01-02 18:54:18] Building transcriptome files with TopHat v2.1.1
-----------------------------------------------
[2019-01-02 18:54:18] Checking for Bowtie
Bowtie version: 2.2.9.0
Checking for Bowtie index files (genome)..
Error: Could not find Bowtie 2 index files (Tcas.*.bt2l)
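A quick way to confirm which genome index files are actually present is sketched below. The helper name check_index is hypothetical, the path is the same placeholder used above, and the sketch assumes that TopHat searches the current directory and $BOWTIE2_INDEXES for files matching the base name Tcas.

```shell
# Hypothetical helper: report Bowtie 2 index files (.bt2 or .bt2l)
# for a given base name in a given directory.
check_index() {
  dir=$1
  base=$2
  found=0
  for f in "$dir"/"$base".*.bt2 "$dir"/"$base".*.bt2l; do
    # An unmatched glob stays literal, so test that the file exists.
    if [ -e "$f" ]; then
      echo "found: $f"
      found=1
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "no $base index files in $dir"
  fi
}

# Check the two locations mentioned in the post; the fallback path is a placeholder.
check_index . Tcas
check_index "${BOWTIE2_INDEXES:-/path/to/transcriptome_data}" Tcas
```

If both calls report no index files, the error above is consistent with TopHat expecting a genome index that already exists rather than building one.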
I suspect I am violating some naming convention or required directory structure, so that TopHat does not find the genome sequence file and instead tries to recreate it from Bowtie index files that do not exist. Any advice on where I'm going wrong here?