I have a couple of transcriptomic datasets that were generated on different sequencing platforms, such as MiSeq and HiSeq. Their output format therefore differs from the Sequence Read Archive (SRA) NGS file format. An example is below; let's call it #1 (after splitting and trimming):

>M01403:7:000000000-A45GT:1:1102:16645:1483_1:N:0:2

TAATTGATCCGTTAA........

whereas the SRA transcriptomic file would look like this after splitting and trimming (this one is #2):

>SRR1005592.1_FCD114LACXX:1:1101:1187:2066_length=89

TTCGCATGTGCCGTTTG......

I have a pipeline that gives me species-level identification for most of the BLAST hits from a given SRA file. However, this pipeline does not work for #1, which I assume is due to the difference in the sequence identifiers, although I am not sure exactly why it fails. I would therefore appreciate suggestions on why it may not work, and on what other tools I could use for taxonomic identification of my megablast outputs.
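One way to test the identifier assumption would be to rewrite the #1 headers so they resemble the #2 headers and then rerun the pipeline. Below is a minimal Python sketch of that idea. The function name `to_sra_style`, the `SAMPLE1` prefix, and the regular expression for SRA-style identifiers are all hypothetical choices made for illustration, and the sketch assumes one sequence line per record, as in the examples above. Whether the pipeline actually depends on this header layout is unverified.

```python
import re

# Assumed pattern for SRA-style headers such as
# "SRR1005592.1_FCD114LACXX:1:1101:1187:2066_length=89" (an assumption,
# not something the pipeline is known to require).
SRA_STYLE = re.compile(r"^[SED]RR\d+\.\d+")


def to_sra_style(fasta_lines, prefix="SAMPLE1"):
    """Yield FASTA lines with non-SRA headers rewritten as
    '>{prefix}.{n}_{original id}_length={sequence length}'.

    Assumes each record is one header line followed by one sequence line;
    blank lines are skipped.
    """
    n = 0
    header = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between header and sequence
        if line.startswith(">"):
            header = line[1:]
        elif header is not None:
            n += 1
            if SRA_STYLE.match(header):
                # Already SRA-like: leave the header untouched.
                yield ">" + header
            else:
                yield f">{prefix}.{n}_{header}_length={len(line)}"
            yield line
            header = None


if __name__ == "__main__":
    example = [
        ">M01403:7:000000000-A45GT:1:1102:16645:1483_1:N:0:2",
        "",
        "TAATTGATCC",
    ]
    for out_line in to_sra_style(example):
        print(out_line)
```

If the pipeline runs on the renamed file, the header format was likely the problem; if it still fails, the cause is probably elsewhere.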
