Hi all, we have a bioinformatics challenge and would love any help this community can offer.

We have data from a target-enrichment experiment designed to capture certain microsatellite motifs. The three enriched libraries were sequenced in a rapid run on an Illumina HiSeq 2500 (paired-end mode), and our data are in the standard Illumina FASTQ format. The three libraries come from three different sources: the first was prepared from fresh fish tissue, the second from mammal tissue, and the third from fecal samples of the same mammal species. We have a reference genome for the mammal, but not for the fish. The data have already been demultiplexed (so for the fish, for example, we have 40 individuals, each with its own .fq file containing all of its reads).

We are now unsure how to process these data. Although we are familiar with most basic bioinformatic tools and analyses, we do not have advanced programming skills. We need a way to:

1. Find the microsatellites within the reads and determine their lengths.
2. For the fecal library, retain only the reads that belong to the mammal (i.e. exclude prey and microbiome reads), for instance by identifying unique flanking sequences that correspond to our mammal.

Would anyone have a suggestion on what approach(es) we could use? We have already (unsuccessfully) attempted to tackle this with SSR_pipeline.

Thank you in advance for any help you can offer; it is very much appreciated!

Daniel & Vania
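To make the first part of the task concrete (locating a microsatellite in a read, measuring its length, and pulling out its flanking sequence), here is a minimal Python sketch. It is only an illustration under stated assumptions, not a tested pipeline: the motif sizes (2 to 6 bp), the minimum of 5 tandem repeats, the 20 bp flank length, and the function names `find_ssrs` and `read_fastq` are all hypothetical choices that do not come from the post. It only detects perfect repeats, reads plain (uncompressed) four-line FASTQ records, and does nothing about separating mammal reads from prey or microbiome reads.

```python
import re

# Hypothetical thresholds (assumptions, not from the original post).
MIN_UNIT, MAX_UNIT, MIN_REPEATS = 2, 6, 5

# A unit of MIN_UNIT..MAX_UNIT bases followed by at least MIN_REPEATS - 1
# further tandem copies of itself. The lazy quantifier prefers the shortest unit.
SSR_RE = re.compile(
    r"([ACGT]{%d,%d}?)\1{%d,}" % (MIN_UNIT, MAX_UNIT, MIN_REPEATS - 1)
)


def find_ssrs(seq, flank=20):
    """Return perfect microsatellites found in one read, with their flanks."""
    s = seq.upper()
    hits = []
    for m in SSR_RE.finditer(s):
        unit = m.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAA..., which would otherwise
            # be reported as a dinucleotide repeat.
            continue
        start, end = m.span()
        hits.append({
            "motif": unit,
            "repeats": (end - start) // len(unit),
            "length_bp": end - start,
            "left_flank": s[max(0, start - flank):start],
            "right_flank": s[end:end + flank],
        })
    return hits


def read_fastq(path):
    """Yield (read_id, sequence) from an uncompressed four-line FASTQ file."""
    with open(path) as handle:
        while True:
            header = handle.readline().strip()
            if not header:
                break
            sequence = handle.readline().strip()
            handle.readline()  # '+' separator line
            handle.readline()  # quality line (unused in this sketch)
            yield header[1:], sequence
```

Calling `find_ssrs("TTG" + "AC" * 6 + "GGT")` reports a single hit with motif `AC`, 6 repeats, and flanks `TTG` and `GGT`. Looping over `read_fastq("some_individual.fq")` (a placeholder file name) and calling `find_ssrs` on each sequence would give a per-read table of motifs, repeat counts, and flanks for one individual.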
