I have sequence data generated with the ezRAD method. Most of the reads begin with the cut site "GATC", but some do not (see the examples below). I am looking for a program that will remove any read that does not start with "GATC"; however, all I can find are trimming and adapter-removal programs.

Any suggestions? I was going to write my own script to do this, but would rather not reinvent the wheel.

KEEP:

@D3NT6Q1:329:C6396ACXX:6:1101:11990:3865 2:N:0:GTCCGCTCTTT
GATCTTTCTCCAATTCCCCGCTCTCCAAGTCTCAGGAGTATTCAAAACAAACAACGTCCTATTTATGCCCTGACATCACATCTCTATGGCAACCACACTT
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIIIIIBFFIIIIIIIIIFIIIIIFFIFIIFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFBFBFBB

DISCARD:

@D3NT6Q1:329:C6396ACXX:6:1101:2011:28589 2:N:0:GTCCGCTCTTT
TACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGGGC
+
BBBFFFFFFFFFFIIFIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFBBBBBBBBBBBBFFBFFFBFF7BBFB
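
For reference, the kind of script I had in mind would look roughly like the sketch below. It is only a minimal illustration, not a tested tool: it assumes plain (uncompressed) FASTQ with exactly four lines per record and no wrapped sequence lines, and the function name `filter_fastq_by_prefix` is just something made up for this example.

```python
def filter_fastq_by_prefix(in_handle, out_handle, prefix="GATC"):
    """Copy FASTQ records whose sequence starts with `prefix`.

    Assumes strict four-line records (header, sequence, '+', quality).
    Returns a (kept, dropped) tuple of record counts.
    """
    kept = dropped = 0
    while True:
        header = in_handle.readline()
        if not header:
            break  # end of file
        if not header.strip():
            continue  # tolerate stray blank lines between records
        seq = in_handle.readline()
        plus = in_handle.readline()
        qual = in_handle.readline()
        if not qual:
            raise ValueError("truncated FASTQ record: " + header.strip())
        if seq.startswith(prefix):
            # Keep the record exactly as it was read
            out_handle.writelines([header, seq, plus, qual])
            kept += 1
        else:
            dropped += 1
    return kept, dropped
```

It would be called with two open text file handles, one for reading the input and one for writing the filtered output. One caveat: the headers above suggest these are read 2 of paired-end data, and filtering only one file of a pair like this would leave the two files out of sync, so the matching read 1 records would need to be dropped as well.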
