I have sequence data from the ezRAD method. Most of the sequences have the cut site "GATC" at the start; however, some do not (see example below). I am trying to find a program that will remove any sequence that does not start with "GATC", but all I can find are trimming/adapter removal programs.
Any suggestions? I was going to write my own script to do this, but would rather not reinvent the wheel.
KEEP:
@D3NT6Q1:329:C6396ACXX:6:1101:11990:3865 2:N:0:GTCCGCTCTTT
GATCTTTCTCCAATTCCCCGCTCTCCAAGTCTCAGGAGTATTCAAAACAAACAACGTCCTATTTATGCCCTGACATCACATCTCTATGGCAACCACACTT
+
BBBFFFFFFFFFFIIIIIIIIIIIIIIIIFIIIIIIIIBFFIIIIIIIIIFIIIIIFFIFIIFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFBFBFBB
DISCARD:
@D3NT6Q1:329:C6396ACXX:6:1101:2011:28589 2:N:0:GTCCGCTCTTT
TACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGGGC
+
BBBFFFFFFFFFFIIFIIIIIIIIIIIIIIIIIIIIFFFFFFFFFBFBBBBBBBBBBBBFFBFFFBFF7BBFB
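
For reference, the fallback script would be short. Here is a minimal sketch in Python of the filter described above, assuming the FASTQ file uses strict 4-line records (header, sequence, "+", quality) with no wrapped sequence lines. The function names `filter_fastq_by_prefix` and `filter_fastq_file` and the file paths are hypothetical, chosen only for illustration.

```python
from itertools import islice


def filter_fastq_by_prefix(lines, prefix="GATC"):
    """Yield 4-line FASTQ records whose sequence line starts with `prefix`.

    Assumption: every record is exactly 4 lines (no wrapped sequences).
    """
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if len(record) < 4:
            break  # end of input, or an incomplete trailing record
        # record[1] is the sequence line; record[0] is the @header
        if record[1].startswith(prefix):
            yield record


def filter_fastq_file(in_path, out_path, prefix="GATC"):
    """Copy matching records from in_path to out_path; return how many were kept."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for record in filter_fastq_by_prefix(src, prefix):
            dst.writelines(record)
            kept += 1
    return kept
```

Calling `filter_fastq_file("reads.fastq", "reads.GATC.fastq")` would write only the reads that begin with the cut site. Note that this checks the sequence line only, so a quality line that happens to begin with the characters G, A, T, C is not mistaken for a match. For paired-end data, dropping reads from one file alone would leave the two files out of sync, so the mates would need to be filtered together.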