Dear colleagues,
I pooled dozens of samples into one library and sequenced them together. This yielded millions of reads, and the sequencer assigned each read a unique read ID. Each sample has thousands of reads, but each read ID corresponds to only one sample.
For example, the read ID 1RDD4_00016_01732 (first column) belongs only to sample o120 (second column):
1RDD4_00016_01732 o120
1RDD4_00016_01756 o297
1RDD4_00016_01943 o316
1RDD4_00016_02031 o296
As shown above, the read IDs are paired with their respective samples and listed in group-file format (i.e., two tab-separated columns; https://www.mothur.org/wiki/Group_file).
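To make the layout concrete, here is a minimal Python sketch of how such a group file could be read into a read-ID-to-sample lookup. It assumes exactly two whitespace-separated columns per line and no header; the function name `read_group_file` is my own and hypothetical, not part of mothur or any other tool.

```python
from collections import Counter


def read_group_file(lines):
    """Build a {read ID: sample} mapping from group-file lines.

    Assumes two whitespace-separated columns (read ID, sample) and no header.
    """
    mapping = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 2:
            raise ValueError(f"expected 2 columns, got {len(fields)}: {line!r}")
        read_id, sample = fields
        # Each read ID should correspond to only one sample.
        if mapping.setdefault(read_id, sample) != sample:
            raise ValueError(f"read ID {read_id} is assigned to more than one sample")
    return mapping


# The four example rows from above, tab-separated.
example = [
    "1RDD4_00016_01732\to120",
    "1RDD4_00016_01756\to297",
    "1RDD4_00016_01943\to316",
    "1RDD4_00016_02031\to296",
]

read_to_sample = read_group_file(example)
reads_per_sample = Counter(read_to_sample.values())
print(read_to_sample["1RDD4_00016_01732"])  # o120
```

For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines.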
With a group file and a FASTA file, how can I use Unix (or a similar program) to:
Best,
Gary Sur