09 October 2018

Dear colleagues,

I pooled dozens of samples into one library and sequenced them together. This yielded millions of reads, and the sequencer assigned each read a unique read ID. Each sample has thousands of reads, but each read ID corresponds to only one sample.

For example, the read ID 1RDD4_00016_01732 (first column) belongs only to the sample o120 (second column).

1RDD4_00016_01732 o120

1RDD4_00016_01756 o297

1RDD4_00016_01943 o316

1RDD4_00016_02031 o296

As shown above, the read IDs are paired with their respective samples, and listed in a group-file format (i.e., two columns, tab-separated; https://www.mothur.org/wiki/Group_file).

Given a group file and a FASTA file, how can I use Unix tools (or a similar program) to:

  • Group together read IDs in the fasta file by their respective sample?
  • Separate these groups of samples into individual fasta files such that each file contains only one sample?

Best,

Gary Sur
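
Both steps asked about above can be done in a single pass over the FASTA file. The following is a minimal Python sketch of one possible approach, not a definitive solution. It assumes the group file has the read ID in the first column and the sample name in the second, that each FASTA header starts with `>` followed by the read ID, and that sample names are safe to use as file names. The function names and the `<sample>.fasta` output naming are illustrative choices, not part of mothur or any existing tool. Reads whose IDs are missing from the group file are skipped.

```python
import os
from collections import Counter
from contextlib import ExitStack


def read_group_file(group_path):
    """Map each read ID to its sample (two whitespace-separated columns)."""
    read_to_sample = {}
    with open(group_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            read_to_sample[fields[0]] = fields[1]
    return read_to_sample


def split_fasta_by_sample(fasta_path, group_path, out_dir):
    """Write one FASTA file per sample; return reads written per sample."""
    read_to_sample = read_group_file(group_path)
    os.makedirs(out_dir, exist_ok=True)
    counts = Counter()
    with ExitStack() as stack:
        handles = {}  # sample name -> open output file
        out = None    # output file for the record currently being read
        fasta = stack.enter_context(open(fasta_path))
        for line in fasta:
            if line.startswith(">"):
                # Assumption: the read ID is the first token after ">".
                fields = line[1:].split()
                read_id = fields[0] if fields else ""
                sample = read_to_sample.get(read_id)
                if sample is None:
                    out = None  # read not listed in the group file
                else:
                    if sample not in handles:
                        path = os.path.join(out_dir, sample + ".fasta")
                        handles[sample] = stack.enter_context(open(path, "w"))
                    out = handles[sample]
                    counts[sample] += 1
            if out is not None:
                out.write(line)  # header and sequence lines go to the same file
    return dict(counts)
```

With hypothetical file names, `split_fasta_by_sample("reads.fasta", "reads.groups", "by_sample")` would create files such as `by_sample/o120.fasta`. One output file stays open per sample, which should be fine for dozens of samples, and the FASTA file is streamed line by line, so the millions of reads never need to be held in memory at once.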
