Basically, you need to join the paired-end (PE) reads (if your data are PE) and convert FASTQ to FASTA. Then, if you are using QIIME, run the pick_closed_reference_otus.py script to make a BIOM file and do whatever downstream analysis (alpha- and beta-diversity) you decide on.
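The FASTQ-to-FASTA step just keeps each record's header and sequence and drops the quality line. Below is a minimal Python sketch of that conversion only (it does not join paired-end reads). It assumes the usual four-lines-per-record FASTQ layout with no wrapped sequence lines; the function names are hypothetical, not part of QIIME.

```python
import io
from typing import Iterable, Iterator, TextIO, Tuple


def read_fastq(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from a FASTQ stream.

    Assumes four lines per record: '@' header, sequence, '+' separator,
    quality. Line-wrapped sequences are not supported.
    """
    lines = (line.rstrip("\r\n") for line in handle)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq, plus, _quality = next(lines), next(lines), next(lines)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header[1:], seq  # drop the leading '@'


def fastq_to_fasta(in_handle: Iterable[str], out_handle: TextIO) -> int:
    """Write each FASTQ record as a two-line FASTA record; return the count."""
    count = 0
    for ident, seq in read_fastq(in_handle):
        out_handle.write(f">{ident}\n{seq}\n")
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory demo; for real data pass open file handles instead.
    fastq = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n")
    fasta = io.StringIO()
    print(fastq_to_fasta(fastq, fasta))
    print(fasta.getvalue(), end="")
```

For real files you would open the input for reading and the output for writing and pass both handles to fastq_to_fasta.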
Thank you for the information. I have taken data from MG-RAST. The data are already processed (barcode and primer sequences have been removed). Can I use these data directly in QIIME to pick OTUs by closed- or open-reference picking? I ask because some protocols state that a mapping file is required for the analysis. I do not have the metadata or the raw sequence data needed to create a mapping file, so I have to proceed with the sequence data available from MG-RAST.
I am also having this issue, so I would like to join the conversation. My Illumina data are already-demultiplexed FASTQ files with the barcode and primer sequences removed. I want to run split_libraries.py on them so that I can distinguish between the samples in my final analysis.
I have converted them to FASTA format and made a mock mapping file following Marina's advice (I did this in Excel). However, this doesn't work: I get the error message 'Errors found in mapping file' when I run either validate_mapping_file.py or split_libraries.py. Any help for both Jayaram and me would be greatly appreciated! Thanks!
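Distinguishing samples ultimately means every read header carries its sample ID. The sketch below shows that labeling for already-demultiplexed per-sample FASTA files, assuming the QIIME 1 convention seen in split_libraries.py output, where each header starts with SampleID_N (N being a running read counter) followed by the original read ID. It also assumes QIIME 1 sample IDs may contain only letters, digits and periods. The function label_samples and its input layout are hypothetical illustrations, not a QIIME script.

```python
import io
import re
from typing import Iterable, Iterator, Mapping

# Assumption: QIIME 1 sample IDs allow only letters, digits and periods
# (an underscore would also make the "SampleID_N" label ambiguous).
VALID_SAMPLE_ID = re.compile(r"^[A-Za-z0-9.]+$")


def label_samples(samples: Mapping[str, Iterable[str]]) -> Iterator[str]:
    """Yield FASTA lines with headers rewritten as '>SampleID_N original_id'.

    `samples` maps each sample ID to an iterable of FASTA lines for that
    sample (e.g. an open file). N is one running counter across all samples.
    """
    counter = 0
    for sample_id, handle in samples.items():
        if not VALID_SAMPLE_ID.match(sample_id):
            raise ValueError(f"invalid sample ID: {sample_id!r}")
        for raw in handle:
            line = raw.rstrip("\r\n")
            if line.startswith(">"):
                fields = line[1:].split()
                original = fields[0] if fields else ""
                yield f">{sample_id}_{counter} {original}".rstrip()
                counter += 1
            elif line:
                yield line  # sequence lines pass through unchanged


if __name__ == "__main__":
    demo = {
        "S1": io.StringIO(">readA\nACGT\n>readB\nGGCC\n"),
        "S2": io.StringIO(">readC\nTTAA\n"),
    }
    for out_line in label_samples(demo):
        print(out_line)
```

Keeping one counter across all samples, rather than restarting at 0 per sample, keeps every label unique in the combined file.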
Jayaram, yes, you can use the MG-RAST data, or any other FASTQ data, if you convert it to FASTA and make a mock mapping file.
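A mock mapping file can be generated by a script instead of by hand, which avoids spreadsheet formatting problems. This minimal sketch assumes the QIIME 1 column convention (#SampleID first, then BarcodeSequence and LinkerPrimerSequence, with Description last) and leaves the barcode and primer fields as placeholders, since the reads are already trimmed. Whether empty barcode/primer fields pass validation depends on how you run validate_mapping_file.py, so treat the defaults as an assumption to check. The function names are hypothetical.

```python
from typing import Iterable

# Assumed QIIME 1 column order: #SampleID first, Description last.
HEADER = ("#SampleID", "BarcodeSequence", "LinkerPrimerSequence", "Description")


def mapping_file_text(sample_ids: Iterable[str], barcode: str = "", primer: str = "") -> str:
    """Build a tab-delimited mock mapping file with Unix line endings."""
    rows = [HEADER]
    for sample_id in sample_ids:
        # Barcode/primer are placeholders; the sample ID doubles as Description.
        rows.append((sample_id, barcode, primer, sample_id))
    return "".join("\t".join(row) + "\n" for row in rows)


def write_mapping_file(path: str, sample_ids: Iterable[str], **kwargs: str) -> None:
    # newline="\n" stops Python from translating "\n" into "\r\n" on Windows.
    with open(path, "w", newline="\n", encoding="ascii") as handle:
        handle.write(mapping_file_text(sample_ids, **kwargs))


if __name__ == "__main__":
    print(mapping_file_text(["S1", "S2"]), end="")
```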
Matt, do not use Excel if you work in UNIX. Excel saves text files with different line endings and hidden characters, which causes problems like the one you are having. If you really want to use Excel, save your file as a tab-delimited text file and then run the dos2unix command on it in UNIX. In UNIX it is more common to use a text editor such as vi or vim. If that is too hard for you, try Notepad++; it is still better than Excel.
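If dos2unix is not available, the line-ending fix is easy to reproduce. This is a minimal Python sketch, not dos2unix itself: it rewrites Windows CRLF endings as Unix LF and, as an extra assumption, also treats a lone CR (the old Mac-style ending) as a line break. The function names are hypothetical.

```python
import sys


def to_unix_newlines(data: bytes) -> bytes:
    """Convert Windows (CRLF) and lone CR line endings to Unix LF."""
    # Replace CRLF first so the CR in each pair does not become an extra LF.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_in_place(path: str) -> None:
    """Rewrite a file with Unix line endings (works on raw bytes)."""
    with open(path, "rb") as handle:
        data = handle.read()
    with open(path, "wb") as handle:
        handle.write(to_unix_newlines(data))


if __name__ == "__main__":
    # Usage: python this_script.py mapping_file.txt [more files...]
    for file_path in sys.argv[1:]:
        convert_in_place(file_path)
```

Working on bytes rather than text avoids Python's own newline translation hiding the very characters being removed.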
Either way, you should still check your files in vim. Run:
vim mapping_file.txt
Then, inside vim, type:
:set list
":set list" shows the hidden characters. Make sure that your file is tab-delimited and that every row has the same number of columns.
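The same checks you do by eye with ":set list" can also be scripted. This minimal sketch flags carriage returns and rows whose tab-separated field count differs from the header's; it assumes the QIIME 1 convention that the header line starts with #SampleID. It is not a replacement for validate_mapping_file.py, which checks much more; the function name is hypothetical.

```python
import sys
from typing import List


def check_mapping_text(text: str) -> List[str]:
    """Return human-readable problems found in a mapping file's text."""
    problems: List[str] = []
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # ignore the empty string after the final newline
    if not lines or not lines[0].startswith("#SampleID"):
        problems.append("line 1: header should start with #SampleID")
    expected = len(lines[0].split("\t")) if lines else 0
    for number, line in enumerate(lines, start=1):
        if "\r" in line:
            problems.append(f"line {number}: carriage return (Windows/Mac line ending)")
        fields = line.split("\t")
        if len(fields) != expected:
            problems.append(f"line {number}: {len(fields)} tab-separated fields, expected {expected}")
    return problems


if __name__ == "__main__":
    for path in sys.argv[1:]:
        # newline="" disables newline translation so any \r stays visible.
        with open(path, newline="") as handle:
            for problem in check_mapping_text(handle.read()):
                print(f"{path}: {problem}")
```

A row separated by spaces instead of tabs shows up as a field-count mismatch, which is one of the most common reasons a hand-edited mapping file is rejected.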