I tried to assemble the complete rDNA unit using CLC Workbench, but the repeat-rich IGS region makes it difficult to assemble. Any suggestions?
Dear Sampath, I would suggest the following. Despite some differences, rRNA sequences are usually rather well conserved across organisms. Depending on the species you are working on, if you are lucky you should be able to find the complete rDNA sequence of a closely related species.
Then you can use the "map reads to reference" tool with permissive parameters (for example, set the mismatch/insertion/deletion penalties to 1 and keep the length and similarity fraction parameters quite low). At this point, most of your reads should align to the reference, and you will be able to export the resulting consensus sequence as the presumptive full-length sequence for your organism.
This procedure is a bit tricky, and you will probably have to repeat the analysis quite a number of times before you find the optimal mapping parameters, but if you have a sufficiently similar reference sequence you should be able to get a result.
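To make the consensus step a little more concrete, here is a minimal sketch in Python of a majority-vote consensus over reads that have already been placed on the reference. It is only an illustration and not what CLC does internally: the function name `majority_consensus`, the `(start, sequence)` input format, and the `min_depth` parameter are all assumptions made for this example, and it ignores insertions and deletions.

```python
from collections import Counter


def majority_consensus(ref_length, placed_reads, min_depth=1):
    """Return a majority-vote consensus string of length ref_length.

    placed_reads is an iterable of (start, sequence) pairs, where start is
    the 0-based reference position of the first base of an ungapped read.
    This input format is a simplification assumed for the example.
    Positions covered by fewer than min_depth reads are reported as "N".
    """
    # One base counter per reference column.
    columns = [Counter() for _ in range(ref_length)]
    for start, seq in placed_reads:
        for offset, base in enumerate(seq.upper()):
            pos = start + offset
            if 0 <= pos < ref_length:
                columns[pos][base] += 1

    consensus = []
    for counts in columns:
        depth = sum(counts.values())
        if depth < min_depth:
            consensus.append("N")  # not enough coverage to call a base
        else:
            consensus.append(counts.most_common(1)[0][0])
    return "".join(consensus)


if __name__ == "__main__":
    reads = [(0, "ACGT"), (2, "GTTA"), (2, "GATA")]
    print(majority_consensus(8, reads))
```

Stretches of "N" in such a consensus mark regions with no read support, which is where I would expect the repeat-rich IGS to need looser parameters or a closer reference.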
Hi Sampath, I second Marco's suggestion. I have successfully used that strategy: find the ribosomal nuclear complex region from a genomic scaffold in the closest species to yours you can find, or else try to find a transcriptome from an even closer species, to use as a reference. Then you can make a BLAST database of your reads and BLAST your reference against it. Retrieve all hits with a reasonable bit-score/e-value and map them to your reference. Then visually inspect your results to make sure that they seem realistic. As Marco mentions, you might need to tweak the mapping parameters depending on how close your species is to your reference.
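For the "retrieve all hits" step, something along these lines could work as a rough sketch. It assumes the BLAST results were saved in the standard 12-column tabular format (BLAST+ `-outfmt 6`), with the reference as the query and the reads as the database, so the read name is in the second column. The function name `select_read_ids` and the default thresholds are placeholders chosen for the example, not recommended values.

```python
import csv


def select_read_ids(blast_tab_lines, max_evalue=1e-10, min_bitscore=50.0):
    """Return sorted, de-duplicated read IDs whose hits pass both thresholds.

    blast_tab_lines is any iterable of tab-separated lines in the default
    tabular layout: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    The thresholds are hypothetical defaults; adjust them to your data.
    """
    read_ids = set()
    for row in csv.reader(blast_tab_lines, delimiter="\t"):
        # Skip blank lines, comment lines, and truncated rows.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue <= max_evalue and bitscore >= min_bitscore:
            read_ids.add(row[1])  # sseqid = read name when reads are the database
    return sorted(read_ids)


if __name__ == "__main__":
    example = [
        "ref\tread1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180.0",
        "ref\tread2\t80.0\t30\t6\t0\t5\t34\t1\t30\t0.5\t25.0",
    ]
    print(select_read_ids(example))
```

The resulting list of read names can then be used to pull the matching reads out of your read file before mapping them to the reference.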