Take a look at Trinity and Velvet; they provide some tools in their pipelines for annotation. I am not sure whether you can use them directly on your data set (as they are meant for RNA-seq), but at least you will get an idea.
I have already done the de novo assembly and now have 16,650 pages of data in FASTQ format. Now I want to go ahead with mapping, annotation, and submission to GenBank.
I am not sure what you mean by "pages" of FASTQ. If you have already done a de novo assembly, how many contigs and scaffolds did you get? What kind of data did you have, and which assembler did you use?
Okay, that helps. These results are a little bit strange, for a number of reasons.
First of all, the total length of the contigs (Large, and even more so All) is way too high. I work with bacteria, but if I remember correctly, a chloroplast genome is usually around 120–160 kbp, not 763.8 kbp (let alone 1,995.7 kbp). This indicates that you have something else in there, most likely contigs of mitochondrial and/or nuclear (chromosomal) origin. Next, the largest contig, at 7,032 bp, is really short, and the number of contigs is way too high. While the latter might be attributed to the aforementioned contamination, the small contig size is troubling. With this kind of data, I would not try to finish the genome, but would rather start troubleshooting the assembly (and possibly the whole sequencing setup).
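To make the comparison above concrete, the numbers in question (contig count, total length, largest contig, N50, and how the total compares to the expected genome size) can be computed directly from the contig FASTA. The following is a minimal sketch in plain Python, assuming a standard multi-line FASTA file; the 500 bp "large contig" cutoff mirrors what I believe is Newbler's default but is an assumption here, and `expected_size` is a hypothetical value you supply for your organism.

```python
from typing import Dict, Iterable, List, Optional


def read_fasta_lengths(path: str) -> List[int]:
    """Return the length of every sequence record in a (multi-line) FASTA file."""
    lengths: List[int] = []
    current: Optional[int] = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # A new header closes the previous record.
                if current is not None:
                    lengths.append(current)
                current = 0
            elif current is not None:
                current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length of the contig at which the running sum (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def assembly_stats(lengths: List[int], large_min: int = 500,
                   expected_size: Optional[int] = None) -> Dict[str, float]:
    """Summarise an assembly; large_min is an assumed cutoff for 'large' contigs."""
    large = [n for n in lengths if n >= large_min]
    stats: Dict[str, float] = {
        "num_contigs": len(lengths),
        "total_bp": sum(lengths),
        "largest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
        "num_large": len(large),
        "large_total_bp": sum(large),
    }
    if expected_size:
        # A ratio well above 1 suggests sequence from other sources (e.g. mitochondria, nucleus).
        stats["total_vs_expected"] = stats["total_bp"] / expected_size
    return stats
```

For example, `assembly_stats(read_fasta_lengths("contigs.fasta"), expected_size=150_000)` (file name and expected size are placeholders) would show immediately whether the total is several times larger than a single chloroplast genome.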
Let us start with the basics:
1. Have you checked how many of your contigs belong to the chloroplast? This is most easily done with a simple batch BLASTN search.
2. What kind of library/libraries and which technologies did you use for sequencing? 454 or Illumina? WGS and/or mate-pair/long paired-end? And what is your coverage? Or better: what is the amount of raw data (in Mbp) you used for the assembly?
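Both checks can be scripted. The sketch below assumes you have run something like `blastn -query contigs.fasta -subject chloroplast_reference.fasta -outfmt 6 -evalue 1e-10 > hits.tsv` (file names are hypothetical), so that each hit is one tab-separated line in BLAST+'s standard 12-column tabular format, where column 3 is percent identity and column 11 is the e-value. The identity and e-value thresholds are illustrative assumptions, not recommendations. The FASTQ counter assumes strict 4-line records (no wrapped sequences), and coverage is estimated simply as raw bases divided by an expected genome size you supply.

```python
from typing import Iterable, Set, Tuple


def contigs_hitting_reference(blast_tab_lines: Iterable[str],
                              min_pident: float = 90.0,
                              max_evalue: float = 1e-10) -> Set[str]:
    """Return IDs of query contigs with at least one qualifying hit in BLAST -outfmt 6 output."""
    hits: Set[str] = set()
    for line in blast_tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        qseqid, pident, evalue = fields[0], float(fields[2]), float(fields[10])
        if pident >= min_pident and evalue <= max_evalue:
            hits.add(qseqid)
    return hits


def fastq_read_and_base_counts(lines: Iterable[str]) -> Tuple[int, int]:
    """Count reads and bases, assuming 4-line FASTQ records (sequence on line 2 of each)."""
    reads = bases = 0
    for i, line in enumerate(lines):
        if i % 4 == 1:
            reads += 1
            bases += len(line.strip())
    return reads, bases


def estimated_coverage(raw_bases: int, genome_size: int) -> float:
    """Average fold coverage: total sequenced bases divided by genome size."""
    return raw_bases / genome_size
```

Dividing `len(contigs_hitting_reference(open("hits.tsv")))` by the total number of contigs gives the chloroplast fraction, and `fastq_read_and_base_counts(open("reads.fastq"))` gives the raw base count to report in Mbp.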
No problem. If you took the first few contigs (i.e. the largest), this at least tells us that the main part of the DNA comes from chloroplasts. And the amount of sequencing data should also be sufficient. That is puzzling and difficult to solve without the actual data.
The easiest way (for me) to identify the possible problem(s): Could you give me access to the "raw" sequencing data (the SFF file)? That way, I can simply put that through my standard pipelines to see what might be wrong.