I have a genome sequence; however, the sequence is in a Microsoft Word file and the annotation is in an Excel file. How can I convert/merge this information into GenBank format?
I do not understand how you ended up with MS Word files for a genome sequence. I recommend working in a Linux/Unix environment using plain text files. Anyway, the READSEQ tool at EBI can convert almost all types of sequence files.
It depends a bit on exactly how your spreadsheet is laid out, but there are tools like tbl2asn that can stitch your sequence and annotation data together for a GenBank submission.
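For what it's worth, tbl2asn reads annotation from a tab-delimited feature table rather than from a spreadsheet, so the Excel rows would need reshaping first. Here is a rough Python sketch of that step. The dictionary keys (start, end, strand, type, qualifiers) and the function name are my own placeholders, not anything your spreadsheet necessarily contains, and a real submission needs more than this (locus tags, a submission template and so on).

```python
def rows_to_feature_table(seq_id, rows):
    """Build a 5-column feature table from a list of annotation rows.

    Each row is a dict with hypothetical keys: start, end, strand, type,
    and an optional 'qualifiers' dict (e.g. {"product": "..."}).
    """
    lines = [f">Feature {seq_id}"]
    for row in rows:
        start, end = int(row["start"]), int(row["end"])
        # Minus-strand features are written with start greater than stop
        if row.get("strand") == "-":
            start, end = end, start
        lines.append(f"{start}\t{end}\t{row['type']}")
        # Qualifiers go on their own lines, preceded by three empty columns
        for key, value in row.get("qualifiers", {}).items():
            lines.append(f"\t\t\t{key}\t{value}")
    return "\n".join(lines) + "\n"
```

You would write the returned string to a .tbl file named after your FASTA file. The rows themselves could come from the spreadsheet saved as CSV and read with Python's csv module.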
Like Alfonso, I'm surprised that they chose a Word document for your sequence data. If you want to extract the data from this, doc and docx files can be opened with an archive manager (7zip or similar) and you can usually find a plain text file inside that has the content. Alternatively, most scripting languages have libraries to extract the text from Office files (including Excel spreadsheets) so if you have some scripting knowledge, or know a bioinformatician who can do this, that's an option too.
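If you go the scripting route, something like the following may be enough to pull the text out of a .docx file using only the Python standard library. This is a minimal sketch that assumes the newer .docx format (a zip archive whose main text lives in word/document.xml); it will not work on legacy .doc files, and the function name is just one I made up.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def docx_to_text(path_or_file):
    """Return the text of a .docx file, one line per Word paragraph."""
    with zipfile.ZipFile(path_or_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)
```

Usage would be along the lines of docx_to_text("genome.docx"), where the file name is only an example.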
The simplest option may be a combination of the two answers above, along the following lines:
Open your sequence data in Word.
Choose Save As and select Plain Text from the format drop-down menu. Your saved sequence will now be in plain text format. We don't know what sequence format it is in, but that might not matter.
Navigate to http://www.ebi.ac.uk/Tools/sfc/readseq/ and upload your plain text sequence file with the Choose file button. Make sure that the Input Format selection is set to Auto-detected.
Convert this to GenBank, or to any other format that your preferred sequence editing software can import, provided that software can also export to GenBank.
Depending on the sequence editing software you use, you may then be able to import the annotation data directly, so that it is applied to your sequence.
Finally, export the annotated sequence from the sequence editing software in GenBank format.
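If auto-detection struggles with the text that comes out of Word, it may help to tidy it into plain FASTA first. Below is a small Python sketch of that clean-up. It assumes the file holds nothing but sequence letters plus things like position numbers, spaces and an optional ">" header line; if the Word file contains other text, this would wrongly treat it as sequence. The function name and the default ID are placeholders.

```python
import re

def text_to_fasta(raw_text, seq_id="my_genome", width=60):
    """Turn loosely formatted sequence text into a single FASTA record."""
    # Skip any existing FASTA header line so it is not mistaken for sequence
    body = [ln for ln in raw_text.splitlines() if not ln.lstrip().startswith(">")]
    # Keep letters only: this drops position numbers, spaces and punctuation
    seq = re.sub(r"[^A-Za-z]", "", "".join(body)).upper()
    if not seq:
        raise ValueError("no sequence letters found")
    # Re-wrap the sequence at a fixed line width
    chunks = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{seq_id}\n" + "\n".join(chunks) + "\n"
```

For example, text_to_fasta("  1 acgtac gtacgt\n 13 ttga\n", "x", width=10) gives ">x\nACGTACGTAC\nGTTTGA\n".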
I believe the annotation in your Excel file follows GFF3 guidelines.
Please read about the GFF3 file format and its columns (which column holds what data).
If they match your data, you have to add 3 lines at the start of a Notepad file (get any GFF3 file from the net and you will notice 2-3 lines at the start beginning with '#').
Now paste your data into this Notepad file and save it with the .gff3 extension.
You will need the FASTA file of your master chromosome sequence to convert GFF3 to GenBank, as a GenBank file contains sequences too. A GFF3 file may contain sequences as well, but the tool we will use takes an external FASTA file.
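If the spreadsheet columns do not line up with GFF3 exactly, the file can also be written with a short script instead of Notepad. A rough Python sketch follows, assuming each row has been read into a dict with the hypothetical keys seqid, type, start, end, strand and name. It writes only the "##gff-version 3" header line, puts "." in the score and phase columns (CDS features really need a proper phase value), and does not escape special characters in names.

```python
def rows_to_gff3(rows, source="excel"):
    """Format annotation rows as GFF3 text (nine tab-separated columns)."""
    lines = ["##gff-version 3"]
    for i, row in enumerate(rows, start=1):
        # Column 9: attributes; the feature IDs here are simply generated
        attributes = f"ID=feature{i}"
        if row.get("name"):
            attributes += f";Name={row['name']}"
        fields = [
            row["seqid"],             # 1: sequence ID (must match the FASTA header)
            source,                   # 2: source
            row["type"],              # 3: feature type, e.g. gene or CDS
            str(int(row["start"])),   # 4: start (1-based)
            str(int(row["end"])),     # 5: end
            ".",                      # 6: score (not available)
            row.get("strand", "."),   # 7: strand
            ".",                      # 8: phase (placeholder)
            attributes,               # 9: attributes
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"
```

The seqid in each row has to match the sequence name in the FASTA file, otherwise a downstream converter has no way to pair the features with the sequence.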