I am trying to find a convenient way to batch-download the FASTA sequences for as many bacterial orthologs of a gene as I can. I have tried using the E-utilities to access NCBI, and I can successfully search the database and retrieve gene IDs, but I cannot get any further.
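For the final download step, NCBI's EFetch utility can return protein records as FASTA when called with db=protein, rettype=fasta and retmode=text. The PHP sketch below only builds such a URL from a list of protein UIDs. It assumes those UIDs have already been collected (for example from ELink). The helper name buildEfetchFastaUrl and the IDs in the usage line are hypothetical placeholders.

```php
<?php
// Minimal sketch: build an EFetch URL that returns protein sequences as FASTA text.
// Assumes the protein UIDs were already collected (e.g. from an ELink step).
function buildEfetchFastaUrl(array $proteinIds): string
{
    $params = [
        'db'      => 'protein',
        'id'      => implode(',', $proteinIds), // comma-separated UID list
        'rettype' => 'fasta',                   // FASTA records...
        'retmode' => 'text',                    // ...as plain text, not XML
    ];
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?'
        . http_build_query($params);
}

// Hypothetical placeholder IDs, for illustration only.
echo buildEfetchFastaUrl(['123', '456']), "\n";
```

Fetching that URL (e.g. with file_get_contents) should return plain-text FASTA rather than XML, so it would not go through simplexml. For long ID lists, NCBI recommends sending the request via HTTP POST or using the history server rather than a very long GET URL.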

For what it is worth, here is the main part of the PHP code I used to get this far (you can see it in action at http://djcamenares.x10.mx/testing/parse1.php):

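    <?php
    // Search the Gene database for "alaS"; usehistory=y also stores the result set on NCBI's history server.
    $startUrl = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=alaS&retmax=10000&usehistory=y";

    $xml1 = simplexml_load_file($startUrl) or die("Error: Cannot create object");

    // Build one gene -> protein ELink URL for each gene ID returned by ESearch.
    foreach ($xml1->IdList->children() as $child1) {
        $newUrl1 = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=gene&db=protein&id=" . (string) $child1;
        echo $newUrl1 . "\n";
    }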
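Instead of issuing one ELink request per gene (up to 10,000 here), the usehistory=y flag already in the ESearch URL makes it possible to hand the whole result set to ELink in a single call: ESearch returns QueryKey and WebEnv values, which ELink accepts together with cmd=neighbor_history. A minimal sketch, assuming the ESearch XML has already been fetched into a string; the function name is hypothetical.

```php
<?php
// Sketch: pass the ESearch result set to ELink via the Entrez history server
// instead of sending one ELink request per gene ID.
function buildElinkHistoryUrl(string $esearchXml): string
{
    $xml = simplexml_load_string($esearchXml);
    if ($xml === false) {
        throw new RuntimeException('Could not parse ESearch XML');
    }
    $params = [
        'dbfrom'    => 'gene',
        'db'        => 'protein',
        'cmd'       => 'neighbor_history',       // keep the linked protein set on the history server
        'query_key' => (string) $xml->QueryKey,  // both values come from the usehistory=y ESearch reply
        'WebEnv'    => (string) $xml->WebEnv,
    ];
    return 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?'
        . http_build_query($params);
}
```

The ELink response should in turn carry its own QueryKey (inside LinkSetDbHistory) and WebEnv, which can be passed to efetch.fcgi with db=protein&rettype=fasta&retmode=text to download all linked sequences at once. This also matters because, without an API key, NCBI asks clients to stay at or below three requests per second, which a per-gene loop easily exceeds.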

The code prints the list of ELink URLs for the next step, but I can't get them to work (for example, passing them to $xml2=simplexml_load_file($newUrl1) or die("Error: Cannot create object")).
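Once an ELink response is loaded (inside the loop, so that every URL is fetched rather than only the last value of $newUrl1), the linked protein UIDs sit under eLinkResult > LinkSet > LinkSetDb > Link > Id. A gene can link to protein through more than one LinkName (for example gene_protein and gene_protein_refseq), so the sketch keeps only one of them. It works on an XML string so it can be tried offline; the function name and the default LinkName are assumptions to adapt.

```php
<?php
// Sketch: collect linked protein UIDs from an ELink (gene -> protein) XML response.
// Expected layout: eLinkResult > LinkSet > LinkSetDb > Link > Id
function extractLinkedIds(string $elinkXml, string $linkName = 'gene_protein'): array
{
    $xml = simplexml_load_string($elinkXml);
    if ($xml === false) {
        throw new RuntimeException('Could not parse ELink XML');
    }
    $ids = [];
    foreach ($xml->LinkSet as $linkSet) {
        foreach ($linkSet->LinkSetDb as $linkSetDb) {
            // Skip link sets whose LinkName is not the one requested.
            if ((string) $linkSetDb->LinkName !== $linkName) {
                continue;
            }
            foreach ($linkSetDb->Link as $link) {
                $ids[] = (string) $link->Id;
            }
        }
    }
    // Drop duplicates and renumber the array keys.
    return array_values(array_unique($ids));
}
```

The collected IDs could then be joined with commas and passed to EFetch with rettype=fasta.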

Any programming fixes, or pointers to other databases, would be helpful. I am primarily interested in prokaryotes.

