I am trying to extract metadata from GenBank using the R package "rentrez", following the example I found here: https://ajrominger.github.io/2018/05/21/gettingDNA.html. Specifically, for my group of interest, I search for all records that have geographical coordinates and then want to extract the accession number, taxon, sequenced locus, country, lat_lon, and collection date. As output, I want a CSV file with the data for each record in a separate row.

I am attaching the script I have put together. It seems to do the job, but at some point the rows get muddled, with data from different records spilling into neighbouring rows. For example, of the 157 records that rentrez retrieves from NCBI, the first 109 in the resulting file look the way I want, but the rest are a total mess. I suspect this happens because the XML contents differ slightly between GenBank entries, but I cannot figure out how to fix the problem.

Any help would be greatly appreciated, because I am a newbie with R and figuring out each step takes a lot of time. Thanks in advance!
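To make the suspected cause concrete, here is a minimal sketch (not my actual script). It assumes the xml2 package and uses two made-up GBSeq-style records, so the accessions, taxa, and values are hypothetical. The second record has no collection_date qualifier. Pulling that qualifier from the whole document returns fewer values than there are records, which is the kind of mismatch that could shift rows, whereas querying each record separately keeps one row per record and leaves missing fields as NA. The helper name `get_qual` is my own invention for illustration.

```r
library(xml2)

# Two hypothetical records in GenBank's GBSeq XML layout.
# The second one deliberately lacks a collection_date qualifier.
xml_txt <- paste0(
  "<GBSet>",
  "<GBSeq>",
  "<GBSeq_primary-accession>AA000001</GBSeq_primary-accession>",
  "<GBSeq_organism>Examplus unus</GBSeq_organism>",
  "<GBSeq_feature-table><GBFeature><GBFeature_key>source</GBFeature_key><GBFeature_quals>",
  "<GBQualifier><GBQualifier_name>country</GBQualifier_name><GBQualifier_value>Kenya</GBQualifier_value></GBQualifier>",
  "<GBQualifier><GBQualifier_name>lat_lon</GBQualifier_name><GBQualifier_value>1.00 S 36.00 E</GBQualifier_value></GBQualifier>",
  "<GBQualifier><GBQualifier_name>collection_date</GBQualifier_name><GBQualifier_value>2015</GBQualifier_value></GBQualifier>",
  "</GBFeature_quals></GBFeature></GBSeq_feature-table>",
  "</GBSeq>",
  "<GBSeq>",
  "<GBSeq_primary-accession>AA000002</GBSeq_primary-accession>",
  "<GBSeq_organism>Examplus duo</GBSeq_organism>",
  "<GBSeq_feature-table><GBFeature><GBFeature_key>source</GBFeature_key><GBFeature_quals>",
  "<GBQualifier><GBQualifier_name>country</GBQualifier_name><GBQualifier_value>Peru</GBQualifier_value></GBQualifier>",
  "<GBQualifier><GBQualifier_name>lat_lon</GBQualifier_name><GBQualifier_value>12.00 S 77.00 W</GBQualifier_value></GBQualifier>",
  "</GBFeature_quals></GBFeature></GBSeq_feature-table>",
  "</GBSeq>",
  "</GBSet>"
)

doc <- read_xml(xml_txt)
records <- xml_find_all(doc, "//GBSeq")

# Document-wide query: one value per qualifier that exists, not one per record.
all_dates <- xml_text(xml_find_all(
  doc, "//GBQualifier[GBQualifier_name='collection_date']/GBQualifier_value"
))
length(all_dates) < length(records)  # fewer dates than records

# Per-record query: xml_find_first() returns one result per record,
# and xml_text() gives NA where the qualifier is missing.
get_qual <- function(recs, name) {
  xpath <- sprintf(
    ".//GBQualifier[GBQualifier_name='%s']/GBQualifier_value", name
  )
  xml_text(xml_find_first(recs, xpath))
}

out <- data.frame(
  accession       = xml_text(xml_find_first(records, "./GBSeq_primary-accession")),
  taxon           = xml_text(xml_find_first(records, "./GBSeq_organism")),
  country         = get_qual(records, "country"),
  lat_lon         = get_qual(records, "lat_lon"),
  collection_date = get_qual(records, "collection_date"),
  stringsAsFactors = FALSE
)

# One row per record; write to a temporary CSV file.
write.csv(out, tempfile(fileext = ".csv"), row.names = FALSE)
```

With real data, `xml_txt` would presumably be the text returned by rentrez's `entrez_fetch()` with `rettype = "gb"` and `retmode = "xml"`. I have not checked that every GenBank entry follows exactly this layout.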
