How do I use curl (Linux terminal tool) to retrieve multiple Protein Data Bank (PDB) files as separate files?

15 September 2016

Hello,

My question may be simple, but I think others could benefit from this skill, and I'm not quite sure how to separate these files once I get them. I was looking through a molecular dynamics paper and noticed that the authors used a lot of PDB files, so I wanted to retrieve those files with curl (the name is a play on "Client for URLs").

If I run this command, the contents of all the PDB files I am interested in are printed to my terminal. The curly braces list the different file names to substitute into the base URL.

$ sudo curl "http://files.rcsb.org/view/{1PPE,1AVW,1BRC,1CGI,1TGS,1TAB,2PTC,2SIC,1DFJ,2SNI,1UGH,1CHO,1ACB,2TEC,4HTC,1CSE,1MAH,1FSS,1BRS,1DFJ}.pdb"

If I create an empty file named "enzyme_inhibitor_complexes" and redirect the output with ">>" (which appends, instead of ">", which truncates the file first), then 1PPE is written to enzyme_inhibitor_complexes first, then 1AVW, and so on through 1DFJ, so the contents of all the PDB files end up in that one file, one after another.

$ curl "http://files.rcsb.org/view/{1PPE,1AVW,1BRC,1CGI,1TGS,1TAB,2PTC,2SIC,1DFJ,2SNI,1UGH,1CHO,1ACB,2TEC,4HTC,1CSE,1MAH,1FSS,1BRS,1DFJ}.pdb" >> enzyme_inhibitor_complexes

But now I want to separate the output into individual PDB files, such as 1PPE.pdb, 1AVW.pdb, 1BRC.pdb, and so forth. I would rather not have everything in one big file; I only made that compromise to collect all the PDB information in one go.

For example, if I just wanted to download and save one of the files, then that would be easy.

$ curl "http://files.rcsb.org/view/1CGI.pdb" -o 1CGI.pdb

where "-o" designates the output file.

It seems like a scripted "for loop" in Perl or Python would be useful, but I don't know how to write one around this curl command.

Obviously, it's more tedious to learn how to do this than downloading all the files individually from my web browser, but if I can learn how to do this, then I would save a lot of time in the long run as I work with pdb files in my bioinformatics career.

Thank you! Your help is much appreciated!

- Adron

Justin Lemkul Popular answer

This is simple to do in a bash for-loop, using wget.

for id in 1PPE 1AVW 1BRC  # (etc: list the remaining IDs here)
do
    wget "https://files.rcsb.org/download/${id}.pdb"
done

No need for post processing, you get all the files separately.
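For readers who prefer curl, the same loop can be sketched as a pair of small shell functions. This is only an illustration under stated assumptions: the function names `pdb_url` and `fetch_pdbs` are made up for this example, and the URL pattern is the `files.rcsb.org/download/` one used in the loop above.

```shell
#!/bin/bash
# Sketch only: pdb_url and fetch_pdbs are hypothetical helper names.

# Build the download URL for one PDB ID.
pdb_url() {
    printf 'https://files.rcsb.org/download/%s.pdb\n' "$1"
}

# Download every ID given as an argument into its own file (1PPE.pdb, ...).
fetch_pdbs() {
    local id
    for id in "$@"; do
        # -f: treat HTTP errors as failures instead of saving the error page
        # -sS: no progress meter, but still print error messages
        curl -fsS "$(pdb_url "$id")" -o "${id}.pdb" || echo "failed: ${id}" >&2
    done
}

# Example call, commented out so that sourcing this file touches no network:
# fetch_pdbs 1PPE 1AVW 1BRC
```

Keeping the URL construction in its own function makes it easy to switch to a different server path without touching the loop.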


Adron Ung

Justin Lemkul,

Thank you! I replaced wget in the for-loop with

curl "http://files.rcsb.org/view/${id}.pdb" -o "${id}.pdb"

and I got 20 PDB files. Works like a charm!

Thanks!

Adron Ung

James "Wes" Barnett, thank you for pointing that out.

Thankfully, the way I employed "curl" worked as I have checked the PDB files that I downloaded and they are all different. But, I may try that little modification, too.
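For reference, curl can also split the brace list from the original question into separate files without any shell loop: in the `-o` argument, `#1` is replaced by the current value of the first `{...}` glob in the URL. The sketch below is an illustration, not necessarily the modification discussed in this thread; `glob_url` and `fetch_all` are hypothetical helper names.

```shell
#!/bin/bash
# Sketch only: glob_url and fetch_all are hypothetical helper names.

# Join the given IDs with commas into a curl URL glob,
# e.g. https://files.rcsb.org/view/{1PPE,1AVW}.pdb
glob_url() {
    local IFS=,
    printf 'https://files.rcsb.org/view/{%s}.pdb\n' "$*"
}

# One curl invocation; '#1' in the output name expands to each ID in turn,
# so the results land in 1PPE.pdb, 1AVW.pdb, and so on.
fetch_all() {
    curl -fsS "$(glob_url "$@")" -o '#1.pdb'
}

# Example call, commented out so that sourcing this file touches no network:
# fetch_all 1PPE 1AVW 1BRC
```

The URL and the `#1.pdb` pattern must both be quoted so that the shell passes the braces and the `#` to curl unchanged.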

Rogelio Rodríguez-Sotres

Download the attached file and make it executable:

chmod 755 pdbdwn.bash

If your bash shell does not reside in /bin/bash, edit the first line accordingly.

Call it as:

pdbdwn.bash 1UDE 5ksz

You may use upper- or lowercase characters; the file will be found and saved with the casing you typed. For example, if you give 5KSZ your file will be 5KSZ.pdb.gz, but if you type 5kSz your file will be 5kSz.pdb.gz.

best wishes

rogelio
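The attached pdbdwn.bash is not reproduced in this thread. As a rough idea of what a script with the behaviour described above might look like, here is a minimal sketch. Everything in it is an assumption rather than the author's actual script: the function names, the use of curl, the `files.rcsb.org/download/<ID>.pdb.gz` URL pattern, and the expectation that the server accepts IDs in either casing.

```shell
#!/bin/bash
# Hypothetical stand-in for the attached pdbdwn.bash (the original is not shown).

# Build the URL of the gzip-compressed PDB file for one ID (assumed URL pattern).
pdbgz_url() {
    printf 'https://files.rcsb.org/download/%s.pdb.gz\n' "$1"
}

# Download each ID and save it as <ID>.pdb.gz, keeping the casing as typed.
pdbdwn() {
    local id status=0
    for id in "$@"; do
        if curl -fsS "$(pdbgz_url "$id")" -o "${id}.pdb.gz"; then
            echo "saved ${id}.pdb.gz"
        else
            echo "could not download ${id}" >&2
            status=1
        fi
    done
    return "$status"
}

# Download only when IDs were passed on the command line.
if [ "$#" -gt 0 ]; then
    pdbdwn "$@"
fi
```

Saved under some name and made executable as described above, it would be called with the IDs as arguments, and the downloaded files can then be unpacked with `gunzip`.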
