How should I approach writing a loop to calculate the distance between atoms in two residues across several PDB files?

07 December 2016

Hello, so I have several Protein Data Bank (.pdb) files that have been parsed to only include two residues (Tryptophan with Glutamate, and Tryptophan with Aspartate) in a protein.

I want to calculate the distance between them using the CD1 atom of Tryptophan and the OD1 and OD2 atoms of Aspartate; in the case of Tryptophan interacting with Glutamate, it's CD1 of Tryptophan and OE1 and OE2 of Glutamate.

I'm thinking in terms of perl. If you prefer another language, then I will still try to reconstruct the logic into perl because perl makes sense to me.

Since all the files end in .pdb, I can just read every file with that extension into an array. Each file name records the numbers of the residues it contains. For example, I have a file named ...

5D6EA_TRP-166-A_GLU-163-A.pdb ; the file name itself gives the number of the tryptophan residue (#166) and the number of the glutamate residue (#163), so I don't need to look inside the files to find the residues involved, although I could do that.
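As a rough illustration of reading the residue numbers from the file name, here is a minimal Python sketch. It assumes every name follows the pattern of the single example above (prefix, then RES-number-chain twice, then .pdb); the function name parse_name and the regular expression are made up for this example, and the same pattern should carry over to a Perl regex.

```python
import re
from pathlib import Path

# Assumed naming pattern, based only on the one example file name:
#   <prefix>_<RES>-<number>-<chain>_<RES>-<number>-<chain>.pdb
NAME_RE = re.compile(
    r"_(?P<res1>[A-Z]{3})-(?P<num1>-?\d+)-(?P<chain1>\w)"
    r"_(?P<res2>[A-Z]{3})-(?P<num2>-?\d+)-(?P<chain2>\w)\.pdb$"
)

def parse_name(filename):
    """Return [(resname, number, chain), (resname, number, chain)] or None."""
    m = NAME_RE.search(Path(filename).name)
    if m is None:
        return None  # the name does not follow the assumed pattern
    return [
        (m["res1"], int(m["num1"]), m["chain1"]),
        (m["res2"], int(m["num2"]), m["chain2"]),
    ]

print(parse_name("5D6EA_TRP-166-A_GLU-163-A.pdb"))
# [('TRP', 166, 'A'), ('GLU', 163, 'A')]
```

Returning None for names that do not match makes it easy to skip stray files in the same directory.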

I was thinking about performing the distance calculation in PyMOL, since it has commands to calculate the distance between two atoms. PyMOL needs four pieces of information: the two residue number identifiers, which change from file to file, so I should put those in an array, and the two atom names (taken from CD1, OD1, OD2, OE1, OE2), which are constant, so I could make scalars of those.

My plan of action for Tryptophan interacting with glutamate is this:

1. source pymol with a system call to the terminal

2. Put all the files with the extension .pdb into an array. Then make scalars for the atom names CD1, OE1, and OE2.

3. Begin a foreach loop

4. Within the loop open and read all the files in the array.

5. Extract the residue number of Tryptophan, perhaps using grep, and make a scalar of it.

6. Extract the residue number of Glutamate perhaps using grep and make a scalar of it.

7. Open pymol's terminal with a system call.

8. Plug the residue-number scalars into PyMOL's get_distance command, and fill in the constant atom names with the scalars that I defined before the foreach loop. (But I may not have to use scalars for the latter.)

9. Close the system call. Append (>>, without overwriting) the computed distance to the end of a long text file.

10. close the file.

11. close the foreach loop.
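For comparison, the plan above could also be driven from PyMOL's own Python interface instead of system calls. The following is only a sketch under several assumptions: PyMOL is installed so that `from pymol import cmd` works, the file names follow the example pattern with Tryptophan first, and each selection matches exactly one atom. The helper names (selection, residues_from_name, measure_with_pymol, run) and the output file name distances.txt are made up for this illustration.

```python
import glob
import os

def selection(chain, resi, atom):
    """Build a PyMOL selection string for a single atom."""
    return f"chain {chain} and resi {resi} and name {atom}"

def residues_from_name(path):
    """Split e.g. 5D6EA_TRP-166-A_GLU-163-A.pdb into [[res, num, chain], ...]."""
    stem = os.path.basename(path)[:-len(".pdb")]
    # Assumes non-negative residue numbers, so "-" only separates fields.
    return [part.split("-") for part in stem.split("_")[1:]]

def measure_with_pymol(pdb_path, sel1, sel2):
    # Imported here so the helpers above work without PyMOL installed.
    from pymol import cmd
    cmd.load(pdb_path, "pair")
    try:
        return cmd.get_distance(sel1, sel2)
    finally:
        cmd.delete("pair")  # start clean for the next file

def run(out_path="distances.txt"):
    with open(out_path, "a") as out:  # "a" appends, like >> in the shell
        for pdb in sorted(glob.glob("*_TRP-*_GLU-*.pdb")):
            (_, trp_num, trp_chain), (_, glu_num, glu_chain) = residues_from_name(pdb)
            cd1 = selection(trp_chain, trp_num, "CD1")
            for oxygen in ("OE1", "OE2"):
                d = measure_with_pymol(pdb, cd1, selection(glu_chain, glu_num, oxygen))
                out.write(f"{pdb}\tCD1\t{oxygen}\t{d:.3f}\n")
```

Calling run() from the directory holding the files would append one line per oxygen to the output file. The Aspartate case would differ only in the glob pattern and the atom names OD1 and OD2.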

In reality, I should add some echo commands to tell me which distance is being computed for which residues. I think that with a foreach loop the program would open, read, and analyze each file, direct the system call to PyMOL to run get_distance, redirect the result to the output, and close the file, one file at a time. That's how I think it would work.

I will give you two of my files so you know what the raw data looks like.

How would you approach this problem?

-------

So far, I've got a Perl script that opens all the files in the directory, prints the names of the files it finds (I may delete this once the script is more complete), then opens each file one at a time and prints all of its lines in a while loop. Now I'm trying to figure out how to detect whether ASP or GLU appears in a file. If you could help me with that, then I could figure out how to write a long if-else block that passes the right oxygen names to PyMOL's get_distance command depending on whether the residue is ASP or GLU.

Here is my script so far. (Its extension is actually .pl for Perl, but ResearchGate doesn't support that extension, so I renamed it to .txt.)
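One possible way to detect ASP or GLU is sketched below in Python. It assumes the files use the standard fixed-width PDB layout, where the residue name occupies columns 18-20 of each ATOM/HETATM record; the names ACID_OXYGENS and acidic_partner are invented for this example. Checking only that column avoids false hits from the letters "ASP" or "GLU" appearing in header or remark lines.

```python
# Oxygen atom names to measure against, keyed by the acidic residue name.
ACID_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}

def acidic_partner(pdb_lines):
    """Return 'ASP' or 'GLU' if either occurs in an ATOM/HETATM record, else None."""
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            resname = line[17:20].strip()  # PDB columns 18-20
            if resname in ACID_OXYGENS:
                return resname
    return None

def oxygens_for(pdb_lines):
    """Return the oxygen names for the acidic residue found, or an empty tuple."""
    return ACID_OXYGENS.get(acidic_partner(pdb_lines), ())
```

With an open file handle fh, oxygens_for(fh) would then give the atom names to use, which replaces the long if-else with a dictionary lookup.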

Martin Klvana

It can be done without PyMOL: distance = (dx**2 + dy**2 + dz**2)**0.5
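As a small worked example of the formula above, here is a Python sketch; the function name distance and the sample coordinates are made up for illustration.

```python
import math

def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    dz = q[2] - p[2]
    return math.sqrt(dx**2 + dy**2 + dz**2)

# dx = 1, dy = 2, dz = 2, so the distance is sqrt(1 + 4 + 4) = 3
print(distance((0.0, 0.0, 0.0), (1.0, 2.0, 2.0)))  # 3.0
```

Since PDB coordinates are given in angstroms, the result is in angstroms as well.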

---

# Python . . .

1. Get all pdb files: pdbs = glob.glob("*.pdb")

2. Put data into dictionary: D = {} . . . for pdb in pdbs . . . for line in open(pdb) . . . D[pdb] = {residue_identifier: {"atom_name": atom_name, "residue_name": residue_name, "x": float(x), "y": float(y), "z": float(z)}}, get the residue_identifier, atom_name, residue_name, x, y, z data using line.split().

3. Calculate the distances TRP-CD1...ASP-(OD1/OD2) and TRP-CD1...GLU-(OE1/OE2): "for pdb in D . . . for residue_identifier in D[pdb] . . . if D[pdb][residue_identifier]["residue_name"] == "TRP" . . . resid1 = residue_identifier . . . if D[pdb][resid1]["atom_name"] == "CD1" . . . x1 = D[pdb][resid1]["x"], y1..., z1..., . . . for residue_identifier in D[pdb] . . . if D[pdb][residue_identifier]["residue_name"] == "ASP" . . . resid2 = residue_identifier . . . if D[pdb][resid2]["atom_name"] in ["OD1", "OD2"] . . . x2, y2, z2 . . . distance = ((x2 - x1)**2 + (y2 - y1)**2 + (z2 - z1)**2)**0.5 . . . print(resid1, resid2, distance)" . . . the same for all "GLU" with ["OE1", "OE2"] . . .
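The three steps above might be put together roughly as follows. This is a sketch with assumptions, not the answer's exact method. It keeps a list of atom records instead of a dictionary keyed by residue, because a residue has many atoms and one key per residue would keep only the last atom read. It also slices the fixed PDB columns instead of using line.split(), because adjacent PDB fields can run together without a space. All function names are invented for this example.

```python
import glob

PARTNER_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}

def read_atoms(lines):
    """Parse ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        atoms.append({
            "atom_name": line[12:16].strip(),      # columns 13-16
            "residue_name": line[17:20].strip(),   # columns 18-20
            "residue_number": int(line[22:26]),    # columns 23-26
            "x": float(line[30:38]),               # columns 31-38
            "y": float(line[38:46]),               # columns 39-46
            "z": float(line[46:54]),               # columns 47-54
        })
    return atoms

def trp_acid_distances(atoms):
    """Return (trp_number, partner_name, partner_number, oxygen, distance) tuples."""
    results = []
    for a in atoms:
        if a["residue_name"] != "TRP" or a["atom_name"] != "CD1":
            continue
        for b in atoms:
            # Empty tuple for residues that are neither ASP nor GLU.
            oxygens = PARTNER_OXYGENS.get(b["residue_name"], ())
            if b["atom_name"] in oxygens:
                d = ((b["x"] - a["x"])**2 + (b["y"] - a["y"])**2
                     + (b["z"] - a["z"])**2) ** 0.5
                results.append((a["residue_number"], b["residue_name"],
                                b["residue_number"], b["atom_name"], d))
    return results

def main(pattern="*.pdb"):
    """Print one line per TRP-CD1 to oxygen distance for every matching file."""
    for pdb in sorted(glob.glob(pattern)):
        with open(pdb) as handle:
            for row in trp_acid_distances(read_atoms(handle)):
                print(pdb, *row[:4], f"{row[4]:.3f}")
```

Calling main() in the directory with the .pdb files would print the file name, both residue numbers, the oxygen name, and the distance. Alternate locations are not handled here, so a file containing them would yield one line per alternate atom.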

Adron Ung

Thank you very much!

I don't understand python, but I think I get your method. Thanks especially for pointing out that I don't need to open pymol to do the calculation. I understand that formula.
