Hello, I have several Protein Data Bank (.pdb) files that have been trimmed so that each one contains only two residues of a protein: either a tryptophan and a glutamate, or a tryptophan and an aspartate.
I want to calculate the distance between the two residues. For tryptophan interacting with aspartate, that means the distance from the CD1 (C-delta 1) atom of tryptophan to the OD1 and OD2 atoms of aspartate. For tryptophan interacting with glutamate, it is the distance from CD1 of tryptophan to the OE1 and OE2 atoms of glutamate.
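The quantity in question is the ordinary Euclidean distance between the Cartesian coordinates of the two atoms (the x, y, z columns of the ATOM records, in ångströms). Each file therefore yields two numbers, one per carboxylate oxygen. For a glutamate file:

$$
d(\mathrm{CD1}, \mathrm{OE}i) = \sqrt{(x_{\mathrm{CD1}} - x_{\mathrm{OE}i})^2 + (y_{\mathrm{CD1}} - y_{\mathrm{OE}i})^2 + (z_{\mathrm{CD1}} - z_{\mathrm{OE}i})^2}, \qquad i \in \{1, 2\}
$$

For an aspartate file, OD1 and OD2 take the place of OE1 and OE2.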
I'm thinking in terms of Perl. If you prefer another language, I will still try to translate the logic into Perl, because Perl makes sense to me.
Since all the files end in .pdb, I can simply load every file with that extension into an array. Each file name also contains the residue numbers. For example, I have a file named
5D6EA_TRP-166-A_GLU-163-A.pdb. Its name contains the number of the tryptophan residue (166) and the number of the glutamate residue (163), so I don't need to open the files to identify the residues involved, although I could.
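Because the residue numbers are in the file name, a regular expression with capture groups can extract them, and no grep over the file contents is needed. Below is a minimal Python sketch. It assumes every file follows the same pattern as the single example given: tryptophan first, then ASP or GLU, each followed by its residue number and chain. `parse_name` and `ResiduePair` are hypothetical names. Similar capture groups work in a Perl `=~` match.

```python
import os
import re
from typing import NamedTuple

# Assumed naming scheme, inferred from the one example in the question:
#   <PDB id+chain>_TRP-<number>-<chain>_<ASP|GLU>-<number>-<chain>.pdb
NAME_RE = re.compile(
    r"^(?P<pdb>[A-Za-z0-9]+)"
    r"_TRP-(?P<trp_num>\d+)-(?P<trp_chain>[A-Za-z0-9])"
    r"_(?P<partner>ASP|GLU)-(?P<partner_num>\d+)-(?P<partner_chain>[A-Za-z0-9])"
    r"\.pdb$"
)


class ResiduePair(NamedTuple):
    pdb: str
    trp_num: int
    trp_chain: str
    partner: str
    partner_num: int
    partner_chain: str


def parse_name(path: str) -> ResiduePair:
    """Pull residue names and numbers out of a name like 5D6EA_TRP-166-A_GLU-163-A.pdb."""
    m = NAME_RE.match(os.path.basename(path))
    if m is None:
        raise ValueError(f"file name does not match the expected pattern: {path}")
    return ResiduePair(
        m["pdb"], int(m["trp_num"]), m["trp_chain"],
        m["partner"], int(m["partner_num"]), m["partner_chain"],
    )


print(parse_name("5D6EA_TRP-166-A_GLU-163-A.pdb"))
```

Files that do not fit the pattern raise an error instead of being silently skipped, so an unexpected naming variant shows up immediately.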
I was thinking about doing the distance calculation in PyMOL, since it has commands to calculate the distance between two atoms. PyMOL needs four pieces of information: the two residue numbers, which change from file to file (so I should put those in an array), and the two atom names (CD1 and one of OD1, OD2, OE1, OE2), which stay constant (so I could store those in scalars).
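In PyMOL's Python API, `cmd.get_distance(atom1, atom2)` measures the distance between two single-atom selections. The minimal sketch below only builds those selection strings from the varying residue numbers and the constant atom names. It does not call PyMOL. `atom_selection` and `selection_pairs` are hypothetical helpers, and including the chain identifier in each selection is an assumption that only matters if a file contains more than one chain.

```python
# Constant atom names (the "scalars" in the plan); values from the question.
TRP_ATOM = "CD1"
PARTNER_ATOMS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def atom_selection(chain: str, resi: int, name: str) -> str:
    """PyMOL selection-language expression intended to match a single atom."""
    return f"chain {chain} and resi {resi} and name {name}"


def selection_pairs(trp_chain, trp_resi, partner, partner_chain, partner_resi):
    """One (tryptophan atom, partner oxygen) selection pair per distance to measure."""
    trp_sel = atom_selection(trp_chain, trp_resi, TRP_ATOM)
    return [
        (trp_sel, atom_selection(partner_chain, partner_resi, oxygen))
        for oxygen in PARTNER_ATOMS[partner]
    ]


# Residue numbers as they would come from 5D6EA_TRP-166-A_GLU-163-A.pdb
for sel1, sel2 in selection_pairs("A", 166, "GLU", "A", 163):
    print(sel1, "|", sel2)
```

Keeping the atom names in a lookup keyed by residue name means a single code path handles both ASP and GLU files.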
My plan of action for Tryptophan interacting with glutamate is this:
1. Source PyMOL with a system call to the terminal.
2. Put all the files with the extension .pdb into an array. Then make scalars for the atom names CD1, OE1, and OE2.
3. Begin a foreach loop.
4. Within the loop, open and read each file in the array.
5. Extract the residue number of tryptophan, perhaps using grep, and store it in a scalar.
6. Extract the residue number of glutamate, perhaps using grep, and store it in a scalar.
7. Open PyMOL's terminal with a system call.
8. Plug the residue-number scalars into PyMOL's get_distance command, and plug in the constant atom names using the scalars defined before the foreach loop (although I may not need scalars for the latter).
9. Close the system call, and append (>>) the computed distance to the end of a long text file.
10. Close the file.
11. Close the foreach loop.
In practice, I should add some echo (print) statements that tell me which distance is about to be computed for which residues. I think that with a foreach loop the program would work one file at a time: open and read the file, make the system call to PyMOL to run get_distance, close the file, and append the result to the output. That's how I think it would work.
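The whole plan can be sketched in Python as follows. The script loops over the .pdb files one at a time, prints a progress message for each, and appends the distances to a text file. Opening the file in `"a"` mode is the equivalent of `>>`. One technique is swapped in here: instead of calling PyMOL through a system call, the script reads the x, y, z columns of the ATOM/HETATM records directly (fixed columns in the PDB format) and computes the Euclidean distance. For the same coordinates, that value should agree with a two-atom distance measured in PyMOL. The file-name pattern, the output name `distances.txt`, and all function names are assumptions.

```python
import glob
import math
import os
import re
import sys

# Assumed file-name pattern, e.g. 5D6EA_TRP-166-A_GLU-163-A.pdb
NAME_RE = re.compile(
    r"^([A-Za-z0-9]+)_TRP-(\d+)-([A-Za-z0-9])_(ASP|GLU)-(\d+)-([A-Za-z0-9])\.pdb$"
)
PARTNER_ATOMS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}


def read_atoms(path):
    """Map (chain, residue number, atom name) -> (x, y, z) from ATOM/HETATM records."""
    atoms = {}
    with open(path) as fh:
        for line in fh:
            if line.startswith(("ATOM", "HETATM")):
                key = (line[21], int(line[22:26]), line[12:16].strip())
                xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
                # setdefault keeps the first copy if an atom repeats
                # (alternate locations or multiple models).
                atoms.setdefault(key, xyz)
    return atoms


def measure(path):
    """Yield (label, distance in angstroms) for each CD1-oxygen pair in one file."""
    m = NAME_RE.match(os.path.basename(path))
    if m is None:
        raise ValueError(f"unrecognised file name: {path}")
    pdb_id, trp_num, trp_chain, partner, p_num, p_chain = m.groups()
    atoms = read_atoms(path)
    # A missing atom raises KeyError, which points at an incomplete file.
    trp = atoms[(trp_chain, int(trp_num), "CD1")]
    for oxygen in PARTNER_ATOMS[partner]:
        other = atoms[(p_chain, int(p_num), oxygen)]
        label = f"{pdb_id} TRP{trp_num}:CD1 {partner}{p_num}:{oxygen}"
        yield label, math.dist(trp, other)


def main(directory=".", out_path="distances.txt"):
    with open(out_path, "a") as out:  # "a" appends, like >> in the shell
        for path in sorted(glob.glob(os.path.join(directory, "*.pdb"))):
            print(f"Processing {path}", file=sys.stderr)  # progress message
            for label, d in measure(path):
                out.write(f"{label}\t{d:.2f}\n")


if __name__ == "__main__":
    main(*sys.argv[1:])
```

To run it, save it (for example as `distances.py`) and call `python distances.py /path/to/pdb_dir out.txt`. Each file contributes two tab-separated lines, one per oxygen.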
I will give you two of my files so you know what the raw data looks like.
How would you approach this problem?
-------
So far, I've got a Perl script that opens all the files in the directory, prints the names of the files it finds (I may delete this once the script is more complete), and opens each file one at a time, printing all of its lines in a while loop. Now I'm trying to figure out how to detect whether the residue name ASP or GLU appears in a file. If you could help me with that, I could write an if/else block that runs PyMOL's get_distance command with the oxygen atoms labeled according to whether the residue is ASP or GLU.
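Searching each line for the word "ASP" or "GLU" can give false hits, because those strings can also appear in HEADER or REMARK lines. A sturdier check reads only the residue-name field of the ATOM/HETATM records, which occupies columns 18 to 20. In Perl that field is `substr($line, 17, 3)`. The minimal Python sketch below applies the same idea. `partner_residue` is a hypothetical name. Since the file names already contain `_ASP-` or `_GLU-`, checking the name alone would also work.

```python
def partner_residue(path: str) -> str:
    """Return 'ASP' or 'GLU', read from the residue-name field of ATOM/HETATM records."""
    found = set()
    with open(path) as fh:
        for line in fh:
            # Columns 18-20 (0-based slice 17:20) hold the residue name.
            if line.startswith(("ATOM", "HETATM")):
                found.add(line[17:20])
    partners = found & {"ASP", "GLU"}
    if len(partners) != 1:
        raise ValueError(f"expected exactly one of ASP/GLU in {path}, got {sorted(partners)}")
    return partners.pop()
```

The returned name can then select the oxygen pair (OD1/OD2 or OE1/OE2) in the if/else block.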
Here is my script so far. (Its extension is actually .pl for Perl, but ResearchGate doesn't support that extension, so I renamed it to .txt.)