What is the best way to build a dataset for predicting the subcellular localization of protein sequences?

11 October 2018

I am trying to build a dataset for predicting the subcellular localization of protein sequences. I downloaded sequences from UniProt and ran Weka on them, but the accuracy stays at around 30-40%. Is there a way to improve it by constructing the dataset differently?

Rajnish Kumar

I suggest writing a program (perhaps in Perl) that finds the subcellular localization line in each UniProt file by its keyword and exports it to a separate file. This way you can collect the desired information accurately.
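A minimal sketch of that idea, written in Python rather than Perl. It assumes the entries are in UniProtKB flat-file (text) format, where the annotation appears on comment lines beginning with `CC   -!- SUBCELLULAR LOCATION:`, continuation lines begin with `CC` followed by seven spaces, and each entry ends with `//`. The function name `extract_locations` and the two sample entries are made up for illustration.

```python
import re

def extract_locations(flat_text):
    """Map each entry's primary accession to its SUBCELLULAR LOCATION text."""
    results = {}
    accession = None
    parts = []
    in_location = False
    for line in flat_text.splitlines():
        if line.startswith("AC   ") and accession is None:
            # The first accession on the first AC line is the primary one.
            accession = line[5:].split(";")[0].strip()
        elif line.startswith("CC   -!- "):
            # A new comment topic starts; keep it only if it is the one we want.
            in_location = line.startswith("CC   -!- SUBCELLULAR LOCATION:")
            if in_location:
                parts.append(line.split("SUBCELLULAR LOCATION:", 1)[1].strip())
        elif line.startswith("CC       ") and in_location:
            # Continuation line of the current SUBCELLULAR LOCATION comment.
            parts.append(line[9:].strip())
        elif line.startswith("//"):
            # End of entry: store the result (if any) and reset the state.
            if accession and parts:
                text = " ".join(parts)
                # Drop evidence tags such as {ECO:0000269|PubMed:123}.
                results[accession] = re.sub(r"\s*\{ECO:[^}]*\}", "", text)
            accession, parts, in_location = None, [], False
        else:
            in_location = False
    return results

# Invented sample entries, for illustration only.
SAMPLE = """\
ID   TEST1_HUMAN             Reviewed;         100 AA.
AC   P00001; Q00001;
CC   -!- FUNCTION: Does something.
CC   -!- SUBCELLULAR LOCATION: Cytoplasm {ECO:0000269|PubMed:1}. Nucleus
CC       {ECO:0000250}.
CC   -!- SIMILARITY: Belongs to a family.
//
ID   TEST2_HUMAN             Reviewed;         120 AA.
AC   P00002;
CC   -!- FUNCTION: Something else.
//
"""

if __name__ == "__main__":
    for acc, loc in extract_locations(SAMPLE).items():
        print(acc, loc, sep="\t")
```

Entries with no subcellular location comment are skipped, so the output contains only sequences that can be given a class label. Writing the accession and location as tab-separated columns makes it straightforward to join the labels back to the downloaded sequences before building the Weka input.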
