I have quantitative proteomic data from 2 groups: control and infection. How do I get a list of proteins that were identified in the infection group but were not identified in the control group? I use MaxQuant and Perseus for the analysis. Thank you
I am assuming your experiment is based on label-free quantification with no replicates.
You can log2 transform the LFQ intensities and then get:
1) log2 fold change by subtracting the values of one column from the other
2) average log2 LFQ intensity by summing up the LFQ intensities and dividing the sum by 2
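As a rough sketch of these two steps outside Perseus: the column names below are hypothetical (MaxQuant names the LFQ columns after your experiment labels), and missing intensities are assumed to be written as 0. The average is taken here as the mean of the log2 values, which is only one reading of step 2.

```python
import numpy as np
import pandas as pd

def log2_fc_and_mean(df, ctrl_col, inf_col):
    """Return log2 fold change (infection - control) and mean log2 intensity."""
    # Assumption: 0 means "not quantified", so turn it into NaN before log2.
    logged = np.log2(df[[ctrl_col, inf_col]].replace(0, np.nan))
    out = pd.DataFrame(index=df.index)
    out["log2_fc"] = logged[inf_col] - logged[ctrl_col]
    out["mean_log2"] = (logged[inf_col] + logged[ctrl_col]) / 2
    return out

# Toy values, not real data.
toy = pd.DataFrame(
    {
        "LFQ intensity control": [1024.0, 2048.0, 0.0],
        "LFQ intensity infection": [4096.0, 2048.0, 512.0],
    },
    index=["P1", "P2", "P3"],
)
result = log2_fc_and_mean(toy, "LFQ intensity control", "LFQ intensity infection")
```

A protein missing in one column (P3 here) gets NaN for both values unless it is imputed first.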
Then you can proceed to calculating Significance B. You can use either the p-value cutoff or the false-discovery-controlled cutoff. The latter is more conservative, though.
You can run the workflow above on the values as they are, or first filter the log2-transformed LFQ values for at least 1 valid value and then impute the missing values with the standard parameters.
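For illustration, the filter-and-impute step can be approximated as below. This is a sketch and not Perseus's own code: it assumes the commonly cited Perseus defaults for imputing from a normal distribution (width 0.3, down shift 1.8, applied per column), and the function name `impute_downshifted_normal` is hypothetical.

```python
import numpy as np
import pandas as pd

def impute_downshifted_normal(logged, width=0.3, downshift=1.8, seed=0):
    """Fill NaNs column by column with draws from a narrowed, down-shifted normal."""
    rng = np.random.default_rng(seed)
    filled = logged.copy()
    for col in filled.columns:
        valid = filled[col].dropna()
        sd = valid.std()
        missing = filled[col].isna()
        if missing.any():
            # Centre the draws below the observed mean to mimic low-abundance values.
            draws = rng.normal(valid.mean() - downshift * sd, width * sd,
                               size=int(missing.sum()))
            filled.loc[missing, col] = draws
    return filled

# Toy log2 intensities, not real data.
toy = pd.DataFrame(
    {
        "control": [10.0, 11.0, 12.0, np.nan, np.nan],
        "infection": [12.0, np.nan, 12.5, 9.0, np.nan],
    }
)
kept = toy.dropna(how="all")  # keep rows with at least 1 valid value
imputed = impute_downshifted_normal(kept)
```

The imputed values are random, so fixing the seed keeps the result reproducible between runs.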
How many repeats do you have for your two groups and how many identified proteins do you have overall and those that appear in each of the groups?
It is important to eliminate any proteins that are identified with less than 99% confidence and to set your false discovery rate to 1%. This keeps the number of incorrectly identified proteins low and helps validate the protein data.
There is a really good and easy-to-follow tutorial on how to input the data and how to extract Venn diagrams and heat maps of your data at the following link: http://lnbio.cnpem.br/wp-content/uploads/2012/11/Tutorial-Perseus_02062015_release_v1.pdf
These are SILAC-labelled proteins. I have at least 3 replicates of each group (control and infected). I ran each group separately in MaxQuant and identified around 2000 proteins in the control group and 2400 in the infected group. If I want to know which proteins appeared in the infected group but not in the control, is there any tool that I can use? I suppose comparing them manually by eye would be daunting.
I also ran both groups together in MaxQuant using the experimental design file and identified around 2500 proteins. I assumed that these are the proteins that appear in both groups.
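For the two separate runs, a set difference on the protein identifiers would avoid comparing the lists by eye. The sketch below assumes each run's `proteinGroups.txt` has been loaded as a pandas table. `Majority protein IDs` is a standard MaxQuant column, but the toy tables and the helper `protein_ids` are hypothetical. Protein groups are split into individual accessions first, on the assumption that separate runs may assemble the groups differently.

```python
import pandas as pd

def protein_ids(table, col="Majority protein IDs"):
    """Collect individual accessions from semicolon-separated protein groups."""
    ids = set()
    for group in table[col].dropna():
        ids.update(part for part in str(group).split(";") if part)
    return ids

# Toy tables standing in for the two proteinGroups.txt files.
control = pd.DataFrame({"Majority protein IDs": ["P1;P2", "P3"]})
infected = pd.DataFrame({"Majority protein IDs": ["P1", "P3", "P4;P5"]})

# Accessions seen in the infected run but not in the control run.
only_infected = sorted(protein_ids(infected) - protein_ids(control))
```

With real output, each table could be read with `pd.read_csv(path, sep="\t")`. It is probably worth removing reverse hits and contaminants before taking the difference.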
As these are quantitative data, many of the proteins don't have an L/H ratio. How do I impute the data? Is it done with random numbers?
Regarding step 2) average log2 LFQ intensity by summing up the LFQ intensities and dividing the sum by 2, what exactly do you mean? log2[(LFQ1 + LFQ2)/2] OR [log2(LFQ1) + log2(LFQ2)]/2? Thank you so much!
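The two expressions are not the same quantity. The first is the log of the arithmetic mean and the second is the log of the geometric mean, so they agree only when LFQ1 = LFQ2. A quick check with made-up intensities:

```python
import math

lfq1, lfq2 = 1024.0, 4096.0  # made-up intensities, not real data

# Reading 1: average the raw intensities, then take log2.
log_of_mean = math.log2((lfq1 + lfq2) / 2)

# Reading 2: take log2 of each intensity, then average.
mean_of_logs = (math.log2(lfq1) + math.log2(lfq2)) / 2
```

Here the second reading gives exactly 11, while the first gives log2(2560), which is a little above 11.3.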