I have a long list of chemicals of interest, but not all the chemicals have IC50 values. I have performed docking but now want to go for QSAR. Can anybody suggest something?
If you are looking for a simple approach, use only the chemicals with known IC50 values. If you are ready for a hard time, try semi-supervised machine learning, which can also make use of the chemicals without IC50 values.
As Mr. Oleg said, you can compute many descriptors for each chemical whose IC50 value is known, fit a regression model, and then use that model to predict IC50 values for the chemicals without measurements. For this to work, you need enough chemicals with known IC50 values to give a statistically meaningful model.
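A minimal sketch of the workflow described above, assuming the descriptors have already been computed into a numeric table by some cheminformatics toolkit (not shown here). The descriptor values and IC50 numbers are made-up placeholders, and the function names are hypothetical. Converting IC50 to pIC50 = -log10(IC50 in mol/L) before fitting is a common convention, not something stated in this thread, and plain least squares stands in for whatever regression method you end up choosing.

```python
import numpy as np

def ic50_nm_to_pic50(ic50_nm):
    """Convert IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    return 9.0 - np.log10(np.asarray(ic50_nm, dtype=float))

def fit_linear_qsar(X, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    X = np.asarray(X, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_pic50(coef, X):
    """Apply the fitted linear model to a descriptor table."""
    X = np.asarray(X, dtype=float)
    return coef[0] + X @ coef[1:]

# Placeholder data (NOT real measurements): two descriptors per chemical.
X_known = np.array([[1.2, 310.0], [2.5, 350.0], [0.8, 290.0],
                    [3.1, 400.0], [1.9, 330.0], [2.8, 370.0]])
ic50_nm = [850.0, 120.0, 2300.0, 45.0, 400.0, 90.0]
X_unknown = np.array([[2.0, 340.0], [1.0, 300.0]])  # chemicals without IC50

coef = fit_linear_qsar(X_known, ic50_nm_to_pic50(ic50_nm))
pred = predict_pic50(coef, X_unknown)
print("predicted pIC50:", pred)
```

With only a handful of chemicals and many descriptors, a fit like this will overfit easily, which is why the number of chemicals with known IC50 values matters.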
Do you want to check whether your molecules of interest are active or inactive? In that case you first need training data for the activity you want to predict. For example, if I want to predict IC50 values of antiviral compounds, I first have to collect a training set of antivirals with wet-lab-proven activity and their IC50 values. Then the data has to be divided into a training set and a test set. Descriptors play a very important role in QSAR. If you don't know which descriptors to calculate, calculate all of them and plot each descriptor against the IC50 values; if the plot shows a roughly straight line, that descriptor is likely making a significant contribution to determining IC50. If you don't have IC50 values for all of the data, do you have labels such as active and inactive? You can use those as the prediction target as well. If you want more information, mail me at [email protected]
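The split and the "plot each descriptor against IC50" step can be sketched as below. This is only an illustration under assumptions: the descriptor names and all numbers are made-up placeholders, the function names are hypothetical, and the Pearson correlation coefficient is used here as a numeric stand-in for judging by eye whether a plot looks like a straight line.

```python
import numpy as np

def train_test_split_indices(n, test_fraction=0.25, seed=0):
    """Randomly split row indices 0..n-1 into (train, test) index arrays."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_test = max(1, int(round(n * test_fraction)))
    return order[n_test:], order[:n_test]

def rank_descriptors(X, y, names):
    """Rank descriptors by absolute Pearson correlation with the activity."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    scores = []
    for j, name in enumerate(names):
        col = X[:, j]
        # A constant descriptor carries no information; score it as 0.
        r = 0.0 if np.std(col) == 0 else np.corrcoef(col, y)[0, 1]
        scores.append((name, float(r)))
    return sorted(scores, key=lambda t: abs(t[1]), reverse=True)

# Placeholder data (NOT real measurements): 8 chemicals, 3 descriptors.
names = ["desc_A", "desc_B", "desc_C"]
X = np.array([[1.2, 310.0, 40.0], [2.5, 350.0, 65.0], [0.8, 290.0, 52.0],
              [3.1, 400.0, 48.0], [1.9, 330.0, 70.0], [2.8, 370.0, 44.0],
              [1.5, 320.0, 61.0], [2.2, 345.0, 57.0]])
activity = np.array([6.1, 6.9, 5.6, 7.3, 6.4, 7.0, 6.2, 6.7])  # e.g. pIC50

train_idx, test_idx = train_test_split_indices(len(X))
# Screen descriptors on the training rows only, so the test set stays unseen.
for name, r in rank_descriptors(X[train_idx], activity[train_idx], names):
    print(f"{name}: r = {r:+.2f}")
```

A correlation close to +1 or -1 corresponds to the straight-line plot mentioned above; values near 0 suggest the descriptor has little linear relationship with the activity, although it could still matter in a nonlinear model.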
1. Is it necessary to collect IC50 values only for compounds made by the same synthesis protocol?
or
If the compounds are of natural origin, the extraction methods may differ. What should I do in that case?
2. I have IC50 values for a few chemicals, but they were measured for different activities (some are antibacterial, some are anticancer). I want to evaluate antibacterial activity. Can I use all the IC50 values without considering which activity each one belongs to?