I have a list of about 50 inhibitors of a T. brucei protein with corresponding IC50 values, and I am looking for methods and databases to construct a decoy set against the known inhibitors for classification studies.
Hello, a number of decoy sets are available in the public domain. They have been developed primarily for virtual screening enrichment studies (for example, the Directory of Useful Decoys, http://dud.docking.org/, or decoys for docking; see attached reference).
There are also bioinformatic tools to build your own set of decoys. For example:
I think DUD won't give you what you need. Try DUD-E (Database of Useful Decoys: Enhanced) instead; with DUD-E you can create your own decoys simply by providing a set of known inhibitors: http://dude.docking.org/
But you have to cross-check the generated decoys, because there is a chance that a decoy is actually an inhibitor too. So check every generated decoy just to make sure.
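To make the cross-check a bit more concrete, here is a minimal sketch of one way to flag decoys that look suspiciously like a known inhibitor. It is not how DUD-E works internally. It assumes you already have a fingerprint for each molecule as a set of "on" bit indices (from whatever cheminformatics toolkit you use); the function names, the toy identifiers and the similarity cutoff are all made up for illustration. A flagged decoy is only a candidate for a closer look (literature or a bioactivity database), not proof that it is active.

```python
# Sketch: flag generated decoys that are very similar to a known inhibitor.
# Fingerprints are plain sets of "on" bit indices; computing them from
# structures is assumed to happen elsewhere (not shown here).

def tanimoto(a, b):
    """Tanimoto coefficient between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def flag_suspect_decoys(decoys, actives, threshold=0.7):
    """Return {decoy_id: (closest_active_id, similarity)} for every decoy
    whose best similarity to any active is >= threshold.

    The default threshold is a hypothetical cutoff, not a published value.
    """
    suspects = {}
    for decoy_id, decoy_fp in decoys.items():
        best_id, best_sim = None, 0.0
        for active_id, active_fp in actives.items():
            sim = tanimoto(decoy_fp, active_fp)
            if sim > best_sim:
                best_id, best_sim = active_id, sim
        if best_sim >= threshold:
            suspects[decoy_id] = (best_id, best_sim)
    return suspects


# Toy data with made-up identifiers and bit sets.
actives = {"inhibitor_1": {1, 2, 3, 4, 5}, "inhibitor_2": {10, 11, 12}}
decoys = {
    "decoy_a": {1, 2, 3, 4, 6},   # shares 4 of 6 distinct bits with inhibitor_1
    "decoy_b": {20, 21, 22, 23},  # shares nothing with either inhibitor
}

suspects = flag_suspect_decoys(decoys, actives, threshold=0.6)
for decoy_id, (active_id, sim) in suspects.items():
    print(f"{decoy_id} resembles {active_id} (Tanimoto {sim:.2f})")
```

With the toy data only decoy_a is flagged, because it shares most of its bits with inhibitor_1.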
Update (Feb 24, 2015):
I just checked DecoyFinder after Vikash mentioned the tool again. I think it is similar to DUD-E in some ways: it can discard a decoy that is too similar to one already selected. The only thing that bothers me is that the software has never been benchmarked (although it has been used in several studies). Still, it is worth trying.
As I understand it, DecoyFinder merely selects decoys from the dataset uploaded for screening, based on properties calculated from the given set of inhibitors. But which small-molecule databases should be preferred for that screening? DUD-E, for example, uses the ZINC database to search for decoy molecules. Can we use any database for finding decoys, or are some databases preferred over others?
Well, I think the good thing about ZINC is that it is a free database (non-proprietary and free of charge), contains a large number of commercially available molecules, and is well annotated. I don't know of any other database you could mine decoys from. The biggest well-annotated chemical database I have worked with is DrugBank (http://www.drugbank.ca/downloads), but I doubt it will suit your needs.
And as far as I know, decoys are used for retrospective validation. So I am curious: what is it that you are looking for, if I may ask? Could you please elaborate on this classification study of yours?
I guess that if you are looking for inhibitors, I would try to derive decoys from a similar set of drug-like compounds falling in the same molecular property space (e.g. MW, cLogP, H-bond donor/acceptor counts). I would also consider evaluating a small set of known literature compounds (if available) as a positive control. I would look in the ChEMBL database as a first port of call.
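The property-matching idea could be sketched roughly as below. This is only an illustration: it assumes the descriptors (MW, cLogP, H-bond donor and acceptor counts) have already been calculated for both the inhibitors and the candidate pool, and the tolerance windows, function names and example values are hypothetical rather than taken from DUD-E, DecoyFinder or any paper.

```python
# Sketch: keep candidate molecules whose properties fall within a tolerance
# window of at least one known inhibitor. Descriptor values are assumed to
# be precomputed; all numbers below are invented for illustration.

# Hypothetical per-property tolerances (not published defaults of any tool).
TOLERANCES = {"mw": 25.0, "clogp": 1.0, "hbd": 1, "hba": 2}


def matches(candidate, active, tolerances=TOLERANCES):
    """True if every property of the candidate is within tolerance of the active."""
    return all(
        abs(candidate[prop] - active[prop]) <= tol
        for prop, tol in tolerances.items()
    )


def select_decoys(candidates, actives, tolerances=TOLERANCES):
    """Return names of candidates that property-match at least one active."""
    return [
        name
        for name, props in candidates.items()
        if any(matches(props, active, tolerances) for active in actives.values())
    ]


actives = {
    "inhibitor_1": {"mw": 350.0, "clogp": 3.2, "hbd": 2, "hba": 5},
}
candidates = {
    "cand_a": {"mw": 362.0, "clogp": 2.8, "hbd": 2, "hba": 6},  # inside every window
    "cand_b": {"mw": 510.0, "clogp": 3.0, "hbd": 2, "hba": 5},  # far too heavy
    "cand_c": {"mw": 345.0, "clogp": 5.0, "hbd": 2, "hba": 5},  # too lipophilic
}

selected = select_decoys(candidates, actives)
print(selected)
```

Here only cand_a survives the filter. Anything selected this way would still need to be checked for actual activity against the target before it is used as a decoy.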
I just checked again the old paper on virtual screening of fragment-like antihistamine H1 by de Graaf et al. from 2011 (http://pubs.acs.org/doi/abs/10.1021/jm2011589), which produced excellent results. And they were using Bioinfo: