Dear all, I'm hoping for a little bit of guidance.

I'm using RDKit and sklearn to build a random forest model, trained on the results of our drug screen (1,600 drugs), to predict the responses to new drugs.

I'm familiar with the Python RDKit code for generating different fingerprints and with the sklearn code for implementing the model.
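For reference, here is a minimal sketch of the baseline workflow described above: fingerprints as X, the screening response as y, a random forest regressor, and a plain random train/validation/test split. Several things are assumptions for illustration only. The random bit matrix stands in for real RDKit fingerprints already converted to a NumPy array. The 70/15/15 fractions, the helper name `three_way_split`, and the small `max_features` grid are hypothetical choices, not taken from any paper. Note that a purely random split tends to give optimistic estimates for structurally novel compounds, because close analogues can land on both sides of the split.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1600 drugs x 2048-bit fingerprints.
# In practice X would come from RDKit fingerprints converted to a NumPy array,
# and y from the screening results (some_Y_response).
rng = np.random.default_rng(0)
n_drugs, n_bits = 1600, 2048
X = rng.integers(0, 2, size=(n_drugs, n_bits), dtype=np.uint8)
y = X[:, :10].sum(axis=1) + rng.normal(0.0, 0.5, size=n_drugs)


def three_way_split(X, y, val_frac=0.15, test_frac=0.15, seed=42):
    """Hypothetical helper: random train/validation/test split (70/15/15 by default)."""
    # First carve off the held-out test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_frac, random_state=seed)
    # Then split the remainder into train and validation; rescale so that
    # val_frac is a fraction of the whole dataset, not of the remainder.
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=val_frac / (1.0 - test_frac), random_state=seed)
    return X_train, X_val, X_test, y_train, y_val, y_test


X_train, X_val, X_test, y_train, y_val, y_test = three_way_split(X, y)

# Choose hyperparameters using the validation set only (illustrative grid).
best_model, best_r2 = None, -np.inf
for max_features in ("sqrt", 0.1):
    model = RandomForestRegressor(n_estimators=100, max_features=max_features,
                                  n_jobs=-1, random_state=0)
    model.fit(X_train, y_train)
    r2 = model.score(X_val, y_val)  # R^2 on the validation set
    if r2 > best_r2:
        best_model, best_r2 = model, r2

# Evaluate on the test set once, after all modelling choices are fixed.
test_r2 = best_model.score(X_test, y_test)
print(f"validation R^2 = {best_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The key design choice is that the test set is never used to pick anything; only the validation set drives model selection.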

However, I would appreciate any links to published work or technical papers that clearly demonstrate how to divide the dataset for this particular problem (i.e., some_Y_response ~ X_fingerprint) into training, validation and test sets.
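One commonly discussed alternative to a random split is a similarity-aware split: cluster the compounds by fingerprint similarity and assign whole clusters to the training, validation or test set, so that near-identical analogues cannot sit on both sides of the boundary. The sketch below is a self-contained illustration under stated assumptions. Fingerprints are represented as Python sets of on-bit indices (for an RDKit bit vector, something like `set(fp.GetOnBits())`). The clustering is a simple greedy leader (sphere-exclusion-style) scheme written for this example, not RDKit's Butina implementation in `rdkit.ML.Cluster`, which would be the more standard choice. The 0.6 Tanimoto threshold, the split fractions, and the helper names `leader_clusters` and `cluster_split` are hypothetical. A Bemis-Murcko scaffold split is another option along the same lines.

```python
import random
from typing import Dict, List, Sequence, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def leader_clusters(fps: Sequence[Set[int]], threshold: float = 0.6) -> List[List[int]]:
    """Greedy leader clustering (hypothetical helper): each molecule joins the first
    cluster whose leader has Tanimoto >= threshold with it, else starts a new cluster."""
    leaders: List[int] = []
    clusters: List[List[int]] = []
    for i, fp in enumerate(fps):
        for c, leader in enumerate(leaders):
            if tanimoto(fp, fps[leader]) >= threshold:
                clusters[c].append(i)
                break
        else:  # no leader was similar enough
            leaders.append(i)
            clusters.append([i])
    return clusters


def cluster_split(fps: Sequence[Set[int]], val_frac: float = 0.15,
                  test_frac: float = 0.15, threshold: float = 0.6,
                  seed: int = 42) -> Dict[str, List[int]]:
    """Fill the test set, then the validation set, with whole shuffled clusters;
    everything left goes to training. Large clusters can overshoot the targets."""
    clusters = leader_clusters(fps, threshold)
    random.Random(seed).shuffle(clusters)
    n = len(fps)
    split: Dict[str, List[int]] = {"train": [], "val": [], "test": []}
    for cluster in clusters:
        if len(split["test"]) < test_frac * n:
            split["test"].extend(cluster)
        elif len(split["val"]) < val_frac * n:
            split["val"].extend(cluster)
        else:
            split["train"].extend(cluster)
    return split


# Toy fingerprints: three families of close analogues plus three singletons.
toy_fps = [
    {1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, {1, 2, 3, 4, 7},
    {10, 11, 12, 13}, {10, 11, 12, 14},
    {20, 21, 22, 23, 24}, {20, 21, 22, 23, 25},
    {30, 31}, {40, 41}, {50, 51},
]
clusters = leader_clusters(toy_fps)
print(clusters)  # [[0, 1, 2], [3, 4], [5, 6], [7], [8], [9]]
split = cluster_split(toy_fps)
print(split)
```

The returned indices can then be used to slice the fingerprint matrix and response vector before fitting the random forest as usual.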

I've gone through a lot of publications, typically in journals such as the Journal of Cheminformatics, and I'm quite shocked by how much information is missing about how the regression/classification models were trained.

Many thanks

Anthony
