Can molecules extracted from the ChEMBL or PubChem databases using similarity searching methods be used as decoys in machine learning classification studies?
Are you using the structures to classify on a particular set of descriptors, or is the selection just based on a common pharmacophore backbone? If it's the former, then you need to pool compounds having similar structures; for the latter, however, the entire choice is already biased, hence you can take random samples.
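To make the two options concrete, here is a rough sketch of both selection strategies: pooling candidates by similarity and drawing a random sample. It assumes each molecule is already represented as a set of "on" fingerprint bit indices (in practice these would come from a cheminformatics toolkit). The function names, the toy bit sets, and the 0.5 Tanimoto cutoff are all illustrative assumptions, not recommended values.

```python
import random

def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pool_similar(candidates, references, threshold=0.5):
    """Keep candidates whose best similarity to any reference meets the threshold.

    `references` must be non-empty. The threshold is an arbitrary illustrative value.
    """
    return [
        name for name, fp in candidates.items()
        if max(tanimoto(fp, ref) for ref in references) >= threshold
    ]

def random_decoys(candidates, k, seed=0):
    """Draw k candidate names uniformly at random, without replacement."""
    rng = random.Random(seed)
    return rng.sample(sorted(candidates), k)

# Toy fingerprints (hypothetical bit sets, not real molecules).
references = [{1, 2, 3, 4}]
candidates = {
    "cand_a": {1, 2, 3, 5},     # 3 shared bits out of 5 in the union
    "cand_b": {7, 8, 9},        # no shared bits
    "cand_c": {1, 2, 3, 4, 6},  # 4 shared bits out of 5 in the union
}

similar_pool = pool_similar(candidates, references, threshold=0.5)
random_pool = random_decoys(candidates, k=2)
```

Which of the two pools is appropriate depends on the answer to the question above, i.e. whether the classification is descriptor-based or pharmacophore-based.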