I am currently working on identifying a type of bioactive peptide for which no database exists, so I collected the known examples by reading articles. Only about 40 to 50 peptides with the characteristic I am interested in have been published in reliable articles, so the number of positive samples is very small. Since negative results are rarely reported in biology, I do not know how to find or create negative data.
This raises two questions:
1. How can I build a reliable statistical or machine learning model to identify bioactive peptides from such a small set of positive samples?
2. How can I increase the number of positive samples, and find or create negative samples, so that a model trained on these data is correct and reliable?
I would appreciate any advice on these points.