How to divide up a drug-response (Y) ~ chemical fingerprint (X) dataset to train a random forest model?

Qamar Ul Islam

Dear Anthony Nash

These articles might be an asset, have a look:

1. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0219774

2. https://www.nature.com/articles/s41698-020-0122-1

3. https://pubs.acs.org/doi/10.1021/acs.jcim.9b00236

4. https://www.sciencedirect.com/science/article/pii/S088875431830466X

5. https://jcheminf.biomedcentral.com/articles/10.1186/s13321-018-0304-9

6. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6622537/

Kind Regards

Matthias Heger

In its most basic form, it's as simple as making a random split of your dataset (let's say 80:10:10) into training, test and validation sets.
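A minimal sketch of such a split with sklearn, in case it helps. The data here is synthetic (random 64-bit fingerprints and a random response), and the array sizes and variable names are just placeholders for your own X and Y. `train_test_split` only makes two-way splits, so it is called twice: once to hold out 20%, then again to cut that hold-out in half.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data (an assumption for illustration):
# 200 molecules, 64-bit binary fingerprints, one continuous drug response each.
X = rng.integers(0, 2, size=(200, 64))
y = rng.normal(size=200)

# Step 1: hold out 20% of the molecules, keep 80% for training.
X_train, X_hold, y_train, y_hold = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 2: split the hold-out half-and-half into validation and test sets.
X_val, X_test, y_val, y_test = train_test_split(
    X_hold, y_hold, test_size=0.5, random_state=0
)

print(len(X_train), len(X_val), len(X_test))
```

Fixing `random_state` makes the split reproducible; drop it if you want a different split on every run.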

If your molecules can be uniquely divided into some specific classes or categories, you might have to look into "stratified sampling" to draw your samples.
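As a rough illustration of stratified sampling, `train_test_split` accepts a `stratify` argument that keeps the class proportions the same in every subset. The `classes` array below is a made-up label (say, a chemical family per molecule) and the data is again synthetic, so treat this as a sketch rather than a recipe.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic fingerprints and responses (assumed for illustration only).
X = rng.integers(0, 2, size=(200, 64))
y = rng.normal(size=200)

# Hypothetical class label per molecule: 150 in class 0, 50 in class 1.
classes = np.array([0] * 150 + [1] * 50)

# stratify=classes preserves the 3:1 class ratio in both subsets.
X_train, X_test, y_train, y_test, c_train, c_test = train_test_split(
    X, y, classes, test_size=0.2, stratify=classes, random_state=0
)

print("train class counts:", np.bincount(c_train))
print("test class counts: ", np.bincount(c_test))
```

Without `stratify`, a small class can end up badly under-represented (or missing) in the test set purely by chance.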

Of course, if you work with a single 80:10:10 train/test/validation split, you lose 20% of your precious data that could otherwise be used to train the model. You can salvage this by doing "k-fold cross-validation": you divide your dataset into k blocks of equal size and retrain your model k times, each time holding out a different block for testing and training on the remaining k-1 blocks.
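Here is a small sketch of 5-fold cross-validation with a random forest regressor. The data is synthetic (the response is built from the first five fingerprint bits plus noise, purely so the model has something to learn), and the choices of k=5, 50 trees and R² as the score are my assumptions, not recommendations for your dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 molecules with 64-bit binary fingerprints.
X = rng.integers(0, 2, size=(200, 64))
# Made-up response: sum of the first five bits plus a little noise.
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.1, size=200)

# Five blocks of 40 molecules each; shuffle so the blocks are random.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
model = RandomForestRegressor(n_estimators=50, random_state=0)

# Fits the model five times, each time scoring on the held-out block.
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R^2 per fold:", np.round(scores, 3))
print("mean R^2:", scores.mean())
```

Every molecule ends up in a test block exactly once, so none of the data is permanently set aside. If your molecules fall into classes, `StratifiedKFold` can be swapped in for `KFold` when the target is categorical.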

This post should act as a good entry point:

https://www.mygreatlearning.com/blog/cross-validation

When it comes to sklearn, the model_selection module has the functions you need:

https://scikit-learn.org/stable/modules/classes.html#module-sklearn.model_selection