Is mtry=1 acceptable for a randomForest model based on 23 predictors?

Berlinda Verdoodt

A basic comment concerning model fitting: on any one dataset it is possible to build a model that fits the dataset perfectly, but as this dataset is just a (hopefully random) sample of the total population, this "perfect" model will likely not fit any other sample at all well. I suggest that, with a large dataset, you take a random sample (say, 50% of the data points), use it to build the model, then test the model on the rest of the dataset. If it still gives a good fit, you probably have a usable model.
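The 50/50 hold-out check described above can be sketched in a few lines. This is only an illustration in plain Python, not the randomForest workflow from the question: the data are made up (one predictor and a response that is pure noise), and the helper names `holdout_split`, `predict_1nn` and `mean_squared_error` are hypothetical. A nearest-neighbour "memorizer" stands in for any model flexible enough to fit its training data perfectly.

```python
import random


def holdout_split(rows, train_fraction=0.5, seed=0):
    """Shuffle the row indices and split the rows into train and test parts."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


def predict_1nn(train, x):
    """Memorizing model: return the response of the closest training point."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def mean_squared_error(model_rows, eval_rows):
    """Mean squared error of the 1-NN model built on model_rows."""
    errors = [(predict_1nn(model_rows, x) - y) ** 2 for x, y in eval_rows]
    return sum(errors) / len(errors)


# Hypothetical data: predictor x, response y that is pure noise (nothing to learn).
rng = random.Random(42)
data = [(float(i), rng.random()) for i in range(200)]

train, test = holdout_split(data, train_fraction=0.5, seed=1)

# Every x is unique, so the model reproduces its own training data exactly.
train_mse = mean_squared_error(train, train)
# On the held-out half the "perfect" fit disappears.
test_mse = mean_squared_error(train, test)

print(f"train MSE = {train_mse:.3f}, held-out MSE = {test_mse:.3f}")
```

The training error is zero by construction, while the held-out error is not, which is the gap the hold-out test is meant to expose.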

Iordan Slavov

Why don't you compare against a single "static" tree (e.g. use rpart() from the "rpart" package) built on the 23 variables and see whether the result for mtry=1 still holds? If it does, maybe you did a very good preselection... or all 23 variables are highly correlated (?!)

Marco Bolis

Hello, thank you for your replies and suggestions.

As you said, I realized that most of the variables are part of common pathways, are coexpressed, and are heavily interconnected with one another. This is likely why building decision trees from this small subset is no more informative than looking at the single-gene information.
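One quick way to check for this kind of redundancy is to look at the average absolute pairwise correlation among the predictors. The sketch below uses plain Python and simulated data only; the 40 samples, the noise levels and the helper names `pearson` and `mean_abs_pairwise_correlation` are assumptions made for illustration, not values or functions from the actual analysis.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mean_abs_pairwise_correlation(columns):
    """Average |r| over all distinct pairs of predictor columns."""
    names = list(columns)
    values = [
        abs(pearson(columns[p], columns[q]))
        for i, p in enumerate(names)
        for q in names[i + 1:]
    ]
    return sum(values) / len(values)


rng = random.Random(0)
n_samples, n_genes = 40, 23  # 23 predictors as in the question; 40 samples is assumed

# Simulated coexpressed genes: one shared pathway signal plus small gene-specific noise.
pathway = [rng.gauss(0, 1) for _ in range(n_samples)]
coexpressed = {
    f"gene{j}": [s + rng.gauss(0, 0.2) for s in pathway] for j in range(n_genes)
}

# Simulated unrelated genes for comparison.
independent = {
    f"gene{j}": [rng.gauss(0, 1) for _ in range(n_samples)] for j in range(n_genes)
}

coexpressed_r = mean_abs_pairwise_correlation(coexpressed)
independent_r = mean_abs_pairwise_correlation(independent)

print(f"coexpressed: mean |r| = {coexpressed_r:.2f}")
print(f"independent: mean |r| = {independent_r:.2f}")
```

When the mean |r| is close to 1, each predictor carries nearly the same information, so drawing a single candidate variable per split (mtry=1) loses very little.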