Does anyone know how much of the whole data set should be set aside for model selection? In particular, in the case of imbalanced data, how should we select a portion of the data for model selection?
A common rule of thumb is to reserve about 20% of your data for model selection, leaving 60% for training and 20% for testing.
For imbalanced data you should try to keep the class proportions the same in your training, model-selection, and test sets. One way to do this is stratified sampling: sample at random the corresponding proportion from each class.
For example, say you have 1000 samples and two classes, one of which (class 1) accounts for just 10% of your data. Then you might build your model-selection set as follows:
20 samples at random from class 1 (20% of 100)
180 samples at random from class 2 (20% of 900)
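As a rough sketch of this stratified 60/20/20 split, assuming scikit-learn is available (the toy data, class counts, and random seeds below are illustrative, not part of the answer):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data matching the example: 1000 samples, class 1 is 10% of them.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 100 + [2] * 900)

# Hold out 20% for testing, preserving the class proportions.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0)

# Split the remaining 80% into 60% train / 20% model selection
# (0.25 of the remaining 80% equals 20% of the full data set).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, stratify=y_rest, random_state=0)

print(np.bincount(y_val)[1:])  # -> [ 20 180], as in the example above
```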
You can also use more advanced approaches such as k-fold cross-validation or the bootstrap. The key point is to make sure the class proportions are preserved in the training, validation, and test sets; otherwise you could end up training on samples from just one class, which leads to poor generalization.
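For the cross-validation route, scikit-learn's StratifiedKFold preserves the class ratio in every fold. A small sketch on the same kind of toy data (again, the data and seeds are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Same toy imbalance: 100 samples of class 1, 900 of class 2.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = np.array([1] * 100 + [2] * 900)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # Each validation fold keeps the 10%/90% ratio (~20 vs ~180 samples).
    print(f"fold {fold}: validation class counts =", np.bincount(y[val_idx])[1:])
```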
You can divide the data set into 80% for learning and 20% for testing. For the learning subset, you can try bootstrap sampling, where you use the "out of bag" elements for the ensemble selection.
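A minimal sketch of that idea, assuming scikit-learn (the toy data, classifier, and seed are illustrative, not prescribed by the answer): draw one bootstrap sample from the learning subset, fit on it, and score on the out-of-bag elements.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy stand-in for the 80% learning subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(800, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=800) > 0).astype(int)

# One bootstrap sample: draw n indices with replacement; the indices
# that were never drawn form the "out of bag" set (~37% of the data).
boot = rng.integers(0, len(X), size=len(X))
oob = np.setdiff1d(np.arange(len(X)), boot)

model = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
print("out-of-bag accuracy estimate:", model.score(X[oob], y[oob]))
```

In practice you would repeat this over many bootstrap replicates and average the out-of-bag scores to compare candidate models.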
For an overview of data-mining techniques for imbalanced data sets, see:
Chawla, N. V. (2005). Data mining for imbalanced datasets: An overview. In Data mining and knowledge discovery handbook (pp. 853-867). Springer US.
Also take note that Japkowicz (2000) concluded that a "standard multilayer perceptron is not sensitive to the class imbalance problem when applied to linearly separable domains", so the classifier used also has an impact on the sensitivity to class imbalance:
Japkowicz, N. (2000, June). The class imbalance problem: Significance and strategies. In Proceedings of the 2000 International Conference on Artificial Intelligence (ICAI’2000) (Vol. 1, pp. 111-117).