Hi,

To evaluate different feature subsets in wrapper feature selection methods, we need to use the training and validation sets. After selecting the best feature subset, we use the test set, which is different from the validation set, to measure the final performance.

We split the dataset randomly into three subsets: training (70%), validation (20%), and testing (10%).
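The 70/20/10 split described above can be sketched as follows; this is a minimal illustration using scikit-learn's `train_test_split`, and the data here is a purely hypothetical stand-in for the real dataset:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))    # hypothetical feature matrix
y = rng.integers(0, 2, size=1000)  # hypothetical binary labels

# First split off 70% for training, then split the remaining 30%
# into validation (20% of the total) and testing (10% of the total).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=2 / 3, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 700 200 100
```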

After finding the best feature subset using the training and validation sets, we evaluate this subset on the test set and observe poor performance. We have tried this with several randomly chosen splits.

In general, we have observed that the feature subset that gives good performance (accuracy) on the test set gives inferior performance on the validation set. So the question is: how can we carefully choose the validation and test sets so that the feature subset selected using the validation set also performs best on the test set?
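The evaluation step that produces the two accuracies being compared can be sketched as below. The candidate subset, the logistic-regression model, and the synthetic data are all illustrative assumptions, not the actual setup; the point is only to show a single wrapper-style evaluation of one subset on both the validation and the test set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # labels depend on features 0 and 3

# Same 70/20/10 split as described in the question.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.7, random_state=1)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=2 / 3, random_state=1)

subset = [0, 3]  # hypothetical candidate subset proposed by the wrapper

# Train on the training set with only the candidate features,
# then score the same fitted model on validation and test sets.
clf = LogisticRegression().fit(X_train[:, subset], y_train)
val_acc = accuracy_score(y_val, clf.predict(X_val[:, subset]))
test_acc = accuracy_score(y_test, clf.predict(X_test[:, subset]))
print(round(val_acc, 3), round(test_acc, 3))
```

Because validation and test sets are small (200 and 100 samples here), some gap between the two accuracies is expected from sampling variance alone.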

Thank you.
