I am using bagging for a machine learning classification task and would like to know how to decide on the optimal number of ensemble members (bagged models). I would also like to take into account the number of positive and negative training cases used for each member. I assume that an even (50/50) distribution of training cases within each sample is optimal, and my target variable has a 20P/80N skew.
Example:
Training set size: 3000 cases
Positive cases: 100
Negative cases: 2900
My procedure then does the following, sampling with replacement:

For each ensemble member:

1. Sample 100 negative cases and 100 positive cases (with replacement).
2. Train a classifier on this balanced subsample.
3. Predict on a validation set.
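To make the procedure concrete, here is a minimal sketch of what I am doing. It assumes NumPy arrays with 0/1 labels and uses scikit-learn with a decision tree as a placeholder base learner; the names `train_balanced_bag`, `predict_bag`, `n_members`, `n_pos`, and `n_neg` are just mine for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # placeholder base learner

def train_balanced_bag(X_train, y_train, n_members=50, n_pos=100, n_neg=100,
                       random_state=0):
    """Train one base classifier per balanced bootstrap subsample."""
    rng = np.random.default_rng(random_state)
    pos_idx = np.where(y_train == 1)[0]  # assumes labels are 0/1
    neg_idx = np.where(y_train == 0)[0]
    members = []
    for _ in range(n_members):
        # Draw positives and negatives with replacement (balanced subsample)
        sample_idx = np.concatenate([
            rng.choice(pos_idx, size=n_pos, replace=True),
            rng.choice(neg_idx, size=n_neg, replace=True),
        ])
        clf = DecisionTreeClassifier()
        clf.fit(X_train[sample_idx], y_train[sample_idx])
        members.append(clf)
    return members

def predict_bag(members, X_val):
    # Average the members' positive-class probabilities
    return np.mean([m.predict_proba(X_val)[:, 1] for m in members], axis=0)
```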
Given this procedure, how can I optimally decide on the number of ensemble members, and on the number of positive and negative cases that go into each member's training sample?
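The only approach I can think of so far is a brute-force sweep on the validation set, roughly like the sketch below (it reuses the functions above; the candidate grids, the AUC metric, and the `X_train`/`y_train`/`X_val`/`y_val` variables are only assumptions for illustration). I am wondering whether there is a more principled way to choose these values than this kind of grid search.

```python
from sklearn.metrics import roc_auc_score

# Compare ensemble sizes and negative-sample sizes by validation AUC
best_auc, best_params = -np.inf, None
for n_members in [10, 25, 50, 100, 200]:
    for n_neg in [100, 200, 400]:
        members = train_balanced_bag(X_train, y_train,
                                     n_members=n_members,
                                     n_pos=100, n_neg=n_neg)
        auc = roc_auc_score(y_val, predict_bag(members, X_val))
        if auc > best_auc:
            best_auc, best_params = auc, (n_members, n_neg)
```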