How to implement sampling methods (for unbalanced data) in k-fold cross-validation?

David Eugene Booth

I must admit I don't follow what you are doing. Nevertheless, in validating logistic regressions I have found that bootstrap cross-validation was often preferable to k-fold cross-validation. I have attached two talks that I found on the internet that give the basic idea. The links are also included. I hope this helps a little. Best wishes.

https://lagunita.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/cv_boot.pdf

http://stat.cmu.edu/~brian/724/week11/lec27-bootstrap.pdf
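
To illustrate the bootstrap validation idea mentioned above, here is a minimal sketch in plain Python. It is not taken from the linked talks. The function name `bootstrap_splits` and its parameters are hypothetical, and it assumes the common out-of-bag variant: each round trains on a sample drawn with replacement and evaluates on the samples that were not drawn.

```python
import random

def bootstrap_splits(n_samples, n_rounds=200, seed=0):
    """Yield (train, out_of_bag) index lists for bootstrap validation.

    Hypothetical helper: train indices are drawn with replacement,
    out-of-bag indices are the samples never drawn in that round.
    """
    rng = random.Random(seed)
    for _ in range(n_rounds):
        # Draw n_samples indices with replacement for training.
        train = [rng.randrange(n_samples) for _ in range(n_samples)]
        in_bag = set(train)
        # Samples never drawn form the evaluation set for this round.
        out_of_bag = [i for i in range(n_samples) if i not in in_bag]
        yield train, out_of_bag

splits = list(bootstrap_splits(100, n_rounds=50))
```

On average roughly 37% of the samples are out-of-bag in each round. A model would be fitted on `train` and scored on `out_of_bag`, with the scores averaged over rounds.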

Eghbal Rahimikia

@Juan Thank you for your answer. So your proposed solution to this problem is to use only stratified cross-validation, not sampling methods? I think this method does not solve the main problem (the unbalanced data set), because in the end we train the model on all samples to predict the future. Is this correct?
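
One way the two ideas can be combined is sketched below in plain Python, under stated assumptions: the folds are stratified, and random oversampling of the minority class is applied to the training part of each split only, so the held-out fold keeps the original class balance. All function names here are hypothetical, and random oversampling stands in for whichever sampling method is actually used.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class round-robin across the folds.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    return folds

def oversample(train_idx, labels, rng):
    """Duplicate minority-class indices at random until classes are equal."""
    by_class = defaultdict(list)
    for i in train_idx:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    balanced = []
    for idxs in by_class.values():
        balanced.extend(idxs)
        balanced.extend(rng.choices(idxs, k=target - len(idxs)))
    return balanced

def cv_splits_with_oversampling(labels, k=5, seed=0):
    """Yield (balanced_train, untouched_test) index lists for each fold."""
    rng = random.Random(seed)
    folds = stratified_folds(labels, k, seed)
    for f in range(k):
        test_idx = folds[f]
        train_idx = [i for g in range(k) if g != f for i in folds[g]]
        # Resample the training indices only; the test fold is left as-is.
        yield oversample(train_idx, labels, rng), test_idx

# Toy example: 90 samples of class 0 and 10 of class 1.
labels = [0] * 90 + [1] * 10
splits = list(cv_splits_with_oversampling(labels, k=5))
```

Resampling inside the loop, rather than before splitting, keeps duplicated minority samples from appearing in both the training and test parts of the same split.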

@David Thank you for your answer, David. Based on the cited papers, bootstrap cross-validation has some problems (bias and variance), and I think it is appropriate for highly unbalanced data sets, not my data set. What do you think?

Honestly, I don't know. Prof. Steyerberg, who is an expert, answered a somewhat similar question. His answer there might be helpful. Link attached. I hope some of this helps a little. Good luck.

https://www.researchgate.net/post/How_to_interpret_the_results_of_5-fold_cross_validation