I have a data set containing 5,400 healthy banks and 470 failed banks over the period 2009 to 2014.

First, what are the main criteria for dividing the data into a training set and a validation set? Is there a specific percentage or a formula?
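To make the question concrete, here is a minimal sketch of the kind of split I have in mind. It assumes a 70/30 split, which is only a placeholder and not an established rule, and it splits within each class so that the small group of failed banks is represented in both parts. The function name `stratified_split` and the 0/1 labels are my own illustration.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), keeping the class ratio in both parts.

    train_fraction=0.7 is an assumed placeholder, not a recommended value.
    """
    rng = random.Random(seed)
    # Group row indices by class label (0 = healthy, 1 = failed).
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train_idx, valid_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random assignment within each class
        cut = round(len(idx) * train_fraction)
        train_idx.extend(idx[:cut])
        valid_idx.extend(idx[cut:])
    return sorted(train_idx), sorted(valid_idx)

# Labels matching the sample sizes in my data: 5,400 healthy and 470 failed.
labels = [0] * 5400 + [1] * 470
train_idx, valid_idx = stratified_split(labels)
failed_in_train = sum(labels[i] for i in train_idx)
failed_in_valid = sum(labels[i] for i in valid_idx)
```

With these numbers the failed banks are divided 329 to 141, so both sets keep roughly the same failure rate as the full sample.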

Second, I was thinking of using a t-test to identify the explanatory variables that best differentiate failed from healthy banks, and then applying logistic regression analysis (LRA), multiple discriminant analysis (MDA), and the dynamic hazard model twice: once with all variables and once with only the variables that are significant under the t-test.
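The screening step could be sketched roughly as follows. This is only an illustration under assumptions: it uses Welch's unequal-variance t statistic, approximates the p-value with the normal distribution (reasonable for samples as large as mine, not for the tiny toy data below), and uses a 5% threshold. The variable names and values are made up purely so the example runs.

```python
import math
from statistics import NormalDist, mean, variance

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def screen_variables(healthy, failed, alpha=0.05):
    """Keep the variables whose group means differ at level alpha.

    The p-value uses a large-sample normal approximation (an assumption).
    """
    kept = []
    for name in healthy:
        t = welch_t(healthy[name], failed[name])
        p = 2 * (1 - NormalDist().cdf(abs(t)))  # two-sided p-value
        if p < alpha:
            kept.append(name)
    return kept

# Made-up toy values; the variable names are hypothetical.
healthy = {
    "capital_ratio": [0.10, 0.12, 0.11, 0.13, 0.12, 0.11],
    "log_assets": [5.0, 6.0, 7.0, 5.5, 6.5, 6.0],
}
failed = {
    "capital_ratio": [0.04, 0.05, 0.03, 0.06, 0.05, 0.04],
    "log_assets": [5.2, 6.1, 6.8, 5.6, 6.4, 5.9],
}
kept = screen_variables(healthy, failed)
```

In this toy case only `capital_ratio` passes the screen, because the two groups have essentially the same mean `log_assets`. I assume the screening would be run on the training set only, so that the validation set plays no part in choosing the variables.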

Finally, combining the two splits (training set and validation set) with the two variable sets (all variables and the variables significant under the t-test) gives me four different data sets.

Which results should I adopt?
