Well, the important thing is how well the training, testing and validation data sets describe the feature space... If the number of points in the whole data set is large, then almost any division may work fine, but when the data set is limited, the division ratio can play a crucial role...
Traditionally we use 5-fold cross-validation to verify our algorithm, so 80:20 would be fine. But in practice the right split between training and test data depends on the actual problem and the size of the data set.
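To see why 5-fold cross-validation corresponds to an 80:20 split, here is a minimal sketch using scikit-learn's KFold (the random data and the library choice are just for illustration, not from the answer above):

```python
import numpy as np
from sklearn.model_selection import KFold

# Dummy data just to show the fold sizes; replace with your own X, y.
X = np.random.rand(100, 4)
y = np.random.rand(100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each fold trains on 80% of the data and tests on the remaining 20%.
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```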
Most of the time it depends on how much data you have. If you have a large data set, a training:testing ratio of 75:25 is reasonable. Even if you get very good accuracy, that does not by itself mean your trained ANN generalizes well (if your ANN is not generalized, it will produce bizarre outputs for inputs it never saw during training). One common cause of a non-generalized ANN is a small training set, because ANNs tend to over-fit small training sets. Testing your model on many held-out inputs lets you judge whether the ANN has generalized. If you have a small data set, however, you are better off with 90:10.
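A rough sketch of this check, assuming a classification problem and using scikit-learn with synthetic data in place of your own X, y (my choice of library and model, not the answerer's): a large gap between training and test accuracy is the over-fitting symptom described above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data stands in for your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 75:25 split for a reasonably large data set; use test_size=0.1 for a small one (90:10).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_train, y_train)

# A large gap between these two scores suggests the network is not generalizing.
print("train accuracy:", clf.score(X_train, y_train))
print("test  accuracy:", clf.score(X_test, y_test))
```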
The issue is getting enough cases in the training set, with suitable diversity, to model the domain. I used decision trees quite a lot, and I found, tucked away in an old Quinlan publication, that about 1,000 cases were sufficient for a "moderately complex" problem, so I used 60-fold cross-validation with a data set of 1,200 cases. Of course that is a qualitative statement. I was discussing the problem with another researcher at a conference who had used something like 17-fold cross-validation for ANNs, and he said roughly this: "We trained our ANN with 10-fold cross-validation and observed the plot of the MSE, which did not reduce. We altered the split to increase the training data and ran it again. We continued training, increasing the proportion of training data (number of folds), until we saw the MSE reducing over time in the way one would expect. We then froze the split of training/testing data and used those proportions for our optimisation experiment." You should use train/validation/test splits to avoid over-fitting, as mentioned above, and could follow a similar process, with MSE as the criterion for selecting the splits.
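One way to reproduce that kind of procedure, purely as an illustrative sketch and not the original authors' code: loop over increasingly large training proportions and watch how the held-out MSE behaves, then freeze the split once it reduces as expected. The data, model and library below are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data as a stand-in for the real problem.
X, y = make_regression(n_samples=1200, n_features=10, noise=0.1, random_state=0)

# Try progressively larger training proportions and watch the test MSE.
for train_frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=0).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"train fraction {train_frac:.0%}: test MSE = {mse:.4f}")
# Freeze the split once the MSE behaves as expected, as in the procedure above.
```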
The number of training and testing data points depends on, first, the total size of the data set; second, the quality of calibration you need; and third, the calibration time, which is usually not so important in data-driven approaches.
Moreover, you should hold back some data for validation, which is very important.
You could conduct a sensitivity analysis on the size of the training set, looking at model performance, error value, and CPU time as functions of the amount of training data, and then pick the best trade-off.
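A minimal sketch of such a sensitivity analysis, assuming scikit-learn and synthetic data (both my assumptions, chosen only to make the idea concrete):

```python
import time
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=2000, n_features=15, noise=0.2, random_state=0)
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Sensitivity analysis: vary the amount of training data, record error and CPU time.
for n in (200, 400, 800, 1600):
    start = time.process_time()
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=0).fit(X_train_full[:n], y_train_full[:n])
    cpu = time.process_time() - start
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"n_train={n:5d}  test MSE={mse:8.4f}  CPU time={cpu:.2f}s")
```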
In my research project and paper (on my profile), I ran some experiments and settled on 80:20 for a stock-market prediction system. In those experiments this gave the best proportion, though other ratios came very close.
I found in the reference [Shahin, M. A., Maier, H. R., and Jaksa, M. B. (2004). "Data division for developing neural networks applied to geotechnical engineering." Journal of Computing in Civil Engineering, ASCE, 18(2), 105-114] an investigation of the impact of the proportion of data used in the various subsets on ANN model performance, for a case study of settlement prediction of shallow foundations. The authors found no clear relationship between the proportions of data used for training, testing and validation and model performance; however, the best result was obtained when 20% of the data were used for validation and the remaining data were divided into 70% for training and 30% for testing.
The default in MATLAB is 70:15:15 for training:validation:test. I would recommend 30% for testing, as you will get a better indication of the model's performance on unseen data.
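A three-way split like 70:15:15 can be built with two chained splits; the sketch below uses scikit-learn and synthetic data purely for illustration (it is not the MATLAB routine itself):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First carve off 15% for the test set, then take 15% of the original for validation
# (15/85 of what remains), mirroring the 70:15:15 proportions mentioned above.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 700 / 150 / 150
```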
It all depends on the size of the data set, which is clear. For a small data set, it is good to use k-fold cross-validation and a 70:15:15 ratio. For a big set, roughly 75:25 or similar proportions are fine.
As mentioned by others before, the 80:20 ratio (train:test) is probably the most commonly used, and is also referred to as the Pareto principle (https://en.wikipedia.org/wiki/Pareto_principle). However, if you want to tune your model for the best parameters (neural networks, SVMs or any other classification method), I would always suggest k-fold cross-validation. Standard values are k = {3, 5, 10}, but these are not fixed and you could choose other values too. A short example follows below.
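A minimal sketch comparing those values of k, assuming scikit-learn and a synthetic classification data set (both are my illustrative assumptions, and the SVM is just one possible classifier):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = SVC()

# Compare the usual choices of k; none of these values is mandatory.
for k in (3, 5, 10):
    scores = cross_val_score(clf, X, y, cv=k)
    print(f"k={k:2d}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```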
Also, as an out-of-the-box suggestion: you could use a grid search to choose the optimum parameters and further improve the results.
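For example, a grid search over hyperparameters can be combined with the cross-validation above; the sketch below uses scikit-learn's GridSearchCV with an SVM, and the parameter names and ranges are placeholders rather than recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Example grid for an SVM; adapt the parameters to your own model.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```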
Another point to consider is cross-validation, sometimes called rotation estimation or out-of-sample testing. It helps to assess how the results of a statistical analysis will generalize to an independent data set.