Well, the important thing is how well the training, testing and validation data sets describe the feature space... If the number of points in the whole data set is large, then almost any division may work fine, but when the data set is limited, the division ratio can play a crucial role...
Traditionally we use 5-fold cross-validation to verify our algorithm, so 80:20 would be fine. But in practice the right split between training and test data depends on the actual problem and the size of the data set.
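To see why 5-fold cross-validation corresponds to an 80:20 split, here is a minimal sketch using scikit-learn's KFold (the random data and the library choice are just for illustration, not from the answer above):

```python
import numpy as np
from sklearn.model_selection import KFold

# Dummy data just to show the fold sizes; replace with your own X, y.
X = np.random.rand(100, 4)
y = np.random.rand(100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(X)):
    # Each fold trains on 80% of the data and tests on the remaining 20%.
    print(f"fold {fold}: train={len(train_idx)} test={len(test_idx)}")
```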
Most of the time it depends on how much data you have. If you have a large data set, a training:testing ratio of 75:25 is reasonable. Even if you get very good accuracy, that does not by itself mean your trained ANN generalizes well (if your ANN is not generalized, it will produce bizarre outputs for inputs it never saw during training). One common cause of a non-generalized ANN is a small training set, because ANNs tend to over-fit small training sets. Testing your model on many held-out inputs lets you judge whether the ANN has generalized. If you have a small data set, however, you are better off with 90:10.
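A rough sketch of this check, assuming a classification problem and using scikit-learn with synthetic data in place of your own X, y (my choice of library and model, not the answerer's): a large gap between training and test accuracy is the over-fitting symptom described above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data stands in for your own X, y.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# 75:25 split for a reasonably large data set; use test_size=0.1 for a small one (90:10).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0).fit(X_train, y_train)

# A large gap between these two scores suggests the network is not generalizing.
print("train accuracy:", clf.score(X_train, y_train))
print("test  accuracy:", clf.score(X_test, y_test))
```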
The issue is getting enough cases in the training set, with suitable diversity, to model the domain. I used decision trees quite a lot, and I found, tucked away in an old Quinlan publication, that about 1,000 cases were sufficient for a "moderately complex" problem, so I used 60-fold cross-validation with a data set of 1,200 cases. Of course that is a qualitative statement. I was discussing the problem with another researcher at a conference who had used something like 17-fold cross-validation for ANNs, and he said roughly this: "We trained our ANN with 10-fold cross-validation and observed the plot of the MSE, which did not reduce. We altered the split to increase the training data and ran it again. We continued training, increasing the proportion of training data (number of folds), until we saw the MSE reducing over time in the way one would expect. We then froze the split of training/testing data and used those proportions for our optimisation experiment." You should use train/validation/test splits to avoid over-fitting, as mentioned above, and could follow a similar process, with MSE as the criterion for selecting the splits.
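One way to reproduce that kind of procedure, purely as an illustrative sketch and not the original authors' code: loop over increasingly large training proportions and watch how the held-out MSE behaves, then freeze the split once it reduces as expected. The data, model and library below are assumptions for the example.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

# Synthetic regression data as a stand-in for the real problem.
X, y = make_regression(n_samples=1200, n_features=10, noise=0.1, random_state=0)

# Try progressively larger training proportions and watch the test MSE.
for train_frac in (0.5, 0.6, 0.7, 0.8, 0.9):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=train_frac, random_state=0)
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=0).fit(X_tr, y_tr)
    mse = mean_squared_error(y_te, model.predict(X_te))
    print(f"train fraction {train_frac:.0%}: test MSE = {mse:.4f}")
# Freeze the split once the MSE behaves as expected, as in the procedure above.
```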
The number of training and testing data points depends on, first, the total size of the data set; second, the quality of calibration you need; and third, the calibration time, which is usually not so important in data-driven approaches.
Moreover, you should hold back some data for validation, which is very important.
You could conduct a sensitivity analysis on the size of the training set, looking at model performance, error value, and CPU time as functions of the amount of training data, and then pick the best trade-off.
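A minimal sketch of such a sensitivity analysis, assuming scikit-learn and synthetic data (both my assumptions, chosen only to make the idea concrete):

```python
import time
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=2000, n_features=15, noise=0.2, random_state=0)
X_train_full, X_test, y_train_full, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Sensitivity analysis: vary the amount of training data, record error and CPU time.
for n in (200, 400, 800, 1600):
    start = time.process_time()
    model = MLPRegressor(hidden_layer_sizes=(64,), max_iter=1000,
                         random_state=0).fit(X_train_full[:n], y_train_full[:n])
    cpu = time.process_time() - start
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"n_train={n:5d}  test MSE={mse:8.4f}  CPU time={cpu:.2f}s")
```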
In my research project and paper (on my profile), I ran some experiments and settled on 80:20 for a stock-market prediction system. In those experiments this gave the best proportion, though other ratios came very close.
I found in the reference [Shahin, M. A., Maier, H. R., and Jaksa, M. B. (2004). "Data division for developing neural networks applied to geotechnical engineering." Journal of Computing in Civil Engineering, ASCE, 18(2), 105-114] an investigation of the impact of the proportion of data used in the various subsets on ANN model performance, for a case study of settlement prediction of shallow foundations. The authors found no clear relationship between the proportions of data used for training, testing and validation and model performance; however, the best result was obtained when 20% of the data were used for validation and the remaining data were divided into 70% for training and 30% for testing.
The default in MATLAB is 70:15:15 for training:validation:test. I would recommend 30% for testing, as you will get a better indication of the model's performance on unseen data.
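A three-way split like 70:15:15 can be built with two chained splits; the sketch below uses scikit-learn and synthetic data purely for illustration (it is not the MATLAB routine itself):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# First carve off 15% for the test set, then take 15% of the original for validation
# (15/85 of what remains), mirroring the 70:15:15 proportions mentioned above.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.15, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 700 / 150 / 150
```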
It all depends on the size of the data set, which is clear. For a small data set, it is good to use k-fold cross-validation and a 70:15:15 ratio. For a big set, roughly 75:25 or similar proportions are fine.
As mentioned by others before, the 80:20 ratio (train:test) is probably the most commonly used, and is also referred to as the Pareto principle (https://en.wikipedia.org/wiki/Pareto_principle). However, if you want to tune your model for the best parameters (neural networks, SVMs or any other classification method), I would always suggest k-fold cross-validation. Standard values are k = {3, 5, 10}, but these are not fixed and you could choose other values too. A short example follows below.
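A minimal sketch comparing those values of k, assuming scikit-learn and a synthetic classification data set (both are my illustrative assumptions, and the SVM is just one possible classifier):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = SVC()

# Compare the usual choices of k; none of these values is mandatory.
for k in (3, 5, 10):
    scores = cross_val_score(clf, X, y, cv=k)
    print(f"k={k:2d}: mean accuracy = {scores.mean():.3f} (+/- {scores.std():.3f})")
```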
Also, as an out-of-the-box suggestion: you could use a grid search to choose the optimum parameters and further improve the results.
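For example, a grid search over hyperparameters can be combined with the cross-validation above; the sketch below uses scikit-learn's GridSearchCV with an SVM, and the parameter names and ranges are placeholders rather than recommended values.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Example grid for an SVM; adapt the parameters to your own model.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.01, 0.001]}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("best parameters:", search.best_params_)
print("best CV accuracy:", round(search.best_score_, 3))
```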
Another point to consider is cross-validation, sometimes called rotation estimation or out-of-sample testing. It helps to assess how the results of a statistical analysis will generalize to an independent data set.