Check out the two attachments. One is a screenshot of the nntraintool window and the other is a screenshot of the performance plot of a well-trained ANN. As you can see in the performance plot, the MSE (Mean Squared Error) of the ANN decreases as the epochs progress. A well-trained ANN should have a very low MSE at the end of the training phase, which in this example equals 1.0276e-25. An MSE that is very small (close to zero) means that the desired outputs and the ANN's outputs for the training set have become very close to each other.
MSE is just what its name says: the mean (average) of the squares of the errors, i.e., of the distances between the model's estimates and the actual test values. (Squaring converts each error to a positive magnitude, so undershooting and overshooting are penalized the same way.)
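The definition above can be sketched in a few lines. This is a hedged illustration in Python rather than MATLAB (the thread's context is the Neural Network Toolbox); the function name `mse` and the sample values are made up for the example.

```python
# Minimal sketch of Mean Squared Error: average of the squared
# differences between targets and a model's predictions.
def mse(targets, predictions):
    """Return the mean of the squared errors (targets - predictions)."""
    errors = [t - p for t, p in zip(targets, predictions)]
    return sum(e * e for e in errors) / len(errors)

targets = [1.0, 2.0, 3.0]
predictions = [1.1, 1.9, 3.0]
print(mse(targets, predictions))  # a small value -> predictions are close
```

A value near zero, like the 1.0276e-25 in the screenshot, simply means every squared error term is tiny.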
The physical interpretation is that this is how close, on average, the hyperplane drawn by your network gets to the actual cloud of data in your validation set. Lila's MSE of 10^-25 shows that this network can guess essentially arbitrarily close to the targets.
That is only half of judging performance, though. The other half is how validation is done (is the model any good at predicting out-of-sample values?) and how well the network is regularized (did it use a huge number of free parameters and overfit, or is it tuned to a small set of actual regularities in the data?).
Yes, I agree with Dr. Patrick. Even in the example I implemented for you, the ANN is overfitting (in other words, it lacks generalization). An overfitting ANN will usually give you better answers only for the training set. Therefore, getting very good training performance does not mean that you will end up with a very good ANN model. You will definitely have to improve the generalization of your ANN.
As Dr. Patrick has mentioned, you can use regularization and early stopping to improve the generalization of your ANN.
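The early-stopping idea can be sketched independently of any toolbox: stop training once the validation MSE has not improved for some number of consecutive epochs (the "patience"). The validation curve below is a synthetic stand-in, and the function name `early_stop_epoch` is an assumption for this sketch, not an API from the toolbox.

```python
# Hedged sketch of early stopping on a (synthetic) validation-MSE curve:
# the curve decreases, bottoms out, then rises as overfitting sets in.
val_curve = [1.0, 0.6, 0.4, 0.35, 0.36, 0.38, 0.41, 0.45]

def early_stop_epoch(val_errors, patience=2):
    """Return (best_epoch, best_mse), stopping once the validation
    error has failed to improve for `patience` epochs in a row."""
    best, best_epoch, waited = float("inf"), 0, 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best, best_epoch, waited = err, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch, best

print(early_stop_epoch(val_curve))  # weights from the validation minimum
```

In practice you would restore the weights saved at `best_epoch` rather than the final ones, which is exactly how early stopping limits overfitting.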