1. For any neural network (or any other machine learning model):
If the testing error is much lower than the training error, it is most likely a deceptive and unlikely coincidence. Randomly re-split the dataset into training and testing sets and repeat training and testing.
If the testing error is much higher than the training error, the model is suffering from high variance (overfitting).
If both the training error and the testing error are high, the model is suffering from high bias (underfitting).
If the testing error is approximately equal to the training error and both are reasonably low, the model is suffering from neither high bias nor high variance, which is a good sign.
2. When comparing two networks:
If both networks have approximately similar training errors but very different testing errors, pick the network with the lower testing error.
If both networks have approximately similar testing errors but very different training errors, pick the network with the lower training error.
3. In the context of convergence and learning rate:
A lower learning rate will reach a result more slowly, but it does not risk 'jumping out of the basin (region of depression) around a minimum'.
A higher learning rate will reach a result faster, but it does risk 'jumping out of the basin (region of depression) around a minimum' (see the sketch after this list).
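To illustrate the learning-rate point, here is a minimal gradient-descent sketch on a toy one-dimensional quadratic (the function, step counts, and learning-rate values are illustrative assumptions, not from the original answer): a small step converges slowly but stays in the basin, while an overly large step overshoots and diverges.

```python
import math

def gradient_descent(lr, steps=50, x0=3.0):
    """Minimize f(x) = x^2 with plain gradient descent; the gradient is 2x."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

# Small learning rate: slow but stable progress toward the minimum at x = 0.
print(gradient_descent(lr=0.01))  # still roughly 1.1 after 50 steps
# Moderate learning rate: converges quickly.
print(gradient_descent(lr=0.1))   # very close to 0
# Too-large learning rate: each step overshoots, so the iterate 'jumps out' and diverges.
print(gradient_descent(lr=1.1))   # magnitude grows into the tens of thousands
```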
Perhaps introducing cross-validation techniques will help when comparing neural networks based solely on their training and testing results.
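A minimal sketch of the comparison described in points 1 and 2 above, assuming scikit-learn, a synthetic dataset, and two arbitrary architectures chosen only for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic data, randomly split into training and testing sets.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Two candidate networks that differ only in capacity.
nets = {
    "small": MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    "large": MLPClassifier(hidden_layer_sizes=(256, 256), max_iter=500, random_state=0),
}

for name, net in nets.items():
    net.fit(X_train, y_train)
    train_err = 1 - net.score(X_train, y_train)  # training error
    test_err = 1 - net.score(X_test, y_test)     # testing error
    # Test error far above training error suggests high variance (overfitting);
    # both errors high suggests high bias (underfitting).
    print(f"{name}: train error={train_err:.3f}, test error={test_err:.3f}")
```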
From your original dataset, you can generate numerous datasets by permutation (random resampling). Each of these datasets is then divided into training data, validation data, and testing data. The validation data is used to stop training the model early in order to avoid overfitting, while the testing data is used to collect the error measures (i.e., the difference between the output layer and the target solution). Cross-validation can also be used to obtain the means of the error measures. In this way, you obtain numerous error measures from the first NN and numerous error measures from the second NN on the testing data, and the two can be compared. The same approach applies to regression problems as well. Maybe this can help you. Good luck!
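A rough sketch of this resampling idea, interpreting "permutation" as repeated random shuffles/splits of the original data and using scikit-learn's early stopping (which holds out part of the training data as a validation set); all names and parameter values here are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def repeated_split_errors(net, X, y, n_repeats=20):
    """Collect test-set errors over many random permutations (shuffled splits) of the data."""
    errors = []
    for i in range(n_repeats):
        # A different random shuffle/split of the original dataset on each repeat.
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=i)
        net.fit(X_train, y_train)  # early_stopping uses an internal validation split
        errors.append(1 - net.score(X_test, y_test))
    return np.mean(errors), np.std(errors)

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
net_1 = MLPClassifier(hidden_layer_sizes=(32,), early_stopping=True, max_iter=500, random_state=0)
net_2 = MLPClassifier(hidden_layer_sizes=(128,), early_stopping=True, max_iter=500, random_state=0)

# Mean and spread of the test errors for each network over the resampled splits.
print("NN 1:", repeated_split_errors(net_1, X, y))
print("NN 2:", repeated_split_errors(net_2, X, y))
```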
Nahian Ahmed's answer is explanatory; however, I will add some details on cross-validation.
1. The train and test data split is usually selected randomly, but a single random split is not a good approach on its own.
2. You can apply 5-fold, 10-fold, or higher cross-validation depending on the size of your dataset. If your dataset is small (i.e., fewer than 1,000 instances), 10-fold or higher cross-validation is preferred.
3. In 10-fold cross-validation, the dataset is divided into 10 random groups; nine of them are used for training and the 10th is used for testing. In the next iteration, another group is reserved for testing and the previous testing group is added back to the training data.
The process is repeated 10 times, and the average of all the cross-validation results is used to represent the model's performance. Two neural networks can be compared based on their cross-validation performance.
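A minimal sketch of such a comparison with scikit-learn's 10-fold cross-validation (the dataset and both architectures are placeholder assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

net_a = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net_b = MLPClassifier(hidden_layer_sizes=(64, 64), max_iter=500, random_state=0)

# cv=10 performs 10-fold cross-validation: 9 folds train the model, 1 fold tests it,
# rotated so that every fold serves as the test set exactly once.
for name, net in [("net_a", net_a), ("net_b", net_b)]:
    scores = cross_val_score(net, X, y, cv=10)
    print(f"{name}: mean accuracy={scores.mean():.3f} +/- {scores.std():.3f}")
```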
For object detection, I have found the mAP (mean Average Precision) metric to be useful.
AP (Average Precision) is a popular metric for measuring the accuracy of object detectors such as Faster R-CNN and SSD. Average precision computes the average precision value over the recall range from 0 to 1. It sounds complicated, but it is actually pretty simple once illustrated with an example. Before that, let us do a quick recap of precision, recall, and IoU.
Precision & recall
Precision measures how accurate your predictions are, i.e., the percentage of your predictions that are correct.
Recall measures how well you find all the positives. For example, we may find 80% of the possible positive cases among our top K predictions.
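As a rough illustration of these definitions, here is a sketch using scikit-learn's metrics on made-up labels and confidence scores (a real object-detection AP would additionally apply an IoU threshold to decide which detections count as true positives):

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score, average_precision_score

# Made-up example: 1 = object really present, 0 = not; scores are detector confidences.
y_true   = np.array([1, 1, 1, 1, 0, 0, 0, 1, 0, 1])
y_scores = np.array([0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.35, 0.3, 0.2])
y_pred   = (y_scores >= 0.5).astype(int)  # threshold the confidences into predictions

# Precision: what fraction of the predicted positives are actually correct.
print("precision:", precision_score(y_true, y_pred))
# Recall: what fraction of the actual positives we managed to find.
print("recall:", recall_score(y_true, y_pred))
# Average precision: precision averaged over the recall range from 0 to 1.
print("AP:", average_precision_score(y_true, y_scores))
```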