Suppose model A's validation accuracy is 99% but its test accuracy is 98%, while model B's validation accuracy is 98% but its test accuracy is 99%. How should I explain which model performs better, and how can I justify that conclusion?
If the difference is very small, it may not be statistically significant.
If the difference is substantial (e.g. 5-10%), then Model B is generalizing better than Model A. In that case Model B is more robust, since it also performs well on new, unseen samples.
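To check whether a gap like 98% vs. 99% is statistically significant, one option is a two-proportion z-test on the two test accuracies. A minimal sketch, assuming (hypothetically) that each model was evaluated on 1,000 test samples:

```python
import math

def two_proportion_z_test(acc_a, acc_b, n_a, n_b):
    """Two-sided z-test for the difference between two accuracies.

    acc_a, acc_b: observed accuracies (proportions of correct predictions)
    n_a, n_b: number of test samples each model was evaluated on
    """
    # Pooled proportion under the null hypothesis of equal accuracy
    p_pool = (acc_a * n_a + acc_b * n_b) / (n_a + n_b)
    # Standard error of the difference under the null
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (acc_a - acc_b) / se
    # Two-sided p-value from the standard normal CDF (via the error function)
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical test-set size of 1,000 samples per model
z, p = two_proportion_z_test(0.98, 0.99, 1000, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these (assumed) sample sizes the p-value comes out above 0.05, so the 1% gap alone would not be enough to declare Model B significantly better; with a much larger test set the same gap could become significant. A paired test such as McNemar's test is stronger when both models were evaluated on the same test samples.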