Your plot is a typical case illustrating the transition from underfitting to overfitting the training data. In between, there is a point where performance is optimal: just before the MSE on the validation set starts to increase. I don't see anything wrong with your plot; depending on the ANN architecture, optimal performance can be reached in fewer or more epochs. This blog post should explain the idea in more detail:
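As a rough illustration of picking that optimal point, here is a minimal sketch assuming you have recorded the validation MSE after each epoch. The curve values below are made up, and `best_epoch` and `early_stop_epoch` are hypothetical helpers, not part of any library. The second function simulates patience-based early stopping, similar in spirit to the `patience` option of Keras' `EarlyStopping` callback: training halts once the validation MSE has failed to improve for `patience` consecutive epochs, and the best epoch seen so far is kept.

```python
def best_epoch(val_mse):
    """Return (epoch index, value) of the lowest validation MSE."""
    best_idx = min(range(len(val_mse)), key=lambda i: val_mse[i])
    return best_idx, val_mse[best_idx]


def early_stop_epoch(val_mse, patience=3):
    """Simulate patience-based early stopping on a recorded validation curve.

    Returns the index of the epoch whose weights would be kept, i.e. the
    best epoch seen before `patience` epochs passed without improvement.
    """
    best_idx = 0
    best_val = float("inf")
    waited = 0  # consecutive epochs without improvement
    for epoch, loss in enumerate(val_mse):
        if loss < best_val:
            best_val = loss
            best_idx = epoch
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                break  # stop training; keep weights from best_idx
    return best_idx


# Hypothetical validation curve: decreases (underfitting phase),
# bottoms out, then rises again (overfitting phase).
val_mse = [0.90, 0.60, 0.42, 0.35, 0.33, 0.34, 0.37, 0.41, 0.46]

print(best_epoch(val_mse))                   # (4, 0.33)
print(early_stop_epoch(val_mse, patience=3))  # 4
```

With a noisy curve, a small `patience` can stop at a temporary bump before the true minimum, so the choice of patience is a trade-off between wasted epochs and stopping too early.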